Currently the conv2d set uses time as the x-axis (naturally) and features as the y-axis (not so naturally). The window/stride then looks for positional patterns along both axes. That makes perfect sense on the time axis, where neighboring steps really are adjacent, but the order of the features is arbitrary, so a window that only sees a few neighboring features at a time is not meaningful. The features should all boil down together, not in chunks. So TODO: experiment with features as channels/depth instead of height.
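To make the two layouts concrete, here is a minimal NumPy sketch. It is not the project's actual network: the sizes (`batch`, `timesteps`, `features`), the 8 output filters, and the helper `conv2d_valid` are all hypothetical, and the NHWC layout is an assumption. It only compares what a kernel gets to see in each layout.

```python
import numpy as np

def conv2d_valid(x, kernel, stride=(1, 1)):
    """Naive NHWC convolution with no padding (hypothetical helper, for illustration)."""
    n, h, w, c_in = x.shape
    kh, kw, kc, c_out = kernel.shape
    assert kc == c_in, "kernel depth must match input channels"
    sh, sw = stride
    out_h = (h - kh) // sh + 1
    out_w = (w - kw) // sw + 1
    out = np.zeros((n, out_h, out_w, c_out))
    for i in range(out_h):
        for j in range(out_w):
            # Patch under the window: (n, kh, kw, c_in)
            patch = x[:, i * sh:i * sh + kh, j * sw:j * sw + kw, :]
            # Contract height, width, and channels against the kernel
            out[:, i, j, :] = np.tensordot(patch, kernel, axes=([1, 2, 3], [0, 1, 2]))
    return out

rng = np.random.default_rng(0)
batch, timesteps, features = 2, 16, 6  # hypothetical sizes
window = rng.normal(size=(batch, timesteps, features))

# Current layout: features as height, time as width, a single channel.
# A 3x3 kernel only ever sees 3 neighboring features at once.
as_height = window.transpose(0, 2, 1)[..., np.newaxis]  # (batch, features, time, 1)
k_height = rng.normal(size=(3, 3, 1, 8))
out_height = conv2d_valid(as_height, k_height)

# Proposed layout: features as channels, height collapsed to 1.
# A 1x3 kernel slides over time only and mixes all features at every step.
as_channels = window[:, np.newaxis, :, :]  # (batch, 1, time, features)
k_channels = rng.normal(size=(1, 3, features, 8))
out_channels = conv2d_valid(as_channels, k_channels)

print(out_height.shape)    # (2, 4, 14, 8)
print(out_channels.shape)  # (2, 1, 14, 8)
```

In the second layout the output has no feature axis left: every filter combines all features at each time position, which is the "boil down together" behavior described above.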