nengo / keras-lmu

Keras implementation of Legendre Memory Units
https://www.nengo.ai/keras-lmu/

Perform LMUFFT with raw convolution #42

Closed hunse closed 3 years ago

hunse commented 3 years ago

Add the ability to run the impulse response convolution as a raw convolution, rather than using the FFT. In practice, I've found that this can speed things up, though it also appears to require more CPU memory (which is surprising).
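The two approaches being compared can be sketched in NumPy (a minimal illustration, not the PR's actual implementation): applying an impulse response either as a direct convolution or via the FFT with zero-padding to avoid circular wrap-around, both truncated to the input length.

```python
import numpy as np

rng = np.random.default_rng(0)
signal = rng.standard_normal(64)   # input sequence
impulse = rng.standard_normal(16)  # impulse response (e.g. of the LMU's linear system)

# Direct ("raw") convolution, truncated to the input length.
direct = np.convolve(signal, impulse)[: len(signal)]

# FFT-based convolution: pad to the full linear-convolution length
# so the circular convolution matches the linear one.
n = len(signal) + len(impulse) - 1
fft_conv = np.fft.irfft(np.fft.rfft(signal, n) * np.fft.rfft(impulse, n), n)[: len(signal)]

# The two methods agree up to floating-point error.
assert np.allclose(direct, fft_conv)
```

The FFT route has better asymptotic complexity, but for short impulse responses the direct convolution can win in practice, which is what motivates supporting both.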

I also added a profiling test.

Based on #40.

TODO:

hunse commented 3 years ago

When testing this on my RTX 3060 for my specific model, I'm finding that the FFT implementation is faster than the raw conv implementation. So which one is fastest seems to depend on the specific hardware/CUDA/TensorFlow combination. I'm hoping to test across more hardware soon, but I think for the foreseeable future, we're looking at keeping both implementations around. The best option would be to autotune it, but that's probably a good chunk more work.

hunse commented 3 years ago

I think this is ready to go. In the end, I had to add two ways of doing the raw convolution, since the one that's faster on GPUs (using NCHW format) doesn't work on CPU.
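The NCHW-vs-NHWC distinction is purely a data-layout choice: the same depthwise convolution can be computed in either layout, and transposing between them does not change the result. A minimal NumPy sketch (hypothetical helper names, not code from this PR) of that equivalence for 1D sequences:

```python
import numpy as np

rng = np.random.default_rng(1)
x_nwc = rng.standard_normal((2, 32, 3))  # (batch, time, channels), i.e. channels-last
h = rng.standard_normal((8, 3))          # one length-8 impulse response per channel

def depthwise_conv_wc(x_tc, h):
    # channels-last input: (time, channels) -> (time, channels)
    return np.stack(
        [np.convolve(x_tc[:, c], h[:, c])[: x_tc.shape[0]] for c in range(x_tc.shape[1])],
        axis=1,
    )

def depthwise_conv_cw(x_ct, h):
    # channels-first input: (channels, time) -> (channels, time)
    return np.stack(
        [np.convolve(x_ct[c], h[:, c])[: x_ct.shape[1]] for c in range(x_ct.shape[0])]
    )

# Convolve directly in the channels-last layout.
out_nwc = np.stack([depthwise_conv_wc(b, h) for b in x_nwc])

# Transpose to channels-first, convolve, transpose back.
x_ncw = x_nwc.transpose(0, 2, 1)
out_roundtrip = np.stack([depthwise_conv_cw(b, h) for b in x_ncw]).transpose(0, 2, 1)

assert np.allclose(out_nwc, out_roundtrip)
```

On GPUs, cuDNN kernels tend to favor the channels-first layout, while TensorFlow's CPU kernels historically only support channels-last, which is why both paths are needed here.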

hunse commented 3 years ago

Fixups look good to me. When you've got all the tests passing, feel free to merge.

tbekolay commented 3 years ago

New commits lgtm :+1: