Closed: ipod825 closed this 4 years ago
@ipod825 Sweet! I'm actually planning to write the retina in CUDA for a class project to speed things up. I've run the code on both CPU and GPU, and the CPU version is faster. I think the bottleneck is the retina, but like you said, I need to do more profiling!
@ipod825 @kevinzakka Our group tried out both implementations. On MNIST, the code provided by @kevinzakka runs at approximately 1000 iter/s on CPU. Switched to GPU, the same code runs at only 250-300 iter/s. The trick suggested by @ipod825 brings the GPU run back up to about 1000 iter/s.
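For anyone who wants to reproduce iter/s numbers like these, here is a minimal timing sketch. It is not the script used above: `iters_per_sec` and the `step` callable are hypothetical names, and `step` is assumed to run exactly one training iteration.

```python
import time


def iters_per_sec(step, n_iters=100, warmup=10):
    """Return the measured throughput of `step` in iterations per second.

    `step` is a hypothetical zero-argument callable that runs one iteration.
    On GPU, kernels are launched asynchronously, so `step` should block until
    the device is done (for PyTorch, by calling torch.cuda.synchronize())
    before it returns; otherwise the timing only measures launch overhead.
    """
    # Warm-up iterations are not timed (caches, lazy initialisation, etc.).
    for _ in range(warmup):
        step()

    start = time.perf_counter()
    for _ in range(n_iters):
        step()
    elapsed = time.perf_counter() - start

    # Guard against a zero reading on very coarse clocks.
    return n_iters / max(elapsed, 1e-12)
```

Running the same helper on the CPU and GPU versions with identical batch sizes should give directly comparable numbers.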
https://github.com/kevinzakka/recurrent-visual-attention/blob/b659b6ff06561d073320b8123811ee738f968d9f/modules.py#L10
Just FYI, I am refactoring your code and found that the retina network can be made a little faster by zero-padding the whole batch once, with enough padding that no patch falls outside the image, and then extracting the patches directly. You can check a working version here.
P.S. I didn't do much profiling; I just timed the first epoch several times (about 1.3 times faster).
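To make the padding idea concrete, here is a rough sketch in NumPy rather than PyTorch. It is not the linked implementation: `extract_patches` is a hypothetical name, and it assumes single-channel images and patch centres already given as integer pixel coordinates.

```python
import numpy as np


def extract_patches(images, centers, size):
    """Cut one size x size patch per image after padding the batch once.

    images:  (B, H, W) array
    centers: (B, 2) integer array of (row, col) pixel coordinates (assumed)
    returns: (B, size, size) array; out-of-image pixels are 0
    """
    half = size // 2
    # Pad the whole batch once so that every patch lies inside the array.
    padded = np.pad(images, ((0, 0), (half, half), (half, half)), mode="constant")

    # After padding, original pixel (r, c) sits at (r + half, c + half), so the
    # patch centred on (r, c) starts at index (r, c) of the padded image.
    offsets = np.arange(size)
    rows = centers[:, 0, None] + offsets  # (B, size)
    cols = centers[:, 1, None] + offsets  # (B, size)
    batch = np.arange(images.shape[0])[:, None, None]  # (B, 1, 1)

    # Advanced indexing broadcasts to (B, size, size): no Python loop over the batch.
    return padded[batch, rows[:, :, None], cols[:, None, :]]
```

The point of padding up front is that no per-image bounds check is needed, so the extraction becomes a single indexing operation over the batch.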