harold opened this issue 7 years ago
The NN problems I am running tend to run slower on GPU than on CPU (they're not image based; I'm just trying to classify things based on a dozen variables), so something like this would help a lot, especially when I'm using tools like BOPP (https://github.com/probprog/bopp) to discover the optimal hyperparameters for the NN (batch size, L2 regularization, learning rate, number of nodes per layer).
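To make that concrete, here's a minimal sketch of the kind of hyperparameter search I mean, in plain Clojure. It uses simple random search rather than BOPP's Bayesian optimization, and `train-and-score`, `search-space`, and the candidate values are hypothetical stand-ins for illustration, not real Cortex or BOPP APIs:

```clojure
(defn train-and-score
  "Hypothetical stand-in: train a network with the given hyperparameters
  and return a validation score (higher is better). Replace with an actual
  Cortex training + evaluation run."
  [{:keys [batch-size l2 learning-rate nodes-per-layer]}]
  (rand))

(def search-space
  {:batch-size      [16 32 64 128]
   :l2              [0.0 1e-4 1e-3]
   :learning-rate   [1e-3 1e-2 1e-1]
   :nodes-per-layer [8 16 32 64]})

(defn sample-params
  "Pick one candidate value at random for each hyperparameter."
  [space]
  (into {} (map (fn [[k vs]] [k (rand-nth vs)]) space)))

(defn random-search
  "Run n trials and return the best {:params ... :score ...} found."
  [space n]
  (apply max-key :score
         (for [_ (range n)]
           (let [params (sample-params space)]
             {:params params :score (train-and-score params)}))))

;; (random-search search-space 20)
;; => {:params {:batch-size 32, :l2 1e-4, ...} :score 0.93}
```

BOPP replaces the `sample-params`/`random-search` part with Bayesian optimization over the same kind of search space; the expensive step is always the `train-and-score` call, which is why faster training matters so much here.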
From this mailing list post: https://groups.google.com/forum/#!topic/clojure-cortex/YKpWDMsSU5s
comes this popular summary: https://www.sciencedaily.com/releases/2017/06/170601135633.htm
of this preprint, which shares its title with this issue: https://arxiv.org/pdf/1602.08194.pdf
My takeaways:
The practical speedups reported in the paper are on the same order (tens of x) as what we currently get from GPU computation. No practical GPU version of these ideas exists yet (and when it does, we'll probably be able to leverage it fairly easily, e.g., if it makes it into cuDNN).
These techniques, as reported in the paper, stand to benefit low-power (read: mobile phone) and/or distributed (read: Google datacenter) NN systems. That is a different, and so far non-overlapping, niche of NN work from the one Cortex addresses. When we do move in that direction, and/or when there is a practical GPU implementation of this, we should definitely hop on board.
@rosejn - This is a cute intersection of a lot of your interests; the paper is perhaps a fun read for you.