Open rayset opened 8 years ago
6GB cards would probably give the improvement you're looking for. I don't know how much the "3.5 + 0.5 GB" VRAM split on the 970 would affect your performance. The R9 390 boasts 8GB VRAM for about the same price, which could give you a big improvement; however, it doesn't have CUDA support, of course.
Yes, the OpenCL backend is a lot more inefficient than the CUDA backend. The CUDA backend for Torch has been optimized by many people, and relies on cuDNN for convolutions which is a highly optimized library directly from NVIDIA. The OpenCL backend for Torch, on the other hand, is mostly a heroic one-man project from @hughperkins:
https://github.com/hughperkins/cltorch https://github.com/hughperkins/clnn
In particular, I believe that the OpenCL backend uses an im2col approach to compute convolutions, which is fairly memory-hungry, while cuDNN uses much more efficient algorithms for convolutions.
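To illustrate why im2col is memory-hungry, here is a minimal numpy sketch (not the actual cltorch code): each k×k patch of the input is unrolled into its own column, so overlapping patches are stored redundantly and the unrolled matrix is roughly k*k times larger than the input.

```python
import numpy as np

def im2col(x, k):
    """Unroll every k x k patch of a C x H x W input into one column.

    The result has shape (C*k*k, out_h*out_w); overlapping patches
    are copied, which is where the memory blowup comes from.
    """
    C, H, W = x.shape
    out_h, out_w = H - k + 1, W - k + 1
    cols = np.empty((C * k * k, out_h * out_w), dtype=x.dtype)
    idx = 0
    for i in range(out_h):
        for j in range(out_w):
            cols[:, idx] = x[:, i:i + k, j:j + k].ravel()
            idx += 1
    return cols

x = np.random.rand(3, 64, 64).astype(np.float32)  # small 3-channel input
cols = im2col(x, 3)                               # 3x3 kernel
print(x.nbytes, cols.nbytes)  # unrolled matrix is ~9x the input size
```

The payoff is that the convolution then becomes a single dense matrix multiply (filter matrix times `cols`), which is fast, but on a 4GB card that temporary buffer eats VRAM that could otherwise go toward larger images.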
I've got a 280X with 4 GB of VRAM. I'm forcing myself to wait for the 16 GB cards that may be announced in the GTX series, but one option could be getting a 4-6 GB CUDA card from eBay and "waiting more happily".
Would I get a relevant (say, 10%+) improvement in output size with a 4 GB CUDA card like a 970?