jcjohnson / neural-style

Torch implementation of neural style algorithm
MIT License

is the opencl backend really less memory efficient than the cuda one? #225

Open rayset opened 8 years ago

rayset commented 8 years ago

I've got a 280X with 4 GB of VRAM. I'm forcing myself to wait for the 16 GB cards that may be announced in the GTX series, but an option could be getting a 4-6 GB CUDA card from eBay and "wait more happily".

Would I get a relevant (say, 10%+) improvement in output size with a 4 GB CUDA card like a 970?

sdziscool commented 8 years ago

6 GB cards would probably give the improvement you're looking for. I don't know how much the segmented "3.5 + 0.5 GB" VRAM on the 970 would affect your performance. The R9 390 boasts 8 GB of VRAM for about the same price, which could give you a big improvement; however, it doesn't have CUDA support, of course.

jcjohnson commented 8 years ago

Yes, the OpenCL backend is much less memory-efficient than the CUDA backend. The CUDA backend for Torch has been optimized by many people, and it relies on cuDNN for convolutions, a highly optimized library directly from NVIDIA. The OpenCL backend for Torch, on the other hand, is mostly a heroic one-man project by @hughperkins:

https://github.com/hughperkins/cltorch
https://github.com/hughperkins/clnn
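
For anyone comparing the two paths, the backend is chosen with neural-style's `-backend` flag; a sketch of typical invocations, assuming the flags documented in this repo's README:

```
# CUDA path (convolutions go through cuDNN):
th neural_style.lua -gpu 0 -backend cudnn

# OpenCL path (goes through cltorch/clnn; no cuDNN available):
th neural_style.lua -gpu 0 -backend clnn
```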

In particular, I believe that the OpenCL backend uses an im2col approach to compute convolutions, which is fairly memory-hungry, while cuDNN uses much more efficient convolution algorithms.
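
To make the memory cost concrete, here is a minimal NumPy sketch of the general im2col technique (an illustration, not the actual cltorch code): every k x k patch of the input is copied out into its own column, so the unrolled buffer is roughly k*k times the size of the input.

```python
import numpy as np

def im2col(x, k):
    """Unroll every k x k patch of a (C, H, W) input into a column.

    Output shape is (C*k*k, H_out*W_out): each input value is copied
    into ~k*k columns, which is where the memory blow-up comes from.
    (Stride 1, no padding, for brevity.)
    """
    C, H, W = x.shape
    H_out, W_out = H - k + 1, W - k + 1
    cols = np.empty((C * k * k, H_out * W_out), dtype=x.dtype)
    row = 0
    for c in range(C):
        for i in range(k):
            for j in range(k):
                # Every output position needs x[c, i+di, j+dj]; one slice
                # grabs that element for all output positions at once.
                cols[row] = x[c, i:i + H_out, j:j + W_out].ravel()
                row += 1
    return cols

# Example: the first conv layer of VGG-19 on a 512x512 image
# (3 input channels, 3x3 kernel).
x = np.zeros((3, 512, 512), dtype=np.float32)
cols = im2col(x, 3)
print(x.nbytes / 2**20)     # ~3 MB for the input itself
print(cols.nbytes / 2**20)  # ~27 MB for the unrolled buffer (~9x)
```

Convolution then reduces to one matrix multiply of a (num_filters, C*k*k) weight matrix against `cols`, which is fast but pays the ~k^2 memory blow-up at every conv layer; cuDNN can instead pick algorithms such as implicit GEMM that never materialize this buffer.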