jcjohnson / neural-style

Torch implementation of neural style algorithm
MIT License

Running on Amazon p3 instances (V100 GPU) #437

Open albarji opened 6 years ago

albarji commented 6 years ago

Hi there,

Has anyone tried running neural-style on one of Amazon's p3 instances with Tesla V100 (Volta) GPUs? I usually run it on a p2 instance (K80 GPU) without issue, and I was expecting a significant speedup when moving to the V100 card. However, the results I am getting are far from that:

Time to process a 400x400 image on a p2 (K80) instance: 1m41s

Time to process a 400x400 image on a p3 (V100) instance: 9m25s
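To put a number on the gap, the two timings can be converted to seconds and compared. This is only a small arithmetic sketch: the `to_seconds` helper is a hypothetical name introduced here, and it handles just the `XmYs` format used in the timings above.

```shell
# Hypothetical helper: convert a duration like "1m41s" into whole seconds.
# Only the "<minutes>m<seconds>s" form is handled, without leading zeros.
to_seconds() {
  minutes="${1%%m*}"   # text before the first "m"
  rest="${1#*m}"       # text after the first "m", e.g. "41s"
  seconds="${rest%s}"  # drop the trailing "s"
  echo $(( $minutes * 60 + $seconds ))
}

k80=$(to_seconds 1m41s)    # p2 timing reported above
v100=$(to_seconds 9m25s)   # p3 timing reported above

# The shell only does integer arithmetic, so awk computes the ratio.
slowdown=$(awk -v a="$v100" -v b="$k80" 'BEGIN { printf "%.1f", a / b }')
echo "p2: ${k80}s, p3: ${v100}s, slowdown on p3: ${slowdown}x"
```

So the p3 run takes 565 seconds against 101 seconds on the p2, which makes it roughly 5.6 times slower rather than faster.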

Something is going terribly wrong there. I'm running everything inside the nvidia/cuda:8.0-cudnn5-devel Docker container. I'm aware that CUDA 9 is recommended for Volta cards, but Torch seems to have build issues with that version.

Any thoughts on this would be appreciated.

ProGamerGov commented 6 years ago

Can you create larger images with the V100 GPUs? I've also noticed that newer CUDA/Torch/cuDNN versions seem to have worse performance, so maybe something like that is affecting your results?

You could also try running export TORCH_NVCC_FLAGS="-D__CUDA_NO_HALF_OPERATORS__" before running Torch7's ./install.sh, to improve performance, as per this issue.
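As a rough sketch, the suggestion above might look like the following when rebuilding Torch7. The `TORCH_DIR` variable and its `~/torch` default are assumptions made here for illustration, so point it at wherever your Torch checkout actually lives. This flag is also commonly reported as a workaround for Torch7 build errors under CUDA 9; whether it changes runtime speed is not verified here.

```shell
# Assumption: the Torch7 checkout is in ~/torch unless TORCH_DIR says otherwise.
TORCH_DIR="${TORCH_DIR:-$HOME/torch}"

# Flag from the comment above; it must be exported before install.sh runs
# so that the build picks it up.
export TORCH_NVCC_FLAGS="-D__CUDA_NO_HALF_OPERATORS__"

if [ -x "$TORCH_DIR/install.sh" ]; then
  # Run the installer from inside the checkout, in a subshell so the
  # current working directory is left unchanged.
  ( cd "$TORCH_DIR" && ./install.sh )
else
  echo "No install.sh found in $TORCH_DIR; set TORCH_DIR to your Torch checkout" >&2
fi
```

Since the variable is only read at build time, an existing Torch installation would need to be rebuilt for it to have any effect.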