ProGamerGov opened this issue 7 years ago
Testing CUDA 8.0 (cuda-repo-ubuntu1604_8.0.61-1_amd64.deb, CUDA Toolkit 8.0 GA2, Feb 2017) with cudnn-8.0-linux-x64-v5.0-ga.tgz and th neural_style.lua -gpu 0 -backend cudnn also seems to use more memory. This is interesting compared to the CUDA Toolkit 8.0 GA1 (Sept 2016) version tested above.
ubuntu@ip-Address:~/neural-style$ nvidia-smi
Wed Oct 25 20:50:25 2017
+-----------------------------------------------------------------------------+
| NVIDIA-SMI 384.90                 Driver Version: 384.90                    |
|-------------------------------+----------------------+----------------------+
| GPU  Name        Persistence-M| Bus-Id        Disp.A | Volatile Uncorr. ECC |
| Fan  Temp  Perf  Pwr:Usage/Cap|         Memory-Usage | GPU-Util  Compute M. |
|===============================+======================+======================|
|   0  Tesla K80           Off  | 00000000:00:1E.0 Off |                    0 |
| N/A   50C    P0   149W / 149W |   1726MiB / 11439MiB |     99%      Default |
+-------------------------------+----------------------+----------------------+
+-----------------------------------------------------------------------------+
| Processes:                                                       GPU Memory |
|  GPU       PID   Type   Process name                             Usage      |
|=============================================================================|
|    0     29512      C   /home/ubuntu/torch/install/bin/luajit       1715MiB |
+-----------------------------------------------------------------------------+
This makes me wonder whether CUDA Toolkit 8.0 GA2 (Feb 2017) added something new that Neural-Style doesn't require, and whether we could disable or remove it to lower memory use.
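I haven't confirmed where the extra memory actually goes, but one thing worth ruling out is cuDNN's algorithm selection, since faster convolution algorithms can need large workspaces. Here's a minimal sketch, assuming the cudnn.torch bindings that the -backend cudnn path uses; the flag values are illustrative, not necessarily what neural_style.lua sets:

-- Sketch only: force cuDNN toward its most memory-conservative behaviour,
-- so any remaining usage difference can be attributed to the toolkit/driver
-- context rather than to workspace-hungry convolution algorithms.
require 'cutorch'
require 'cudnn'

cudnn.benchmark = false  -- skip the autotuner, which may pick large-workspace algorithms
cudnn.fastest   = false  -- don't force the fastest (often most memory-hungry) algorithms

If the memory gap disappears with these set, the change is probably in cuDNN's algorithm choices rather than in the CUDA toolkit itself.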
Another setup for comparison:
Ubuntu 14.04.4 LTS (GNU/Linux 3.13.0-79-generic x86_64)
CUDA 7.5 (cuda-repo-ubuntu1404_7.5-18_amd64.deb) and cuDNN v4 (cudnn-7.0-linux-x64-v4.0-prod.tgz)
Setup memory usage:
ubuntu@ip-Address:~/neural-style$ nvidia-smi
Wed Oct 25 23:34:49 2017
+------------------------------------------------------+
| NVIDIA-SMI 352.79     Driver Version: 352.79         |
|-------------------------------+----------------------+----------------------+
| GPU  Name        Persistence-M| Bus-Id        Disp.A | Volatile Uncorr. ECC |
| Fan  Temp  Perf  Pwr:Usage/Cap|         Memory-Usage | GPU-Util  Compute M. |
|===============================+======================+======================|
|   0  Tesla K80           Off  | 0000:00:1E.0     Off |                    0 |
| N/A   46C    P0    70W / 149W |    154MiB / 11519MiB |      0%      Default |
+-------------------------------+----------------------+----------------------+
+-----------------------------------------------------------------------------+
| Processes:                                                       GPU Memory |
|  GPU       PID   Type   Process name                             Usage      |
|=============================================================================|
|    0      1564      C   /home/ubuntu/torch/install/bin/luajit         97MiB |
+-----------------------------------------------------------------------------+
Running th neural_style.lua -gpu 0 -backend cudnn:
ubuntu@ip-Address:~/neural-style$ nvidia-smi
Wed Oct 25 23:37:28 2017
+------------------------------------------------------+
| NVIDIA-SMI 352.79     Driver Version: 352.79         |
|-------------------------------+----------------------+----------------------+
| GPU  Name        Persistence-M| Bus-Id        Disp.A | Volatile Uncorr. ECC |
| Fan  Temp  Perf  Pwr:Usage/Cap|         Memory-Usage | GPU-Util  Compute M. |
|===============================+======================+======================|
|   0  Tesla K80           Off  | 0000:00:1E.0     Off |                    0 |
| N/A   63C    P0   142W / 149W |   1375MiB / 11519MiB |     93%      Default |
+-------------------------------+----------------------+----------------------+
+-----------------------------------------------------------------------------+
| Processes:                                                       GPU Memory |
|  GPU       PID   Type   Process name                             Usage      |
|=============================================================================|
|    0      1564      C   /home/ubuntu/torch/install/bin/luajit       1319MiB |
+-----------------------------------------------------------------------------+
It looks like memory usage significantly increased in the CUDA 8.0 (Feb 2017) update, while the previous CUDA versions were around 23.5% more memory-efficient (roughly 1715MiB vs. 1319MiB for the luajit process in the outputs above), unless the extra usage scales with the image size value. If it does scale with image size, that would significantly lower the largest possible image size.
An interesting side effect of changing the CUDA, cuDNN, and Torch7 versions (and maybe even the Ubuntu version) is that the effect of the seed value seems to change.
So if you use -seed 876 with CUDA 8.0, etc., that same seed value will not create the same output with CUDA 9.0, etc.
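For reference, this is roughly how seeding is done in Torch7 (a sketch, not the exact code in neural_style.lua). Even with both RNGs seeded, different CUDA/cuDNN builds can pick different convolution algorithms, so bit-identical outputs across versions aren't guaranteed:

-- Sketch: seed both the CPU and GPU random number generators. Identical
-- seeds still won't reproduce outputs across CUDA/cuDNN versions if the
-- underlying convolution algorithms (and their rounding behaviour) differ.
require 'torch'
require 'cutorch'

local seed = 876
torch.manualSeed(seed)       -- CPU RNG
cutorch.manualSeedAll(seed)  -- RNG on every visible GPU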
Another way to slightly lower memory usage seems to be stripping layers from the VGG model:
https://github.com/jcjohnson/neural-style/issues/428#issuecomment-370185610
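Here's the idea sketched out, assuming the model is loaded with loadcaffe (the default model paths below are the ones download_models.sh fetches) and that nothing deeper than relu5_1 is requested; the exact cut point depends on the -content_layers/-style_layers you actually pass:

-- Rough sketch, not the exact code from the linked comment: drop every
-- module above the deepest layer actually used, so the fully-connected
-- layers of VGG-19 never occupy GPU memory.
require 'loadcaffe'

local cnn = loadcaffe.load('models/VGG_ILSVRC_19_layers_deploy.prototxt',
                           'models/VGG_ILSVRC_19_layers.caffemodel', 'nn')

local last_needed = 'relu5_1'  -- deepest layer in the default style_layers list
local cut = #cnn.modules
for i = 1, #cnn.modules do
  -- loadcaffe attaches the original Caffe layer name to each module
  if cnn:get(i).name == last_needed then
    cut = i
    break
  end
end
for i = #cnn.modules, cut + 1, -1 do
  cnn:remove(i)  -- strip everything above the last layer we need
end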
Have you tried instance normalization with your setups? I encounter problems here...
@flaushi Are you referring to instance normalization from fast-neural-style?
Yes!
It looks like memory usage significantly increased in the CUDA 8.0 (Feb 2017) update, while the previous CUDA versions were around 23.5% more memory-efficient, unless the extra usage scales with the image size value. If it does scale with image size, that would significantly lower the largest possible image size.
I'm confused @ProGamerGov -- the GPU memory usage definitely scales with image output size. Were these CUDA benchmarks taken with the same output image size? One of my EC2 snapshots has a corrupted CUDA install somehow, so I'm deciding which CUDA to go with -- I'd rather use the most memory-efficient one.
@ajhool I believe I was using th neural_style.lua -gpu 0 -backend cudnn for all the tests, as in my first post in this thread. I currently have 3-4 different EC2 AMIs with CUDA installed, so I can check again if you like. But I do remember finding that parameter sets which worked with earlier versions of CUDA ran out of memory on the latest versions.
That's really interesting. I sensed a significant speed increase on the newer versions of CUDA and cuDNN, but "sensed" is the operative word because I didn't do a controlled benchmark. I also sensed the memory usage increase. The latest CUDA supports the Volta GPUs, too, so I'm excited to see if there are significant rendering gains to be made there.
What I'm actually finding (I think) is that for smaller renders the main bottleneck occurs in some process that luajit is executing on the CPU. So, when I run multiple renders on the same GPU (~60% GPU memory usage), I am seeing significant (2x-3x) rendering slowdowns, and I am also seeing all 4 of the CPU cores get locked up at 100% due to a luajit process. I believe that I'd posted an issue about this in the past but might be misremembering. But concurrent rendering seems to be constrained by the CPU before the GPU, frustratingly.
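If the contention really is each luajit process spinning up OpenMP/BLAS threads on every core (which I haven't verified), one cheap experiment is to cap the CPU threads per process before launching concurrent renders:

-- Hypothetical mitigation: limit the OpenMP/BLAS threads each Torch process
-- uses, so several concurrent renders don't all fight over every CPU core.
require 'torch'

torch.setnumthreads(1)
print('CPU threads for this process: ' .. torch.getnumthreads())

If the slowdown persists with one thread per process, the bottleneck is probably something else in the Lua-side loop rather than thread contention.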
After using
th neural_style.lua -gpu 0 -backend cudnn
to compare an older install of Neural-Style with a newer one, I noticed that the performance seems to have gotten worse. Is this something that's fixable? Or are the new versions of CUDA and cuDNN just not as efficient as the previous versions?
Ubuntu 16.04.1 LTS (GNU/Linux 4.4.0-38-generic x86_64)
cuda-repo-ubuntu1604_8.0.44-1_amd64.deb, cudnn-8.0-linux-x64-v5.0-ga.tgz
Compared to:
Ubuntu 16.04.3 LTS (GNU/Linux 4.4.0-1038-aws x86_64)
cudnn-9.0-linux-x64-v7.tgz, libcudnn7_7.0.3.11-1+cuda9.0_amd64.deb
More memory is also used before the network even loads the model.
This is a pretty large change in terms of resource usage, and it's definitely not for the better.
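To pin down how much of that pre-model baseline comes from the CUDA context itself, something like this sketch (using cutorch.getMemoryUsage, run once right after cutorch is loaded and again after the model is in place) would make the comparison between installs more concrete:

-- Sketch: report device memory as seen by cutorch. Querying once right after
-- the CUDA context is created and once after loading the VGG model separates
-- driver/toolkit overhead from the network's own footprint.
require 'cutorch'

local free, total = cutorch.getMemoryUsage(cutorch.getDevice())
print(string.format('used %.0f MiB of %.0f MiB',
                    (total - free) / 2^20, total / 2^20))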