jcjohnson / neural-style

Torch implementation of neural style algorithm
MIT License

Autotune Flag Image Size Limitation #406

Open jiggymang opened 7 years ago

jiggymang commented 7 years ago

I've recently upgraded to a Titan X (Pascal), and I'm getting a failure that I can't figure out. It always occurs at an image size just under 1500. Since it seems to be based on scale, I'm assuming it's a memory issue. However, when I monitor my VRAM usage, it's well below the Titan's capacity. I can successfully run up to around 9.5GB of VRAM usage, but once I go past that, I get an error:

cudnnFindConvolutionForwardAlgorithm failed: 4 convDesc=[mode : CUDNN_CROSS_CORRELATION datatype : CUDNN_DATA_FLOAT] hash=-dimA1,3,1500,1212 -filtA64,3,3,3 1,64,1500,1212 -padA1,1 -convStrideA1,1 CUDNN_DATA_FLOAT

When I run on my GTX1080, I can get pretty close to the max VRAM before I get an out of memory error. Since this new error is well below the 12GB that the Titan has, I'm wondering if there is some other resource that may be causing a bottleneck? I'm running an i5-4690k with 32GB DDR3.

I'm a little frustrated that the Titan isn't getting me a much higher image size than the GTX 1080. Also, for reference, even using the -multigpu_strategy flag gives me a similar error when both cards are at around 50% usage (~10GB total).

Any help would be greatly appreciated!

jiggymang commented 7 years ago

Update: The issue is with the -cudnn_autotune flag. As soon as I remove it, I can use all 20GB across both cards, and the maximum image size went from under 1500 up to 2000. I tried reinstalling cuDNN, but applying the autotune flag still produces the same error.
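For anyone hitting the same wall, a sketch of the two invocations (image paths and sizes here are illustrative, not from the original report; the flags themselves are documented neural-style options):

```
# Fails near image_size 1500 with the cudnnFindConvolutionForwardAlgorithm error:
th neural_style.lua -backend cudnn -cudnn_autotune -image_size 1500 \
   -content_image content.jpg -style_image style.jpg

# Without -cudnn_autotune, the same setup reportedly runs up to ~2000:
th neural_style.lua -backend cudnn -image_size 2000 \
   -content_image content.jpg -style_image style.jpg
```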

ajhool commented 7 years ago

I believe autotune works by taking a lot of memory on the first pass to reduce the memory on subsequent passes. Perhaps the spike in the beginning is causing a memory failure.
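If that's right, a rough mental model is that the autotune probe (cudnnFindConvolutionForwardAlgorithm) benchmarks every candidate convolution algorithm, including ones that need very large scratch "workspace" buffers, so the transient peak during that first pass can far exceed what the finally chosen algorithm uses afterwards. This toy Python simulation illustrates the idea; the algorithm names, speeds, and workspace sizes are made up, not real cuDNN figures:

```python
# Toy model of cuDNN convolution algorithm selection.
# (name, relative speed, workspace bytes) -- hypothetical values for illustration.
ALGOS = [
    ("implicit_gemm",         1.0, 0),
    ("implicit_precomp_gemm", 1.4, 64 << 20),   # 64 MB scratch
    ("winograd",              2.5, 2 << 30),    # 2 GB scratch
    ("fft",                   2.0, 6 << 30),    # 6 GB scratch -- huge workspace
]

def pick_without_autotune(free_vram):
    """Heuristic path: pick the fastest algorithm whose workspace fits."""
    fits = [a for a in ALGOS if a[2] <= free_vram]
    return max(fits, key=lambda a: a[1])

def autotune_peak_workspace():
    """Benchmarking path: every candidate gets tried, so the transient
    peak allocation is the largest workspace of ANY candidate."""
    return max(a[2] for a in ALGOS)

free = 2 << 30  # pretend 2 GB of VRAM remain free at this layer
name, _, ws = pick_without_autotune(free)
print(name, ws <= free)                    # chosen workspace fits -> runs fine
print(autotune_peak_workspace() > free)    # the probe alone exceeds free VRAM
```

This would explain why the run fails with -cudnn_autotune even though steady-state VRAM usage looks comfortably below 12GB: the failure happens during the probe, before the monitor ever shows the spike.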