Closed MoritzLost closed 2 years ago
You don't have enough VRAM, sadly. Also, depending on the amount of detail in each image, it may inflate VRAM needs. 900 pixels on a 970 is REALLY good in my experience.
You are right, 2 GB VRAM isn't that much for this application. However, that doesn't explain why after the updates, executing a query that worked before now fails, even though I'm using the exact same parameters and images as before ...
I've got the same problem, but it works if I remove the autotune flag. Similar environment -- W541 laptop with only 2 GB VRAM, Ubuntu 15.10, CUDA 7.5 + cuDNN v4 (v5 doesn't work). Hope that helps.
So, I'm new to Ubuntu and Linux in general. I have been running some tests with this software for two weeks, and I have stumbled across a similar problem.
user@user-B85M-D3H:~/neural-style$ time th neural_style.lua -content_image /home/user/Documents/srcx.png -style_image /home/bart2/Documents/style6.jpg -image_size 620 -gpu 0 -num_iterations 2000 -optimizer adam -backend cudnn -cudnn_autotune -seed 666 -save_iter 0 -output_image out8.png
[libprotobuf WARNING google/protobuf/io/coded_stream.cc:505] Reading dangerously large protocol message. If the message turns out to be larger than 1073741824 bytes, parsing will be halted for security reasons. To increase the limit (or to disable these warnings), see CodedInputStream::SetTotalBytesLimit() in google/protobuf/io/coded_stream.h.
[libprotobuf WARNING google/protobuf/io/coded_stream.cc:78] The total number of bytes read was 574671192
Successfully loaded models/VGG_ILSVRC_19_layers.caffemodel
/home/user/torch/install/bin/luajit: /home/user/torch/install/share/lua/5.1/nn/Container.lua:67:
In 35 module of nn.Sequential:
/home/user/torch/install/share/lua/5.1/cudnn/init.lua:58: Error in CuDNN: CUDNN_STATUS_ALLOC_FAILED (cudnnFindConvolutionForwardAlgorithm)
stack traceback:
[C]: in function 'error'
/home/user/torch/install/share/lua/5.1/cudnn/init.lua:58: in function 'errcheck'
...torch/install/share/lua/5.1/cudnn/SpatialConvolution.lua:185: in function 'createIODescriptors'
...torch/install/share/lua/5.1/cudnn/SpatialConvolution.lua:360: in function <...torch/install/share/lua/5.1/cudnn/SpatialConvolution.lua:357>
[C]: in function 'xpcall'
/home/user/torch/install/share/lua/5.1/nn/Container.lua:63: in function 'rethrowErrors'
/home/user/torch/install/share/lua/5.1/nn/Sequential.lua:44: in function 'forward'
neural_style.lua:204: in function 'main'
neural_style.lua:500: in main chunk
[C]: in function 'dofile'
...art2/torch/install/lib/luarocks/rocks/trepl/scm-1/bin/th:145: in main chunk
[C]: at 0x00406670
WARNING: If you see a stack trace below, it doesn't point to the place where this error occured. Please use only the one above.
stack traceback:
[C]: in function 'error'
/home/user/torch/install/share/lua/5.1/nn/Container.lua:67: in function 'rethrowErrors'
/home/user/torch/install/share/lua/5.1/nn/Sequential.lua:44: in function 'forward'
neural_style.lua:204: in function 'main'
neural_style.lua:500: in main chunk
[C]: in function 'dofile'
...art2/torch/install/lib/luarocks/rocks/trepl/scm-1/bin/th:145: in main chunk
[C]: at 0x00406670
I temporarily fixed this by running:
export LD_LIBRARY_PATH=/usr/local/cuda-7.0/lib64:/home/bart2/torch/install/lib:
source ~/.bashrc
Then Torch would work until I launched Firefox. I don't know if this was the correct solution, but it worked for me.
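A note on that fix: an `export` issued in the current shell is lost when the terminal closes, and `source ~/.bashrc` by itself doesn't persist anything. A sketch of making the path change permanent by appending it to `~/.bashrc` instead (the CUDA and Torch paths are the ones from the command above, generalized via `$HOME`; adjust them to your own install):

```shell
# Sketch: persist the library path so every new shell picks it up.
BASHRC="$HOME/.bashrc"
LINE='export LD_LIBRARY_PATH=/usr/local/cuda-7.0/lib64:$HOME/torch/install/lib:$LD_LIBRARY_PATH'
# Only append if the line is not already present, to avoid duplicate entries.
grep -qxF "$LINE" "$BASHRC" 2>/dev/null || echo "$LINE" >> "$BASHRC"
```

After that, `source ~/.bashrc` (or opening a new terminal) applies it.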
Just now I ran into a similar error. My kernel was dying before this error code popped up in the terminal. I solved it by closing all other Jupyter notebooks (one was already open), and then it worked. Try closing all other notebooks that are using the GPU (simply, any in which you have imported TensorFlow).
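Rather than closing notebooks one by one, you can check which processes are actually holding GPU memory with `nvidia-smi` (a sketch; requires the NVIDIA driver tools, and the guard makes it a no-op on machines without them):

```shell
# List the PID, name, and GPU memory of every compute process on the card.
# Kernels from stale Jupyter notebooks show up here as python processes.
if command -v nvidia-smi >/dev/null 2>&1; then
    nvidia-smi --query-compute-apps=pid,process_name,used_memory --format=csv
else
    echo "nvidia-smi not found; NVIDIA driver tools are not installed" >&2
fi
```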
I am receiving a similar error training a pix2pix model. Hardware: a 1080 Ti in slot 1, a 2080 Ti in slot 2, on an MSI Tomahawk AC X299 motherboard (both PCIe x16 slots).
I only get the allocation error when trying to use the 2080 Ti in the second PCIe slot on my motherboard. I need it there for thermal reasons (the 1080 Ti overheats with the 2080 Ti above it). I have tried with just the 2080 Ti installed (this succeeded), as well as using CUDA_VISIBLE_DEVICES to select only the 2080 Ti when both were installed (this caused the error). Is there some hardware limitation with allocating to the second PCIe device?
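One thing worth ruling out: by default CUDA enumerates devices fastest-first, not in PCIe slot order, so the index passed to `CUDA_VISIBLE_DEVICES` may not point at the card you intend. A sketch of pinning the numbering to the bus layout before selecting the card (the index `1` for the second slot is an assumption; check `nvidia-smi` output for the actual mapping on your machine):

```shell
# Make CUDA's device numbering follow PCIe bus order instead of fastest-first,
# then expose only the card in the second slot (index 1 here is an assumption).
export CUDA_DEVICE_ORDER=PCI_BUS_ID
export CUDA_VISIBLE_DEVICES=1
# Hypothetical training invocation; substitute your actual pix2pix command:
# python train.py ...
```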
So I just returned from the holidays and updated both the neural-style repo as well as its dependencies. Now I get some error messages that weren't there before ...
Before the holidays, I could process images with up to -image_size 900. Now, if I run the following query with the new -cudnn_autotune flag, I get this error message (cutting off the beginning for readability). If I run the same query without the -cudnn_autotune flag, I get an out-of-memory error. I've tried reducing the output image size; at -image_size 700 it works as expected, both with and without the -cudnn_autotune flag. Is there anything I can do to make this work (again) for larger output images? For example, does the size of the content image and style image matter? Or is there anything else that might have caused my machine to no longer be able to process images at -image_size 900?
I have a GTX 970, and I'm using the cuDNN backend with CUDA 7.5, with cutorch and cunn installed as well as cuDNN 3.0.
Thanks!
Edit: I've tried a query from my bash history that worked perfectly fine before the holidays and the updates. Now it stops after
Setting up style layer 12 : relu3_1
with an out-of-memory error. So it's not just an issue of different style and content images and/or settings ...