martinbenson / deep-photo-styletransfer

Implementation of "Deep Photo Style Transfer": https://arxiv.org/abs/1703.07511
MIT License

Out of Memory error #3

Closed themightyoarfish closed 7 years ago

themightyoarfish commented 7 years ago

This may be an upstream issue, but with this fork the error is different, so I'll just post it here as well. Step 4 does not execute (for example 1) on a K80 with 12 GB of VRAM.

...
loading matting laplacian...    gen_laplacian/Input_Laplacian_3x3_1e-7_CSR1.csv
<csv>   parsing file: gen_laplacian/Input_Laplacian_3x3_1e-7_CSR1.csv
PANIC: unprotected error in call to Lua API (not enough memory)
...

I've reduced the image size in mattinglaplacian.py from 700 to 500 and it seems to run. There should be CLI args for single-file processing and image size. I'll see if I get around to adding them.

themightyoarfish commented 7 years ago

Result with size 500:

[image: best1_t_1000]

I have no idea if this is the result of simply changing that one value in the Python script. Maybe the input files also need to be adapted somehow.

ProGamerGov commented 7 years ago

I have the same issue. When trying to use deepmatting_seg.lua, I get the following errors:

loading matting laplacian...    gen_laplacian/Input_Laplacian_3x3_1e-7_CSR1.csv
<csv>   parsing file: gen_laplacian/Input_Laplacian_3x3_1e-7_CSR1.csv
PANIC: unprotected error in call to Lua API (not enough memory)
loading matting laplacian...    gen_laplacian/Input_Laplacian_3x3_1e-7_CSR1.csv
<csv>   parsing file: gen_laplacian/Input_Laplacian_3x3_1e-7_CSR1.csv
/home/ubuntu/torch/install/bin/luajit: not enough memory

I believe this issue is related to Lua's 2GB memory limit, and not one's GPU.

@themightyoarfish I have seen people solve this issue by installing Torch with Lua 5.2, using TORCH_LUA_VERSION=LUA52 ./install.sh, as per the Torch7 installation guide here: http://torch.ch/docs/getting-started.html. This supposedly works because Lua 5.2 does not have the 2GB memory limit that the default LuaJIT has. I have also seen people modify their code to "off-load" some of the memory into Torch7 tensors, since Torch7 itself does not have the 2GB limit.
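
To make the "off-loading" idea concrete, here is a minimal sketch (illustrative only, not code from this repo): tensor storage is allocated by Torch's C backend, so it does not count against LuaJIT's garbage-collected heap, whereas plain Lua tables do.

require 'torch'

-- ~2.4 GB of doubles: fine even under LuaJIT, because the storage
-- lives outside the LuaJIT heap
local t = torch.DoubleTensor(300000000)

-- Holding the same numbers in a plain Lua table of numbers would
-- exhaust the LuaJIT heap long before reaching this size.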

themightyoarfish commented 7 years ago

Wow, this seems extremely silly to me. Maybe someone with Lua knowledge could work around this? Apparently, not all kinds of objects count towards the memory limit.

I'll try to reinstall Torch w/o LuaJIT some time, but I guess I should first figure out how to get any results at all at smaller resolution.

themightyoarfish commented 7 years ago

Well, turns out this is the reward:

$ make clean && make
find . -type f | xargs -n 5 touch
rm -f libcuda_utils.so
/usr/local/cuda-7.5/bin/nvcc -arch sm_35 -O3 -DNDEBUG --compiler-options '-fPIC' -o libcuda_utils.so --shared cuda_utils.cu -I/home/ubuntu/torch/install/include/THC -I/home/ubuntu/torch/install/include/TH -I/home/ubuntu/torch/install/include -L/home/ubuntu/torch/install/lib -Xlinker -rpath,/home/ubuntu/torch/install/lib -lluaT -lTHC -lTH -lpng
cuda_utils.cu(510): error: identifier "luaL_openlib" is undefined

1 error detected in the compilation of "/tmp/tmpxft_00006ef4_00000000-9_cuda_utils.cpp1.ii".
make: *** [libcuda_utils.so] Error 2

Brief googling says luaL_openlib was removed in Lua 5.2, so cuda_utils.cu won't compile against a plain Lua 5.2 build without a compatibility shim (building Lua with LUA_COMPAT_MODULE, or porting the registration code to luaL_setfuncs).

martinbenson commented 7 years ago

Hi guys. A problem with the original repo was that some flags were missing that help reduce the memory footprint a fair bit. I didn't add them back when I forked, but I have added them now. Another potential issue is that it may depend somewhat on how your environment is built. I'll post a Dockerfile shortly that has worked for me (I made a small change and just want to check that it builds OK before committing). I can run 500-width images on my 980 Tis (6GB) using this.
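
(For context: these appear to be the cudnn flags mentioned further down. A hypothetical invocation, with the flag name assumed from neural-style rather than checked against this fork:

th deepmatting_seg.lua -backend cudnn ...

The cudnn backend generally has a noticeably smaller memory footprint than the default nn backend.)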

martinbenson commented 7 years ago

@themightyoarfish If you drop to 500 in the matting laplacian script, you also need to downscale all of the images to match - so I suspect that is what generated the odd results.

themightyoarfish commented 7 years ago

I thought the memory issue was some Lua-specific thing and not related to GPU memory. But I'm happy to be proven wrong.

martinbenson commented 7 years ago

That might be an issue too, but certainly I've got results out at 500 width without running into it.

themightyoarfish commented 7 years ago

Yes, at 500 it worked for me too, but I had those flags enabled, so I don't actually know whether reducing the image size is what made it work.

themightyoarfish commented 7 years ago

@ProGamerGov So the problem with Lua 5.2 seems to be that Torch doesn't really work well with it: several people have had luarocks install cutorch fail, complaining about an absent cwrap module, even though cwrap itself installs successfully. I haven't found anything definitive, only this comment from 2 years ago stating that, at that point, LuaJIT was the only way to go.

martinbenson commented 7 years ago

@themightyoarfish Even with the cudnn flags I can't run at 700 on a 6GB GPU.

ProGamerGov commented 7 years ago

The Lua memory errors always happen after this line is printed:

<csv> parsing file: gen_laplacian/Input_Laplacian_3x3_1e-7_CSR1.csv

Which leads me to believe that the problem is that parsing the CSV takes more memory than Lua can handle.

Also, I have the same half image issue as @themightyoarfish .

Multiboxer commented 7 years ago

It looks like the issue is with the size of the .csv file. LuaJIT gives the error /root/torch/install/bin/luajit: not enough memory after a few seconds of processing this line of code:

local CSR = torch.Tensor(csvigo.load({path=CSR_fn,header='false',mode="raw"})):cuda()

which indicates to me that LuaJIT is the culprit of the memory issue. I have tried a Lua 5.2 installation of Torch but can't seem to get it to work with the script. Changing the tensor to store within GPU memory still results in the 'not enough memory' error, which might mean the process of loading the file through csvigo causes it. Does anybody more experienced than me know how to either increase the LuaJIT memory limit, run csvigo directly into GPU memory, or run Lua 5.1 for csvigo and LuaJIT for everything else?

martinbenson commented 7 years ago

There is a "large" option in csvigo - designed for reading large files - but it didn't seem to work for me. Might be worth a look if someone has the chance.
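
For anyone who wants to try, a minimal sketch of what that would look like (the mode name is csvigo's; whether its output can be fed to torch.Tensor the way mode="raw" output can is exactly the open question):

local csvigo = require 'csvigo'

-- mode='large' is meant to stream the file rather than materialize it
-- as one giant nested Lua table
local rows = csvigo.load{path = 'gen_laplacian/Input_Laplacian_3x3_1e-7_CSR1.csv',
                         mode = 'large'}
print(rows[1])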

martinbenson commented 7 years ago

@ProGamerGov Did you downsize all the images - targets and segmentations as well as dropping the parameter to 500? The sizes and the parameter all have to agree to the pixel, which is pretty annoying. It'd be better to have one overall size parameter and add a resizing wrapper in gen_all.py.
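
A rough sketch of that kind of resizing wrapper, done in Lua with the torch image package (the example paths are placeholders, not necessarily this repo's layout):

require 'torch'
local image = require 'image'

-- Hypothetical helper: force every input to a common size so the
-- images and the laplacian size parameter agree to the pixel.
local size = 500
local function resize_to(path, mode)
  local img = image.load(path, 3)
  img = image.scale(img, size, mode or 'bilinear')
  image.save(path, img)
end

resize_to('examples/input/in1.png')
resize_to('examples/style/tar1.png')
-- segmentation masks should use nearest-neighbour ('simple') scaling
-- so the label colours stay exact
resize_to('examples/segmentation/in1.png', 'simple')
resize_to('examples/segmentation/tar1.png', 'simple')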

Multiboxer commented 7 years ago

Using the "large" option in csvigo fixed the memory issue; however, there is now a new issue with CSR:size(1) and CSR:size(2) being 'out of range'.

loading matting laplacian...    gen_laplacian/Input_Laplacian_3x3_1e-7_CSR1.csv
<csv>   parsing file: gen_laplacian/Input_Laplacian_3x3_1e-7_CSR1.csv
<csv>   parsing done
Exp serial: examples/final_results
Setting up style layer 2 : relu1_1
Setting up style layer 7 : relu2_1
Setting up style layer 12 : relu3_1
Setting up style layer 21 : relu4_1
Setting up content layer 23 : relu4_2
Setting up style layer 30 : relu5_1
/root/torch/install/bin/luajit: deepmatting_seg.lua:300: bad argument #1 to 'size' (out of range)
stack traceback:
    [C]: in function 'size'
    deepmatting_seg.lua:300: in function 'MattingLaplacian'
    deepmatting_seg.lua:269: in function 'opfunc'
    /root/torch/install/share/lua/5.1/optim/lbfgs.lua:66: in function 'lbfgs'
    deepmatting_seg.lua:296: in function 'main'
    deepmatting_seg.lua:623: in main chunk
    [C]: in function 'dofile'
    /root/torch/install/lib/luarocks/rocks/trepl/scm-1/bin/th:150: in main chunk
    [C]: at 0x00405d50

Edit: The value of CSR at this point is "[torch.CudaTensor with no dimension]", which means csvigo's "large" mode makes the file load differently? Gonna look into this.

martinbenson commented 7 years ago

@Multiboxer Yeah, it seemed to me that the result was just empty when I tried. Good for memory, bad for getting stuff done!!

Multiboxer commented 7 years ago

Alright, fixed the memory issue by switching over to torch to load the CSV file, so csvigo is no longer needed in deepmatting_seg.lua. Just gonna clean up the code a bit and post it.
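
For anyone following along, a minimal sketch of that approach (my reading of it, not necessarily the exact code in the attached script): stream the file line by line with plain io and write straight into a preallocated torch tensor, so the data never sits in the LuaJIT heap as one giant table. This assumes the Laplacian CSV is three numeric columns (row, column, value) with no header:

require 'torch'
require 'cutorch'

local function load_csr(path)
  -- first pass: count the data rows
  local n = 0
  for _ in io.lines(path) do n = n + 1 end

  -- second pass: parse each "row,col,value" line into the tensor
  local CSR = torch.DoubleTensor(n, 3)
  local i = 0
  for line in io.lines(path) do
    i = i + 1
    local r, c, v = line:match('([^,]+),([^,]+),([^,]+)')
    CSR[i][1] = tonumber(r)
    CSR[i][2] = tonumber(c)
    CSR[i][3] = tonumber(v)
  end
  return CSR
end

local CSR = load_csr('gen_laplacian/Input_Laplacian_3x3_1e-7_CSR1.csv'):cuda()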

Multiboxer commented 7 years ago

Seems the first few lines of the CSV being read are a little off, resulting in corrupted-looking results like this: [image: best1_t_1000]

Here is what I have so far anyway: deepmatting_seg.lua.zip. Gonna have to fix this.

ProGamerGov commented 7 years ago

@Multiboxer Does the CSV file have headers?

[Line number 130](https://gist.github.com/ProGamerGov/1524e9710a576586042114fefa06b229#file-deepmatting_seg-lua-L130) has a line of code omitting the first row:

local ROWS = i - 1

Source

Also, testing out your modified deepmatting_seg.lua results in this for me:

But there are no more Lua-related memory errors now.

Multiboxer commented 7 years ago

Yeah, the results vary quite a lot even if it's just one or two incorrect lines in the CSV variable. I'm gonna try to output the variables from the csvigo and torch approaches in a text format and compare the files to see if I can find out what is going wrong.

Multiboxer commented 7 years ago

Yeah, the header was the issue: it was deleting a non-existent header. Here is the (hopefully) fully working version of the script: deepmatting_seg.lua.zip
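
In other words (a hypothetical illustration of the off-by-one, assuming a loop counter that ends up as the number of lines read):

-- With header='false' every line is data, so subtracting one for a
-- header drops the first real row.
local function count_rows(path, has_header)
  local i = 0
  for _ in io.lines(path) do i = i + 1 end
  return has_header and (i - 1) or i
end

print(count_rows('gen_laplacian/Input_Laplacian_3x3_1e-7_CSR1.csv', false))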

martinbenson commented 7 years ago

Thanks @Multiboxer. I incorporated your code - seems to be working well!