Result with size 500:
I have no idea if this is the result of simply changing the one value in the python script. Maybe the input files also need to be adapted somehow.
I have the same issue. When trying to use `deepmatting_seg.lua`, I get the following errors:
```
loading matting laplacian... gen_laplacian/Input_Laplacian_3x3_1e-7_CSR1.csv
<csv> parsing file: gen_laplacian/Input_Laplacian_3x3_1e-7_CSR1.csv
PANIC: unprotected error in call to Lua API (not enough memory)
loading matting laplacian... gen_laplacian/Input_Laplacian_3x3_1e-7_CSR1.csv
<csv> parsing file: gen_laplacian/Input_Laplacian_3x3_1e-7_CSR1.csv
/home/ubuntu/torch/install/bin/luajit: not enough memory
```
I believe this issue is related to the 2GB memory limit of LuaJIT (which Torch installs by default), and not one's GPU.
@themightyoarfish I have seen people solve this issue by installing Torch with Lua 5.2, using `TORCH_LUA_VERSION=LUA52 ./install.sh`, as per the Torch7 installation guide here: http://torch.ch/docs/getting-started.html. This supposedly works because Lua 5.2 does not have the 2GB memory limit that the default LuaJIT has. I have also seen people modify their code to "off-load" some of the memory into Torch, as Torch7 itself does not have the 2GB limit.
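To illustrate the "off-load" idea with a minimal sketch (mine, not code from this repo or thread): torch tensor storage is allocated in C memory, outside the LuaJIT GC heap, so it doesn't count toward the limit, while a plain Lua table holding the same numbers does.

```lua
local torch = require 'torch'

-- 50M doubles (~400MB) live in C memory, outside the LuaJIT heap:
local t = torch.DoubleTensor(50000000)
for i = 1, t:size(1) do t[i] = i end  -- only small scalars cross into Lua

-- The equivalent plain Lua table counts against LuaJIT's limit and
-- would likely die with "not enough memory":
-- local tbl = {}; for j = 1, 50000000 do tbl[j] = j end
```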
Wow, this seems extremely silly to me. Maybe someone with Lua knowledge could work around this? Apparently, not all kinds of objects count towards the memory limit.
I'll try to reinstall Torch w/o LuaJIT some time, but I guess I should first figure out how to get any results at all at smaller resolution.
Well, turns out this is the reward:
```
$ make clean && make
find . -type f | xargs -n 5 touch
rm -f libcuda_utils.so
/usr/local/cuda-7.5/bin/nvcc -arch sm_35 -O3 -DNDEBUG --compiler-options '-fPIC' -o libcuda_utils.so --shared cuda_utils.cu -I/home/ubuntu/torch/install/include/THC -I/home/ubuntu/torch/install/include/TH -I/home/ubuntu/torch/install/include -L/home/ubuntu/torch/install/lib -Xlinker -rpath,/home/ubuntu/torch/install/lib -lluaT -lTHC -lTH -lpng
cuda_utils.cu(510): error: identifier "luaL_openlib" is undefined
1 error detected in the compilation of "/tmp/tmpxft_00006ef4_00000000-9_cuda_utils.cpp1.ii".
make: *** [libcuda_utils.so] Error 2
```
Brief googling says it's a deprecation issue: `luaL_openlib` was removed from the Lua API in 5.2, so `cuda_utils.cu` won't compile against a non-LuaJIT (Lua 5.2) Torch install.
Hi guys. A problem with the original repo was that some flags that help reduce the memory footprint a fair bit were missing. I didn't add them back when I forked, but have added them now. Another potential issue is that it may depend somewhat on how your environment is built. I'll post a Dockerfile shortly that has worked for me (I made a small change and just want to check that it builds OK before committing). I can run 500-width images on my 980 Tis (6GB) using this.
@themightyoarfish If you drop to 500 in the matting laplacian script, you also need to downscale all of the images to match - so I suspect that is what generated the odd results.
I thought the memory issue was some Lua-specific thing and not related to GPU memory. But I'm happy to be proven wrong.
That might be an issue too, but certainly I've got results out at 500 width without running into it.
Yes, at 500 it worked for me too, but I had those flags enabled, so I don't actually know if my reducing the image size made it work.
@ProGamerGov So the problem with Lua 5.2 seems to be that Torch doesn't really work well with it: several people have had the issue that `luarocks install cutorch` fails, complaining about an absent `cwrap` module, even though that module installs successfully. I haven't found anything definitive, only this comment from 2 years ago stating that at that point, LuaJIT was the only way to go.
@themightyoarfish Even with the cudnn flags I can't run at 700 on a 6GB GPU.
The Lua memory errors always happen after this line is printed:

```
<csv> parsing file: gen_laplacian/Input_Laplacian_3x3_1e-7_CSR1.csv
```

which leads me to believe that the problem is parsing the CSV: presumably csvigo materializes the whole file as a Lua table, which takes more memory than Lua can handle.
Also, I have the same half-image issue as @themightyoarfish.
It looks like the issue is with the file size of the .csv file. LuaJIT gives the error `/root/torch/install/bin/luajit: not enough memory` after a few seconds of processing this line of code:

```lua
local CSR = torch.Tensor(csvigo.load({path=CSR_fn,header='false',mode="raw"})):cuda()
```

which indicates to me that LuaJIT is the culprit of the memory issue. I have tried a Lua 5.2 installation of Torch but can't seem to get it to work with the script. Changing the tensor to store within GPU memory still results in the 'not enough memory' error, which might mean the process of loading the file through csvigo itself causes the memory error. Does anybody more experienced than me know how to either increase the LuaJIT memory limit, run csvigo directly into GPU memory, or run Lua 5.1 for csvigo and LuaJIT for everything else?
There is a "large" option in csvigo - designed for reading large files - but it didn't seem to work for me. May be worth a look if someone has chance.
@ProGamerGov Did you downsize all the images - targets and segmentations - as well as dropping the parameter to 500? The sizes and the parameter all have to agree to the pixel, which is pretty annoying. It'd be better to have one overall size parameter and add a resizing wrapper in gen_py.all - something like the sketch below.
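A minimal sketch of such a wrapper, written here in Lua with the `image` package (the suggestion above targets the Python pipeline; the function name and file names are illustrative, not from the repo):

```lua
local image = require 'image'

-- Scale an image to a shared target width, preserving aspect ratio,
-- so content, style and segmentation images all agree to the pixel.
local function resize_to_width(in_path, out_path, width)
  local img = image.load(in_path)  -- C x H x W tensor
  local h = math.floor(img:size(2) * width / img:size(3) + 0.5)
  image.save(out_path, image.scale(img, width, h))
end

-- e.g. resize every input to the same width used for the Laplacian:
for _, f in ipairs({'in1.png', 'style1.png', 'in1_seg.png', 'style1_seg.png'}) do
  resize_to_width('examples/' .. f, 'examples/resized/' .. f, 500)
end
```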
Using the "large" option in csvigo fixed the memory issue however there is now a new issue with CSR:size(1)
& CSR:size(2)
being 'out of range'.
```
loading matting laplacian... gen_laplacian/Input_Laplacian_3x3_1e-7_CSR1.csv
<csv> parsing file: gen_laplacian/Input_Laplacian_3x3_1e-7_CSR1.csv
<csv> parsing done
Exp serial: examples/final_results
Setting up style layer 2 : relu1_1
Setting up style layer 7 : relu2_1
Setting up style layer 12 : relu3_1
Setting up style layer 21 : relu4_1
Setting up content layer 23 : relu4_2
Setting up style layer 30 : relu5_1
/root/torch/install/bin/luajit: deepmatting_seg.lua:300: bad argument #1 to 'size' (out of range)
stack traceback:
	[C]: in function 'size'
	deepmatting_seg.lua:300: in function 'MattingLaplacian'
	deepmatting_seg.lua:269: in function 'opfunc'
	/root/torch/install/share/lua/5.1/optim/lbfgs.lua:66: in function 'lbfgs'
	deepmatting_seg.lua:296: in function 'main'
	deepmatting_seg.lua:623: in main chunk
	[C]: in function 'dofile'
	/root/torch/install/lib/luarocks/rocks/trepl/scm-1/bin/th:150: in main chunk
	[C]: at 0x00405d50
```
Edit: The value of CSR at this point is "[torch.CudaTensor with no dimension]", which means csvigo's "large" mode makes the file load differently? Gonna look into this.
@Multiboxer Yeah, it seemed to me that the result was just empty when I tried. Good for memory, bad for getting stuff done!!
Alright, fixed the memory issue by switching over to torch to load the CSV file, so csvigo is no longer needed in deepmatting_seg.lua. Just gonna clean up the code a bit and post.
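The gist of the approach (a minimal sketch of the idea, not the exact code from the attached script): read the CSV line by line with Lua's `io` library and write each row straight into a preallocated `torch.Tensor`, so no giant Lua table is ever built.

```lua
-- Assumes each line of the Laplacian CSV is "row,col,value".
local function load_CSR(path)
  local n = 0
  for _ in io.lines(path) do n = n + 1 end   -- first pass: count rows

  local CSR = torch.Tensor(n, 3)
  local i = 0
  for line in io.lines(path) do              -- second pass: fill the tensor
    i = i + 1
    local r, c, v = line:match('([^,]+),([^,]+),([^,]+)')
    CSR[i][1] = tonumber(r)
    CSR[i][2] = tonumber(c)
    CSR[i][3] = tonumber(v)
  end
  return CSR:cuda()
end
```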
Seems the first few lines of the CSV being read are a little off, resulting in corrupted-looking results like this:

Here is what I have so far anyway: deepmatting_seg.lua.zip. Gonna have to fix this.
@Multiboxer Does the CSV file have headers?
[Line 130](https://gist.github.com/ProGamerGov/1524e9710a576586042114fefa06b229#file-deepmatting_seg-lua-L130) has a line of code omitting the first row:

```lua
local ROWS = i - 1
```
Also, testing out your modified `deepmatting_seg.lua` results in this for me:

But there are no more Lua-related memory errors now.
Yeah, the results vary quite a lot even if it's just one or two incorrect lines in the CSV variable. I'm gonna try and output the variables of the csvigo and torch approaches in a text format and compare the files to see if I can find out what is going wrong.
Yeah, the header was the issue: the loader was deleting a non-existent header row. Here is the (hopefully) fully working version of the script: deepmatting_seg.lua.zip
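In other words (my reading of the fix, hypothetical since I haven't diffed the two scripts): the Laplacian CSV has no header row, so the row count should not be decremented.

```lua
-- before: the last line was discarded as if line 1 were a header,
-- silently dropping one row of the Laplacian
local ROWS = i - 1

-- after: every parsed line is a data row
local ROWS = i
```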
Thanks @Multiboxer. I incorporated your code - seems to be working well!
This may be an upstream issue, but with this fork the error is different, so I'll just post this here as well. Step 4 does not execute (for example 1) on a K80 with 12GB VRAM.
I've reduced the image size in `mattinglaplacian.py` from 700 to 500 and it seems to run. There should be CLI args for single-file processing and image size. I'll see if I get around to this.