nagadomi / distro

Unofficial maintenance repository of Torch7. It supports CUDA10.1, Volta, Turing, Docker https://hub.docker.com/r/nagadomi/torch7
BSD 3-Clause "New" or "Revised" License

Error: 'attempt to index field 'mask' (a nil value)' while evaluating Torch network #4

Closed: ellick53 closed this issue 4 years ago

ellick53 commented 4 years ago

I am trying to train this network: http://lear.inrialpes.fr/research/lvo/ using your Torch distro.

I am using Ubuntu 18 + CUDA 10.0. I also installed rnn from https://github.com/Element-Research/rnn and extracunn from https://github.com/viorik/extracunn.git.

Unfortunately, when I run the evaluation I get this error:

user01@ubuntu:~/tokmakov/lvo$ th test_all_davis.lua -gpu 0 -memoryModel paper.dat -motionModel mpnet.dat -setting evaluate_journal
1   
/home/user01/tokmakov/torch/install/bin/luajit: ...01/tokmakovtorch/install/share/lua/5.1/nn/Container.lua:67: 
In 10 module of nn.Sequential:
...01/tokmakovtorch/install/share/lua/5.1/nn/CMaxTable.lua:19: attempt to index field 'mask' (a nil value)
stack traceback:
    ...01/tokmakovtorch/install/share/lua/5.1/nn/CMaxTable.lua:19: in function <...01/tokmakovtorch/install/share/lua/5.1/nn/CMaxTable.lua:12>
    [C]: in function 'xpcall'
    ...01/tokmakovtorch/install/share/lua/5.1/nn/Container.lua:63: in function 'rethrowErrors'
    ...1/tokmakov/torch/install/share/lua/5.1/nn/Sequential.lua:44: in function 'forward'
    ./segment_frame_davis.lua:57: in function 'segment'
    ./test_video_davis.lua:18: in function 'testVideo'
    test_all_davis.lua:46: in main chunk
    [C]: in function 'dofile'
    ...akov/torch/install/lib/luarocks/rocks/trepl/scm-1/bin/th:150: in main chunk
    [C]: at 0x56166de07570

WARNING: If you see a stack trace below, it doesn't point to the place where this error occurred. Please use only the one above.
stack traceback:
    [C]: in function 'error'
    ...01/tokmakovtorch/install/share/lua/5.1/nn/Container.lua:67: in function 'rethrowErrors'
    ...1/tokmakov/torch/install/share/lua/5.1/nn/Sequential.lua:44: in function 'forward'
    ./segment_frame_davis.lua:57: in function 'segment'
    ./test_video_davis.lua:18: in function 'testVideo'
    test_all_davis.lua:46: in main chunk
    [C]: in function 'dofile'
    ...akov/torch/install/lib/luarocks/rocks/trepl/scm-1/bin/th:150: in main chunk
    [C]: at 0x56166de07570

I get a similar error if I try to train the network. I am not familiar with Lua or Torch (I usually use TensorFlow 2).

Could someone help me? I really need to make this work, but the only thing I can think of is some obscure incompatibility with more recent systems, which would require installing Ubuntu 14 + CUDA 7.5.

Thanks!

nagadomi commented 4 years ago

The dataset link is missing. https://davischallenge.org/code.html Which DAVIS dataset are you using?

ellick53 commented 4 years ago

Thanks for replying! I am using DAVIS 2016: https://davischallenge.org/davis2016/code.html

nagadomi commented 4 years ago

The code requires optical flow results from mpnet. However, I do not have MATLAB, so I cannot test it.

The problem is probably the most recent change to CMaxTable: https://github.com/torch/nn/commit/db4244e6ee30b0ec689815a00fcc0c8c45f91b12 This commit breaks older pre-trained models that use CMaxTable, because the added variables mask, maxVals, and gradMaxVals are not defined in the loaded model.
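
To see which version of CMaxTable.lua is currently installed, one rough check is to search the file for the new field. This is a minimal sketch: `has_mask_field` is a hypothetical helper name, and it assumes the newer file refers to the field as `self.mask`, which the error message suggests but this thread does not show.

```shell
#!/bin/sh
# has_mask_field is a hypothetical helper, not part of Torch.
# Assumption: the newer CMaxTable.lua accesses the field as "self.mask".
# Exit status is 0 if the file mentions self.mask, non-zero otherwise
# (including when the file cannot be read).
has_mask_field() {
    grep -qF 'self.mask' "$1"
}

# Example call (commented out; the path is the one suggested in this thread):
# has_mask_field "$HOME/torch/install/share/lua/5.1/nn/CMaxTable.lua" \
#     && echo "newer CMaxTable.lua is installed"
```

A match only hints that the newer version is installed; comparing the file against the commit linked above is more reliable.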

Copying the previous version of CMaxTable.lua over both ~/torch/extra/nn/CMaxTable.lua and ~/torch/install/share/lua/5.1/nn/CMaxTable.lua may fix it (not tested).

The previous version of CMaxTable.lua: https://raw.githubusercontent.com/torch/nn/c463e548ecf795b84b171121a0206e5e326d4858/CMaxTable.lua
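
The copy step could be scripted roughly as follows. This is a sketch, not a tested procedure: `install_cmaxtable` is a hypothetical helper name, the backup step is an addition for safety, and the Torch root is passed as a parameter because it may not be ~/torch on your machine (the stack trace above shows a different install location).

```shell
#!/bin/sh
# install_cmaxtable is a hypothetical helper, not part of Torch.
#   $1: path to a downloaded copy of the previous CMaxTable.lua
#   $2: Torch root directory (assumed to contain extra/nn and install/share/lua/5.1/nn)
install_cmaxtable() {
    src=$1
    root=$2
    for dest in "$root/extra/nn/CMaxTable.lua" \
                "$root/install/share/lua/5.1/nn/CMaxTable.lua"; do
        # Keep the first existing file as .bak so the change can be reverted.
        if [ -f "$dest" ] && [ ! -f "$dest.bak" ]; then
            cp "$dest" "$dest.bak" || return 1
        fi
        cp "$src" "$dest" || return 1
    done
}

# Example usage (commented out so that sourcing this file changes nothing):
# curl -fsSL -o /tmp/CMaxTable.lua \
#   https://raw.githubusercontent.com/torch/nn/c463e548ecf795b84b171121a0206e5e326d4858/CMaxTable.lua
# install_cmaxtable /tmp/CMaxTable.lua "$HOME/torch"
```

To undo the change, copy each CMaxTable.lua.bak back over the corresponding CMaxTable.lua.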

nagadomi commented 4 years ago

I have not yet decided whether to fix existing bugs in torch7.

ellick53 commented 4 years ago

Thanks! It worked! Could this replacement cause any side effects, though?

nagadomi commented 4 years ago

> Could this replacement cause any side effects, though?

Performance may be slightly degraded. A model pre-trained with the newer CMaxTable will probably also work, but any code that accesses the added variables (such as maxVals) will not.

ellick53 commented 4 years ago

I see. In any case, I should still be able to train the network without issues?

nagadomi commented 4 years ago

The pre-trained models from http://lear.inrialpes.fr/research/lvo/ were trained with the previous version of CMaxTable, so I think training without the newer CMaxTable is the correct way to replicate the experiment.

ellick53 commented 4 years ago

I see, thanks!