torralba-lab / im2recipe

Code supporting the CVPR 2017 paper "Learning Cross-modal Embeddings for Cooking Recipes and Food Images"
MIT License

Training the skip-instructions model #22

Closed: mrdotnic closed this issue 5 years ago

mrdotnic commented 5 years ago

Hi there,

First off, I want to say that I find your paper and code very interesting, and the amount of data you collected is amazing.

I'm currently trying to encode new recipes as skip-instructions vectors so that I can later feed them to the im2recipe model and obtain the "trijoint" embeddings. To do that, I'm training the skip-instructions model so that I can use its trained encoder to extract features from new recipe texts.

While running "moon main.moon -[...]", I got an error similar to #9.

Error message before changing LookupTableW2V.moon (similar to https://github.com/torralba-lab/im2recipe/issues/9#issue-260132291):

/tmp/luarocks_cutorch-scm-1-633/cutorch/lib/THC/THCTensorIndex.cu:321: void indexSelectLargeIndex(TensorInfo<T, IndexType>, TensorInfo<T, IndexType>, TensorInfo<long, IndexType>, int, int, IndexType, IndexType, long) [with T = float, IndexType = unsigned int, DstDim = 2, SrcDim = 2, IdxDim = -2]: block: [60,0,0], thread: [120,0,0] Assertion `srcIndex < srcSelectDimSize` failed.
/tmp/luarocks_cutorch-scm-1-633/cutorch/lib/THC/THCTensorIndex.cu:321: void indexSelectLargeIndex(TensorInfo<T, IndexType>, TensorInfo<T, IndexType>, TensorInfo<long, IndexType>, int, int, IndexType, IndexType, long) [with T = float, IndexType = unsigned int, DstDim = 2, SrcDim = 2, IdxDim = -2]: block: [60,0,0], thread: [121,0,0] Assertion `srcIndex < srcSelectDimSize` failed.
/tmp/luarocks_cutorch-scm-1-633/cutorch/lib/THC/THCTensorIndex.cu:321: void indexSelectLargeIndex(TensorInfo<T, IndexType>, TensorInfo<T, IndexType>, TensorInfo<long, IndexType>, int, int, IndexType, IndexType, long) [with T = float, IndexType = unsigned int, DstDim = 2, SrcDim = 2, IdxDim = -2]: block: [60,0,0], thread: [122,0,0] Assertion `srcIndex < srcSelectDimSize` failed.
/tmp/luarocks_cutorch-scm-1-633/cutorch/lib/THC/THCTensorIndex.cu:321: void indexSelectLargeIndex(TensorInfo<T, IndexType>, TensorInfo<T, IndexType>, TensorInfo<long, IndexType>, int, int, IndexType, IndexType, long) [with T = float, IndexType = unsigned int, DstDim = 2, SrcDim = 2, IdxDim = -2]: block: [60,0,0], thread: [123,0,0] Assertion `srcIndex < srcSelectDimSize` failed.
/tmp/luarocks_cutorch-scm-1-633/cutorch/lib/THC/THCTensorIndex.cu:321: void indexSelectLargeIndex(TensorInfo<T, IndexType>, TensorInfo<T, IndexType>, TensorInfo<long, IndexType>, int, int, IndexType, IndexType, long) [with T = float, IndexType = unsigned int, DstDim = 2, SrcDim = 2, IdxDim = -2]: block: [60,0,0], thread: [124,0,0] Assertion `srcIndex < srcSelectDimSize` failed.
/tmp/luarocks_cutorch-scm-1-633/cutorch/lib/THC/THCTensorIndex.cu:321: void indexSelectLargeIndex(TensorInfo<T, IndexType>, TensorInfo<T, IndexType>, TensorInfo<long, IndexType>, int, int, IndexType, IndexType, long) [with T = float, IndexType = unsigned int, DstDim = 2, SrcDim = 2, IdxDim = -2]: block: [60,0,0], thread: [125,0,0] Assertion `srcIndex < srcSelectDimSize` failed.
/tmp/luarocks_cutorch-scm-1-633/cutorch/lib/THC/THCTensorIndex.cu:321: void indexSelectLargeIndex(TensorInfo<T, IndexType>, TensorInfo<T, IndexType>, TensorInfo<long, IndexType>, int, int, IndexType, IndexType, long) [with T = float, IndexType = unsigned int, DstDim = 2, SrcDim = 2, IdxDim = -2]: block: [60,0,0], thread: [126,0,0] Assertion `srcIndex < srcSelectDimSize` failed.
/tmp/luarocks_cutorch-scm-1-633/cutorch/lib/THC/THCTensorIndex.cu:321: void indexSelectLargeIndex(TensorInfo<T, IndexType>, TensorInfo<T, IndexType>, TensorInfo<long, IndexType>, int, int, IndexType, IndexType, long) [with T = float, IndexType = unsigned int, DstDim = 2, SrcDim = 2, IdxDim = -2]: block: [60,0,0], thread: [127,0,0] Assertion `srcIndex < srcSelectDimSize` failed.
THCudaCheck FAIL file=/tmp/luarocks_cutorch-scm-1-633/cutorch/init.c line=246 error=59 : device-side assert triggered
moon: /usr/local/share/lua/5.1/threads/threads.lua:179: [thread 1 endcallback] /usr/local/share/lua/5.1/nn/Container.lua:67: 
In 3 module of nn.Sequential:
/usr/local/share/lua/5.1/cudnn/RNN.lua:627: cuda runtime error (59) : device-side assert triggered at /tmp/luarocks_cutorch-scm-1-633/cutorch/init.c:246
stack traceback:
        [C]: in function 'synchronize'
        /usr/local/share/lua/5.1/cudnn/RNN.lua:627: in function </usr/local/share/lua/5.1/cudnn/RNN.lua:449>
        [C]: in function 'xpcall'
        /usr/local/share/lua/5.1/nn/Container.lua:63: in function 'rethrowErrors'
        /usr/local/share/lua/5.1/nn/Sequential.lua:44: in function 'forward'
        /th-skip/th-skip/model/Decoder.moon:41: in function 'forward'
        /th-skip/th-skip/model/ST.moon:32: in function 'forward'
        /th-skip/th-skip/drivers/train.moon:39: in function </th-skip/th-skip/drivers/train.moon:35>
        [C]: in function 'xpcall'
        /usr/local/share/lua/5.1/threads/threads.lua:174: in function 'dojob'
        /usr/local/share/lua/5.1/threads/threads.lua:223: in function 'train'
        main.moon:67: in function 'moonscript_chunk'
        /usr/local/lib/luarocks/rocks/moonscript/0.5.0-1/bin/moon:86: in function </usr/local/lib/luarocks/rocks/moonscript/0.5.0-1/bin/moon:84>
        [C]: in function 'xpcall'
        /usr/local/lib/luarocks/rocks/moonscript/0.5.0-1/bin/moon:99: in function </usr/local/lib/luarocks/rocks/moonscript/0.5.0-1/bin/moon:47>
        [C]: at 0x00405d50
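The failed assertion `srcIndex < srcSelectDimSize` means the lookup was asked for a row outside the embedding table. One plausible trigger, and the one the mask-zero fix from #9 targets, is zero padding: Torch indices are 1-based, so index 0 has no row in a plain nn.LookupTable. One quick way to test this hypothesis is to scan a batch of word indices before it reaches the model. The helper below is a hypothetical sketch in plain Python. It assumes each batch is a list of lists of integer word indices and that the table has `vocab_size` rows.

```python
def find_bad_indices(batch, vocab_size, allow_padding_zero=False):
    """Return ((row, col), index) pairs that a 1-based lookup table cannot serve.

    Hypothetical helper. A plain table with vocab_size rows accepts indices
    1..vocab_size; a mask-zero table additionally accepts 0 as padding.
    """
    low = 0 if allow_padding_zero else 1
    bad = []
    for row_no, row in enumerate(batch):
        for col_no, idx in enumerate(row):
            if not (low <= idx <= vocab_size):
                bad.append(((row_no, col_no), idx))
    return bad


if __name__ == "__main__":
    # Two zero-padded sequences over a 10-word vocabulary; 11 is out of range.
    batch = [[5, 2, 0, 0], [7, 11, 3, 0]]
    print(find_bad_indices(batch, vocab_size=10))
    print(find_bad_indices(batch, vocab_size=10, allow_padding_zero=True))
```

If only zeros are reported, padding is the likely culprit. If indices above `vocab_size` also appear, the vocabulary used to build the inputs probably does not match the one used to size the table.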

I then made the following changes to LookupTableW2V.moon, as described in https://github.com/torralba-lab/im2recipe/issues/9#issuecomment-331774874:

-LookupTableW2V, parent = torch.class('nn.LookupTableW2V', 'nn.LookupTable')
+LookupTableW2V, parent = torch.class('nn.LookupTableW2V', 'nn.LookupTableMaskZero')

and

-  @weight\sub(nRandInit+1, -1)\copy(w2v\sub(1, nWordsW2V))
+  @weight\sub(nRandInit+2, -1)\copy(w2v\sub(1, nWordsW2V))
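The offset change from nRandInit+1 to nRandInit+2 follows from how a mask-zero table lays out its weights. This assumes it behaves like nn.LookupTableMaskZero from the rnn package: it allocates one extra row, keeps the first row at zero for padding index 0, and shifts every input index up by one. As a result, every word's row moves down by one, so the word2vec block has to start one row later. The sketch below models that layout in plain Python with 0-based rows. It further assumes that the randomly initialized words occupy indices 1..nRandInit and that the word2vec vectors follow, as the original `sub(nRandInit+1, -1)` call implies. The function names are hypothetical.

```python
def build_mask_zero_table(n_rand_init, w2v_vectors, dim, init_value=0.01):
    """Toy model of a mask-zero embedding table (0-based Python rows).

    Word index i (1-based, Lua-style) lives in Python row i, and row 0 is
    the all-zero padding row. Hypothetical sketch, not the th-skip code.
    """
    n_words = n_rand_init + len(w2v_vectors)
    # One extra row: row 0 is reserved for padding index 0 and stays zero.
    table = [[0.0] * dim for _ in range(n_words + 1)]
    # Words 1..n_rand_init: a constant stand-in for random initialization.
    for word in range(1, n_rand_init + 1):
        table[word] = [init_value] * dim
    # Word2vec words follow. Python row n_rand_init + 1 is Lua row
    # nRandInit + 2 once the 1-based offset is added back.
    for k, vec in enumerate(w2v_vectors):
        if len(vec) != dim:
            raise ValueError("word2vec vector has the wrong dimension")
        table[n_rand_init + 1 + k] = [float(x) for x in vec]
    return table


def lookup(table, indices):
    """Index 0 returns the zero padding row; index i returns row i."""
    return [table[i] for i in indices]
```

For example, with `n_rand_init = 2`, the first word2vec vector lands in Python row 3. That is Lua row 4, or nRandInit + 2, which matches the patched `sub` call.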

Now the following error occurs:

moon: /usr/local/share/lua/5.1/torch/init.lua:102: bad argument #2 (invalid parent class name nn.LookupTableMaskZero)
stack traceback:
        [C]: in function 'newmetatable'
        /usr/local/share/lua/5.1/torch/init.lua:102: in function 'class'
        /th-skip/th-skip/model/LookupTableW2V.moon:6: (6) in function 'dofile'
        ./model/init.moon:8: (8) in main chunk
        [C]: in function 'require'
        main.moon:8: (7) in main chunk

If I understand the error message correctly, the torch "nn" library indeed does not contain "LookupTableMaskZero.lua". However, all the dependencies listed in README.md, including the torch "rnn" library that provides "LookupTableMaskZero.lua", are installed and up to date, so I don't understand how to solve this issue.
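One plausible reading of "invalid parent class name" is that torch.class resolves the parent by its registered name. A class defined in a package that has not been loaded yet is therefore unknown, even if its source file is installed. If that is the case here, the `require 'rnn'` that registers nn.LookupTableMaskZero would need to run before the `torch.class` call in LookupTableW2V.moon. The toy registry below only illustrates that load-order dependency in plain Python. `ClassRegistry` and `load_rnn_package` are hypothetical stand-ins, not Torch APIs.

```python
class ClassRegistry:
    """Toy name-based class registry (hypothetical, for illustration)."""

    def __init__(self):
        self._classes = {}

    def define(self, name, parent=None):
        # The parent must already be registered, otherwise definition fails.
        if parent is not None and parent not in self._classes:
            raise ValueError(f"invalid parent class name {parent}")
        self._classes[name] = parent
        return name


def load_rnn_package(registry):
    # Hypothetical stand-in for `require 'rnn'`, which registers its modules.
    registry.define("nn.LookupTableMaskZero", parent="nn.LookupTable")


if __name__ == "__main__":
    reg = ClassRegistry()
    reg.define("nn.LookupTable")
    try:
        reg.define("nn.LookupTableW2V", parent="nn.LookupTableMaskZero")
    except ValueError as err:
        print("before loading rnn:", err)
    load_rnn_package(reg)
    print("after loading rnn:", reg.define("nn.LookupTableW2V", parent="nn.LookupTableMaskZero"))
```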

Any help would be much appreciated. Best regards, mrdotnic

nhynes commented 5 years ago

Hey @mrdotnic, thanks for creating this issue! Torch7 is very deprecated and I recommend switching to PyTorch. We have a PyTorch implementation of the recipe embedding model and there are PyTorch implementations of Skip-Thoughts vectors. Please let me know if these don't help!

mrdotnic commented 5 years ago

Thank you, @nhynes! I will try it and, if need be, open a new issue in your PyTorch implementation project.