jcjohnson / densecap

Dense image captioning in Torch
MIT License

Tapping encoded image vector (which is passed to the RNN) #58

Open vishalathreya opened 7 years ago

vishalathreya commented 7 years ago

Hi, in `LanguageModel.lua`, the function `LM:sample(image_vectors)` takes the B x 4096 features of the best B region proposals and encodes them into the B x 512 tensor `image_vecs_encoded` before passing it to the RNN.

After the RNN generates the captions, the function returns the caption sequences (`self.output = seq`). I also want to return `image_vecs_encoded`, because I want to use this image representation for my own purpose, but I can't find out where the function returns to.

When I printed `debug.traceback()`, I got the following:

    stack traceback:
        ./densecap/LanguageModel.lua:110: in function 'func'
        /home/babu/torch/install/share/lua/5.1/nngraph/gmodule.lua:345: in function 'neteval'
        /home/babu/torch/install/share/lua/5.1/nngraph/gmodule.lua:380: in function </home/babu/torch/install/share/lua/5.1/nngraph/gmodule.lua:300>
        [C]: in function 'xpcall'
        /home/babu/torch/install/share/lua/5.1/nn/Container.lua:63: in function 'rethrowErrors'
        /home/babu/torch/install/share/lua/5.1/nn/Sequential.lua:44: in function 'forward'
        ./densecap/DenseCapModel.lua:253: in function 'forward'
        ./densecap/DenseCapModel.lua:320: in function 'forward_test'
        run_model.lua:77: in function 'run_image'
        run_model.lua:164: in main chunk
        [C]: in function 'dofile'
        ...babu/torch/install/lib/luarocks/rocks/trepl/scm-1/bin/th:145: in main chunk
        [C]: at 0x00406620
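The traceback shows `LM:sample` running inside an nngraph `gModule` that `DenseCapModel:forward` drives, so its return value is consumed by the graph rather than handed straight back to `run_model.lua`. One low-touch option, sketched here as an assumption rather than the project's intended API, is to leave the return value alone and cache the encoding on the module (in the Torch code, a single line such as `self.image_vecs_encoded = image_vecs_encoded` inside `LM:sample`). You can then read it from the language-model module after `forward_test` returns. The module reference is probably something like `model.nets.language_model`, but that path is an assumption to check in `DenseCapModel.lua`. With real tensors it may be safer to `:clone()` the cached tensor in case it is overwritten on the next forward pass. The example below is plain Lua so it runs without Torch. `LanguageModelSketch`, `encode` and the stand-in encoder are hypothetical.

```lua
-- Plain-Lua sketch (no Torch needed). LanguageModelSketch, encode() and the
-- cached field name are hypothetical stand-ins for densecap's LanguageModel.
local LanguageModelSketch = {}
LanguageModelSketch.__index = LanguageModelSketch

function LanguageModelSketch.new()
  return setmetatable({}, LanguageModelSketch)
end

-- Stand-in for the real 4096 -> 512 encoder: it only halves each value.
local function encode(vec)
  local out = {}
  for i, v in ipairs(vec) do out[i] = v / 2 end
  return out
end

function LanguageModelSketch:sample(image_vectors)
  local image_vecs_encoded = {}
  for b, vec in ipairs(image_vectors) do
    image_vecs_encoded[b] = encode(vec)
  end
  -- Cache the encoding on the module so it outlives this call; the return
  -- value (what the surrounding graph consumes) is left unchanged.
  self.image_vecs_encoded = image_vecs_encoded
  -- Placeholder captions; the real model runs the RNN here.
  local seq = {}
  for b = 1, #image_vectors do seq[b] = "caption " .. b end
  self.output = seq
  return self.output
end

-- Usage: the caller only receives the captions, but anyone holding a
-- reference to the module can read the cached encodings afterwards.
local lm = LanguageModelSketch.new()
local captions = lm:sample({ {2, 4}, {6, 8} })
print(captions[1], lm.image_vecs_encoded[2][1])
```

The cached rows are in the same order as the B proposals fed to the RNN, so row `b` of the encodings lines up with caption `b` before any further filtering.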

Once the captions for the B best region proposals are obtained, I think a final NMS is performed again to further reduce the number of dense captions (are the survivors what get written to the JSON file?). This NMS is not applied to `image_vecs_encoded`, since that tensor is never returned. I want to apply it there too, so that in the end I have the encoded features of only the dense-caption regions that are written to the JSON file. How do I do that?
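The general idea is to reuse the survivor indices from the final NMS to select rows from the captions and from the cached encodings alike, so the two stay aligned. Below is a minimal plain-Lua sketch. It assumes boxes given as `{x1, y1, x2, y2}` with one score per box and uses a generic greedy NMS. densecap's own final NMS may be implemented differently, so only the shared-index bookkeeping is the point here. The helpers `iou`, `nms` and `gather` are hypothetical. With Torch tensors, the gather step for the B x 512 features would be `image_vecs_encoded:index(1, keep)` with `keep` a `LongTensor` of survivor indices, assuming the final NMS step exposes (or can be changed to expose) those indices.

```lua
-- Intersection-over-union of two boxes given as {x1, y1, x2, y2}.
local function iou(a, b)
  local ix1, iy1 = math.max(a[1], b[1]), math.max(a[2], b[2])
  local ix2, iy2 = math.min(a[3], b[3]), math.min(a[4], b[4])
  local inter = math.max(0, ix2 - ix1) * math.max(0, iy2 - iy1)
  local area_a = (a[3] - a[1]) * (a[4] - a[2])
  local area_b = (b[3] - b[1]) * (b[4] - b[2])
  local union = area_a + area_b - inter
  if union <= 0 then return 0 end
  return inter / union
end

-- Greedy NMS: returns indices of surviving boxes, highest score first.
local function nms(boxes, scores, thresh)
  local order = {}
  for i = 1, #boxes do order[i] = i end
  table.sort(order, function(i, j) return scores[i] > scores[j] end)
  local keep, suppressed = {}, {}
  for _, i in ipairs(order) do
    if not suppressed[i] then
      keep[#keep + 1] = i
      -- Suppress every remaining box that overlaps the kept one too much.
      for _, j in ipairs(order) do
        if j ~= i and not suppressed[j] and iou(boxes[i], boxes[j]) > thresh then
          suppressed[j] = true
        end
      end
    end
  end
  return keep
end

-- Select the same rows from any per-region list so all lists stay aligned.
local function gather(list, idx)
  local out = {}
  for k, i in ipairs(idx) do out[k] = list[i] end
  return out
end

local boxes = { {0, 0, 10, 10}, {1, 1, 10, 10}, {20, 20, 30, 30} }
local scores = { 0.9, 0.8, 0.7 }
local captions = { "a dog", "a brown dog", "a ball" }
local encoded = { {0.1, 0.2}, {0.3, 0.4}, {0.5, 0.6} }  -- stand-ins for 512-d vectors

local keep = nms(boxes, scores, 0.7)
local kept_captions = gather(captions, keep)
local kept_encoded = gather(encoded, keep)
```

Here the second box overlaps the first with IoU 0.81, so it is dropped, and both `kept_captions` and `kept_encoded` come out in the same score order. That shared order is what lets each written caption be paired with its feature vector.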

Thanks a lot! :)