Hi, the default configuration file here is slightly different from the model described in the paper.
From our experiments, these two hyperparameters, `cfg.MODEL.CTRL.USE_WORD_EMBED` and `cfg.MODEL.NMN.STACK.USE_HARD_SHARPEN`, have little impact on the final accuracy.
Either the Bi-LSTM outputs or the raw word embeddings can be used as the encoding for each word; these per-word encodings are then averaged with attention weights to produce a "textual command" for further processing.
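As a rough illustration (not the repo's actual code), here is a minimal PyTorch sketch of that choice; `attn_logits`, `word_embeds`, and `lstm_out` are hypothetical names:

```python
import torch
import torch.nn.functional as F

def textual_command(attn_logits, word_embeds, lstm_out, use_word_embed):
    """Attention-average per-word encodings into a single 'textual command'.

    attn_logits: (batch, seq_len) unnormalized attention scores
    word_embeds: (batch, seq_len, embed_dim) raw word embeddings
    lstm_out:    (batch, seq_len, 2 * hidden_dim) Bi-LSTM outputs
    """
    cv = F.softmax(attn_logits, dim=1)                  # attention over words, cv_{t,s}
    enc = word_embeds if use_word_embed else lstm_out   # the flag picks the per-word encoding
    return torch.bmm(cv.unsqueeze(1), enc).squeeze(1)   # weighted sum over the word axis
```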
Also, although soft sharpening (softmax) keeps the model fully differentiable, hard sharpening (one-hot, non-differentiable) works almost as well.
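For concreteness, a sketch of the two sharpening variants applied to the stack pointer (hypothetical shapes; any temperature or scaling factor is omitted):

```python
import torch
import torch.nn.functional as F

def sharpen_pointer(pointer, use_hard_sharpen):
    """pointer: (batch, stack_depth) soft stack-pointer distribution."""
    if use_hard_sharpen:
        # Hard sharpen: snap to a one-hot vector at the argmax (non-differentiable).
        idx = pointer.argmax(dim=1)
        return F.one_hot(idx, num_classes=pointer.size(1)).to(pointer.dtype)
    # Soft sharpen: renormalize with a softmax (fully differentiable).
    return F.softmax(pointer, dim=1)
```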
I noticed two inconsistencies between your code and the equations in the paper (for VQA training without layout):

1. In the paper (end of Sec. 3.1), the textual command is obtained as a weighted combination of the LSTM outputs, with attention weights `cv_{t,s}`. But in your code, the config for `vqa_scratch` sets `cfg.MODEL.CTRL.USE_WORD_EMBED = True`, which means the weighted combination is taken over the word embeddings instead of the LSTM outputs.
2. For sharpening the stack pointer, the paper (Sec. 3.3, 2nd paragraph) says to use softmax, but the config for `vqa_scratch` sets `cfg.MODEL.NMN.STACK.USE_HARD_SHARPEN = True`.
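Concretely, these are the two settings I am referring to in the `vqa_scratch` config:

```python
cfg.MODEL.CTRL.USE_WORD_EMBED = True         # attend over word embeddings, not Bi-LSTM outputs
cfg.MODEL.NMN.STACK.USE_HARD_SHARPEN = True  # one-hot (argmax) stack-pointer sharpening
```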
I tried `cfg.MODEL.CTRL.USE_WORD_EMBED = True` in my PyTorch implementation, but the accuracy drops significantly. Please explain.