harvardnlp / im2markup

Neural model for converting Image-to-Markup (by Yuntian Deng yuntiandeng.com)
https://im2markup.yuntiandeng.com
MIT License

Model tuning duplicates generated in decoder #8

Open mingchen62 opened 6 years ago

mingchen62 commented 6 years ago

After training, I started to evaluate and found the predictions interesting. The trained model predicts well on some of the more complicated LaTeX, such as fractions or square roots, but fails on some simpler formulas. For example, the ground truth is "y=x^2+2x+1" but the prediction is "y=x^2+2x+2x+1"; the ground truth is "270" but the prediction is "2700". The decoder duplicates the last symbol(s). Any hints on how to tune the model to alleviate this issue?

My training results look reasonable:
Epoch 11, Step 43142 - Val Accuracy = 0.923066, Perp = 1.137150
Epoch 12, Step 47064 - Val Accuracy = nan, Perp = 1.138024

da03 commented 6 years ago

Hmm, which dataset are you using? I haven't observed that repetition problem in im2text before, but repetition is a well-known problem in other seq2seq tasks such as summarization, and people usually address it with a coverage penalty to avoid attending to the same source positions too much.
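For reference, here is a minimal NumPy sketch of GNMT-style length and coverage penalties (Wu et al., 2016) used to rescore a finished beam hypothesis. This is an illustration only, not code from this repo; the alpha/beta values and the shape of the attention matrix are assumptions.

```python
import numpy as np

def coverage_penalty(attn, beta=0.2):
    # GNMT-style coverage penalty: cp = beta * sum_i log(min(sum_j a_ji, 1.0)),
    # where attn[j, i] is the attention the j-th decoded token puts on source
    # position i. Source positions that receive little total attention drag the
    # score down, so hypotheses that fixate on a few positions are penalized.
    coverage = attn.sum(axis=0)  # total attention per source position
    return beta * np.log(np.minimum(coverage, 1.0).clip(min=1e-10)).sum()

def length_penalty(tgt_len, alpha=0.6):
    # GNMT length normalization: ((5 + |Y|) / 6) ** alpha
    return ((5.0 + tgt_len) / 6.0) ** alpha

def rescore(log_prob, attn, alpha=0.6, beta=0.2):
    # Score used to re-rank finished beam hypotheses: length-normalized
    # log-likelihood plus the coverage penalty.
    return log_prob / length_penalty(attn.shape[0], alpha) + coverage_penalty(attn, beta)
```

With beta=0 this reduces to plain length-normalized scoring; increasing beta trades log-likelihood against source coverage, which tends to suppress hypotheses that keep re-emitting the same trailing symbols.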

mingchen62 commented 6 years ago

Thanks. I am using a handwritten formula dataset. I suspect the variation in spacing between handwritten symbols contributes to the repetition problem. I am also looking at OpenNMT-py for its coverage penalty, i.e. https://github.com/OpenNMT/OpenNMT-py/issues/340. Will report back if I have any luck with that.

mingchen62 commented 6 years ago

Tried a few combinations of length and coverage penalty parameters: some made things worse, some gave minor improvements. More hyperparameter exploration may be needed, e.g. along the lines of https://arxiv.org/pdf/1703.03906.pdf; a sketch of a simple grid search follows below.
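A hedged sketch of what that exploration could look like as a coarse grid over the two penalty weights. `validation_accuracy` is a hypothetical placeholder for whatever decode-and-score pipeline is used (e.g. decoding the validation set with given length/coverage penalty weights and computing exact match), not an existing function in this repo or in OpenNMT-py.

```python
import itertools

def validation_accuracy(alpha, beta):
    """Hypothetical placeholder: decode the validation set with length-penalty
    weight `alpha` and coverage-penalty weight `beta`, then return exact-match
    accuracy. Wire this to your own decoder / scoring script."""
    raise NotImplementedError

best = None
# Coarse grid first; refine around the best cell if anything stands out.
for alpha, beta in itertools.product([0.0, 0.2, 0.4, 0.6, 0.8, 1.0],
                                     [0.0, 0.1, 0.2, 0.3, 0.4]):
    acc = validation_accuracy(alpha, beta)
    if best is None or acc > best[0]:
        best = (acc, alpha, beta)

print("best accuracy %.4f at alpha=%.1f, beta=%.1f" % best)
```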

zhangw-memo commented 6 years ago

After training, what BLEU score did you get? I didn't change anything, and accuracy increased by 3%.