uclnlp / jack

Jack the Reader

issue passing bAbI tests with memNN implementation #180

Closed JohannesMaxWel closed 7 years ago

JohannesMaxWel commented 7 years ago

I implemented the end-to-end memory network model from https://arxiv.org/pdf/1503.08895.pdf (cf. 588cd1d8a8148495f519d17d6404814bc0f69da9) and am currently somewhat stuck because it won't generalise on some of the tasks, notably task 2. Some easier tasks (e.g. task 1) pass.
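
For reference, the core of what I implemented is the single memory hop from the paper. Roughly, it looks like this (an illustrative numpy sketch with made-up names, not the actual jtr code):

```python
import numpy as np

def softmax(x):
    e = np.exp(x - x.max())
    return e / e.sum()

def memory_hop(query_emb, support_embs_A, support_embs_C):
    """One end-to-end memory network hop (Sukhbaatar et al., 2015).

    query_emb:      [d]    embedded question u
    support_embs_A: [n, d] input memory embeddings m_i of the support sentences
    support_embs_C: [n, d] output memory embeddings c_i of the support sentences
    """
    # attention over the support sentences: p_i = softmax(u . m_i)
    p = softmax(support_embs_A @ query_emb)
    # weighted sum of the output embeddings: o = sum_i p_i * c_i
    o = p @ support_embs_C
    # next query state is u + o (later projected onto the answer vocabulary)
    return query_emb + o
```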

On task 2, validation set accuracy doesn't climb higher than ca. 20-30%, even when training accuracy climbs beyond 99%. To put this into perspective: the task has 6 observed answer types, so random guessing already gives roughly 17%, which makes this barely better than chance.

At first I thought the model was simply overfitting badly. But even with a quite small embedding size of 10 (at which point the model can no longer fully fit the training data, reaching only about 95% training accuracy), validation accuracy still won't go above 30%.

Then I thought padding might be an issue, since the validation set contains examples with more support sentences (support documents) than the maximum observed in the training set. I adjusted for that, and when printing out intermediate representations, all of the support documents (sentences in the bAbI data) are fully visible to the model.
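
Concretely, the adjustment amounts to something like this (illustrative sketch, not the actual pipeline code): pad to the maximum support size taken over train and dev together, so no dev sentence gets cut off.

```python
def pad_support(support_ids, max_sentences, max_tokens, pad_id=0):
    """Pad token-id sentences to a fixed [max_sentences, max_tokens] grid.

    max_sentences / max_tokens must be computed over train AND dev; if they
    come from train only, dev examples with more support sentences than any
    training example get truncated and the answer-bearing sentence can
    become invisible to the model.
    """
    assert len(support_ids) <= max_sentences
    padded = [sent + [pad_id] * (max_tokens - len(sent)) for sent in support_ids]
    padded += [[pad_id] * max_tokens for _ in range(max_sentences - len(padded))]
    return padded
```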

Then I thought it might be a vocabulary issue (the preprocessing pipeline currently outputs three different vocabularies, for candidates/targets and for support docs), but the intermediate word-id representations obtained from the batcher look fine and I can't spot a difference between the train and validation settings. I may well be missing something here, though -- is anything about the vocabularies handled differently between train and test time?
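
The check I did is essentially decoding the batched ids back to words and eyeballing them for train and dev (sketch with made-up names; `id2word` stands for the inverse of whichever vocabulary maps support tokens to ids):

```python
def decode(ids, id2word, pad_id=0):
    """Map a sequence of word ids back to tokens (skipping padding), to
    verify that train and dev batches go through the same id -> word mapping."""
    return " ".join(id2word[i] for i in ids if i != pad_id)

# e.g. compare one training support sentence with one dev support sentence:
# print(decode(train_support_ids, id2word))
# print(decode(dev_support_ids, id2word))
```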

And, more generally, do any of you have other suggestions for what could be going wrong here?

Thanks for any help!

J

pminervini commented 7 years ago

Can you provide a command line for reproducing the problem?

JohannesMaxWel commented 7 years ago

This is mostly resolved after fixing a problem in the tokeniser.

Not all bAbI tasks pass yet (tasks 2 and 3 seem notoriously hard), but I found that trying the same run with a few different seeds is really important: some runs generalise well on the validation set and others not at all, i.e. the memNN model is highly sensitive to initialisation (see the seed-sweep sketch at the end of this comment).

On commit 998f19de1e3d020beaebd49e4c0f65bf00aa5a1a when running

```
python3 jtr/jack/train/train_reader.py --model memNN_reader --train data/bAbI/jtr_format/task4/train.jtr.json --dev data/bAbI/jtr_format/task4/valid.jtr.json --test data/bAbI/jtr_format/task4/test.jtr.json --repr_dim_input 20 --learning_rate=0.01 --epochs 200 --batch_size 64 --dev_batch_size 100 --learning_rate_decay 0.95 --clip_value=40 --seed=42
```

the model learns well.
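
Since the runs are so sensitive to initialisation, one way to try several seeds is to wrap that same command in a small sweep, roughly like this (sketch; the extra seed values are arbitrary):

```python
import subprocess

# Re-run the identical training command with a few different seeds, since the
# memNN model is sensitive to initialisation. Seed values here are arbitrary.
for seed in (42, 123, 1337):
    subprocess.run([
        "python3", "jtr/jack/train/train_reader.py",
        "--model", "memNN_reader",
        "--train", "data/bAbI/jtr_format/task4/train.jtr.json",
        "--dev", "data/bAbI/jtr_format/task4/valid.jtr.json",
        "--test", "data/bAbI/jtr_format/task4/test.jtr.json",
        "--repr_dim_input", "20",
        "--learning_rate=0.01",
        "--epochs", "200",
        "--batch_size", "64",
        "--dev_batch_size", "100",
        "--learning_rate_decay", "0.95",
        "--clip_value=40",
        f"--seed={seed}",
    ], check=True)
```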