localminimum / QANet

A Tensorflow implementation of QANet for machine reading comprehension
MIT License

dev set evaluation #21

Closed: nehaboob closed this issue 6 years ago.

nehaboob commented 6 years ago

My test and dev sets are the same, but I get different results from the training checkpoint evaluation and from running config.py in test mode.

Shouldn't they give the same results, since we are loading the saved model and running it on the dev file again?

localminimum commented 6 years ago

The dev results obtained in training mode will differ from those in test mode because training mode does not use the exponential moving average (EMA) of the weights at inference. Ideally, the results obtained in test mode should be higher.
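To make the distinction concrete, here is a minimal pure-Python sketch of an exponential moving average of a single weight. It assumes the common update rule shadow = decay * shadow + (1 - decay) * weight, applied once per training step with the shadow starting from the initial weight. It illustrates the technique only and is not the QANet or TensorFlow code: test mode evaluates with the shadow values, training mode with the raw weights.

```python
# Minimal sketch of an exponential moving average (EMA) of one weight.
# Assumption: shadow <- decay * shadow + (1 - decay) * weight after every
# training step, with the shadow initialised to the starting weight.
from typing import List


def ema_update(shadow: float, weight: float, decay: float) -> float:
    """Return the shadow value after one EMA step."""
    return decay * shadow + (1.0 - decay) * weight


def ema_trajectory(weights: List[float], decay: float, init_shadow: float) -> List[float]:
    """Apply ema_update once per raw weight value and collect every shadow."""
    shadow = init_shadow
    history = []
    for w in weights:
        shadow = ema_update(shadow, w, decay)
        history.append(shadow)
    return history


if __name__ == "__main__":
    # Hypothetical trajectory: the raw weight jumps from 0.0 to 1.0 and stays there.
    # The shadow (what test mode would evaluate with) only approaches 1.0 gradually.
    print(ema_trajectory([1.0] * 5, decay=0.5, init_shadow=0.0))
    # -> [0.5, 0.75, 0.875, 0.9375, 0.96875]
```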

nehaboob commented 6 years ago

I was getting worse results in test mode. Then I commented out the following line in the test(config) function, and now I get the same results as in training mode:

```python
if config.decay < 1.0:
    sess.run(model.assign_vars)
```
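A plausible reading of why that line made results worse: judging by its name and the maintainer's replies, `model.assign_vars` copies the EMA shadow values over the trained weights, and with a decay close to 1 the shadow is still dominated by the initial weights after a short run. The toy sketch below assumes a hypothetical decay of 0.9999, a shadow that starts at the initial weight, and a raw weight that jumps straight to its trained value; the numbers are illustrative, not measured from QANet.

```python
# Toy illustration (hypothetical values): how far the EMA shadow lags behind
# the trained weight after a given number of steps.
# Assumes the raw weight equals `trained` at every step, so the EMA has the
# closed form  shadow_n = decay**n * init + (1 - decay**n) * trained.


def shadow_after(steps: int, decay: float, init: float, trained: float) -> float:
    """EMA shadow after `steps` updates towards a constant trained weight."""
    keep = decay ** steps  # share of the shadow still coming from the initial weight
    return keep * init + (1.0 - keep) * trained


def squared_error(w: float, true_w: float) -> float:
    """Error of a 1-D model y = w * x against the true slope, evaluated at x = 1."""
    return (w - true_w) ** 2


if __name__ == "__main__":
    init_w, trained_w = 0.0, 3.0  # hypothetical initial and trained weights
    for steps in (1500, 60000):
        ema_w = shadow_after(steps, decay=0.9999, init=init_w, trained=trained_w)
        print(f"steps={steps}: raw error={squared_error(trained_w, trained_w):.4f}, "
              f"EMA error={squared_error(ema_w, trained_w):.4f}")
```

Under these assumptions, at 1,500 steps about 86% of the shadow still comes from the initial weight (0.9999^1500 is roughly 0.86), so evaluating with it is far worse than using the raw weight; by 60,000 steps that share is well under 1%. If the EMA is created with a `num_updates` argument, TensorFlow lowers the effective decay early in training and the lag is smaller.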

localminimum commented 6 years ago

If you train longer and let the exponential moving average variables settle, you will see better results with test mode.
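One way to quantify "settle": after n steps the initial weights still contribute a fraction decay^n of each shadow, so the share drops below a threshold eps once n > ln(eps) / ln(decay). A small helper, where the decay values and the threshold eps are assumptions to replace with your own config:

```python
import math


def steps_to_settle(decay: float, eps: float) -> int:
    """Smallest n such that decay**n < eps (initial weights contribute < eps)."""
    if not (0.0 < decay < 1.0 and 0.0 < eps < 1.0):
        raise ValueError("decay and eps must both lie in (0, 1)")
    # decay**n < eps  <=>  n > log(eps) / log(decay), since both logs are negative.
    n = math.ceil(math.log(eps) / math.log(decay))
    # Guard against floating-point rounding right at the boundary.
    while decay ** n >= eps:
        n += 1
    return n


if __name__ == "__main__":
    for decay in (0.999, 0.9999):
        print(decay, steps_to_settle(decay, eps=0.01))
```

With a hypothetical decay of 0.9999 and eps = 0.01 this comes to a little over 46,000 steps, the same order of magnitude as the 60,000 steps quoted later in the thread.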

nehaboob commented 6 years ago

Thanks. Typically, how many steps does that take?

Just to try it out, I am training on only 600 questions with a dev set of 60 questions, and the algorithm runs for 1,500 steps. Comparing the dev loss with the training loss, I can see it overfits quite quickly. I think this is too little data to learn anything meaningful.

Let me know if you have an idea of the minimum number of training questions and the corresponding number of steps.
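To put 600 questions and 1,500 steps in perspective, it helps to count epochs; a simple early-stopping check on the dev loss (a standard technique, swapped in here rather than taken from the repo) can then mark where overfitting begins. The batch size of 32 and the loss values below are hypothetical.

```python
from typing import List, Optional


def epochs_seen(steps: int, batch_size: int, num_examples: int) -> float:
    """Number of passes over the training set after `steps` mini-batches."""
    return steps * batch_size / num_examples


def early_stop_index(dev_losses: List[float], patience: int = 3) -> Optional[int]:
    """Index of the best dev loss once it has not improved for `patience`
    consecutive evaluations; None if training should continue."""
    best_i = 0
    for i, loss in enumerate(dev_losses):
        if loss < dev_losses[best_i]:
            best_i = i
        elif i - best_i >= patience:
            return best_i
    return None


if __name__ == "__main__":
    # Hypothetical batch size of 32: 1,500 steps over 600 questions is 80 epochs.
    print(epochs_seen(1500, batch_size=32, num_examples=600))  # -> 80.0
    # Hypothetical dev losses that bottom out at the third evaluation.
    print(early_stop_index([1.0, 0.8, 0.7, 0.75, 0.8, 0.9]))  # -> 2
```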

localminimum commented 6 years ago

If you have a GPU to train on, about 60,000 steps usually gets you to the best performance, which takes roughly 6 to 8 hours depending on which GPU you have.
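Those figures imply a throughput target. A quick back-of-the-envelope check, which is pure arithmetic on the quoted numbers (plug in your own measured steps per second to estimate wall-clock time):

```python
def steps_per_second(total_steps: int, hours: float) -> float:
    """Average training throughput needed to finish `total_steps` in `hours`."""
    return total_steps / (hours * 3600.0)


def hours_needed(total_steps: int, measured_steps_per_sec: float) -> float:
    """Wall-clock hours for `total_steps` at a measured throughput."""
    return total_steps / measured_steps_per_sec / 3600.0


if __name__ == "__main__":
    # 60,000 steps in 6 to 8 hours means roughly 2.1 to 2.8 steps per second.
    for hours in (6, 8):
        print(hours, round(steps_per_second(60000, hours), 2))
```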