hipster-philology / pandora

A Tagger-Lemmatizer for Natural Languages
MIT License

Early Stopping #86

Open · emanjavacas opened 6 years ago

emanjavacas commented 6 years ago

We need to implement some kind of early stopping. Given the usually small size of the datasets, it's pretty easy to start overfitting, which damages dev performance. I've noticed this already after fewer than 10 epochs on the geste dataset. Since we do multi-task learning, we should implement a weighting scheme to decide when to stop. This could be left to the user to decide, depending on which of the tasks is currently more important.
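
A minimal sketch of what such a weighted stopping criterion could look like; the task names ("lemma", "pos"), the weights, and the `patience` parameter below are hypothetical illustrations, not part of pandora's current API:

```python
# Sketch of a weighted early-stopping criterion, assuming per-task dev
# scores are available after each epoch. All names here are hypothetical.

class WeightedEarlyStopper:
    def __init__(self, weights, patience=3):
        self.weights = weights      # per-task importance, chosen by the user
        self.patience = patience    # epochs without improvement before stopping
        self.best = float("-inf")
        self.bad_epochs = 0

    def combined(self, scores):
        # Collapse the per-task dev scores into a single number.
        return sum(self.weights[task] * score for task, score in scores.items())

    def should_stop(self, scores):
        current = self.combined(scores)
        if current > self.best:
            self.best, self.bad_epochs = current, 0
        else:
            self.bad_epochs += 1
        return self.bad_epochs >= self.patience

# Usage: a user who cares mostly about lemmatization.
stopper = WeightedEarlyStopper({"lemma": 0.7, "pos": 0.3}, patience=2)
for epoch, dev_scores in enumerate([
    {"lemma": 0.80, "pos": 0.85},
    {"lemma": 0.83, "pos": 0.86},
    {"lemma": 0.82, "pos": 0.87},   # weighted score dips: bad epoch 1
    {"lemma": 0.81, "pos": 0.86},   # bad epoch 2 -> stop
]):
    if stopper.should_stop(dev_scores):
        print(f"early stopping at epoch {epoch}")
        break
```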

mikekestemont commented 6 years ago

+1

Jean-Baptiste-Camps commented 6 years ago

Just to say: the Geste dataset here is a very small sample, but I notice this also on the larger Chrestien corpus after 5 or so epochs.

[image: chrestien3_03]

emanjavacas commented 6 years ago

Actually, your dev score keeps going up, which means it's still OK. You might want to try increasing the dropout rate or decreasing the total number of parameters in the model. Generalization is one of the main problems in machine learning; there are many recipes for it.
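
For reference, a generic PyTorch-style illustration of those two knobs; this is not pandora's actual model code, and the class and parameter names are made up:

```python
import torch.nn as nn

# Generic illustration of the two suggestions above: a higher dropout rate
# and a smaller hidden layer both reduce the model's capacity to overfit.
class SmallTagger(nn.Module):
    def __init__(self, vocab_size=1000, n_tags=20, emb_dim=64,
                 hidden_dim=64, dropout=0.5):   # e.g. raise dropout from 0.25
        super().__init__()
        self.embed = nn.Embedding(vocab_size, emb_dim)
        self.drop = nn.Dropout(dropout)
        self.rnn = nn.LSTM(emb_dim, hidden_dim, batch_first=True)
        self.out = nn.Linear(hidden_dim, n_tags)

    def forward(self, x):
        emb = self.drop(self.embed(x))          # dropout on the embeddings
        hid, _ = self.rnn(emb)
        return self.out(self.drop(hid))         # and on the recurrent output
```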
