Open emanjavacas opened 6 years ago
+1
Just to say: the Geste dataset here is a very small sample, but I notice this also on the larger Chrestien corpus after 5 or so epochs.
Actually, your dev accuracy keeps going up, which means it's still OK. You might want to try increasing the dropout rate or decreasing the total number of parameters in the model. Generalization is one of the main problems in machine learning, and there are many recipes for improving it.
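To make the dropout suggestion concrete, here is a minimal NumPy sketch of inverted dropout — not Pandora's actual implementation, just an illustration of the mechanism being tuned (the `dropout` helper and the rates shown are hypothetical):

```python
import numpy as np

def dropout(x, rate, training=True, rng=None):
    """Inverted dropout: zero each unit with probability `rate` and
    rescale survivors by 1/(1 - rate), so the expected activation
    is unchanged and no rescaling is needed at test time."""
    if not training or rate == 0.0:
        return x
    rng = rng if rng is not None else np.random.default_rng(0)
    keep = 1.0 - rate
    mask = rng.random(x.shape) < keep
    return x * mask / keep

# Raising the rate (e.g. 0.25 -> 0.5) zeroes more units per step,
# a standard lever against overfitting on small corpora.
activations = np.ones((2, 8))
regularized = dropout(activations, rate=0.5)
```

On tiny corpora like Geste, a higher rate trades some training-set fit for better dev generalization, which is exactly the trade-off discussed above.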
2017-10-29 14:25 GMT+01:00 Jean-Baptiste-Camps notifications@github.com:
[image: chrestien3_03] https://user-images.githubusercontent.com/1204247/32144161-f0604c26-bcb4-11e7-838a-d856fdb7dcbb.png
We need to implement some kind of early stopping. Given the usually small size of the datasets, it's pretty easy to start overfitting, which damages dev performance; I've already noticed this after fewer than 10 epochs on the Geste dataset. Since we do multi-task learning, we should implement a weighting scheme over the per-task dev scores to decide when to stop. The weights could be left for the user to decide, depending on which task is currently more important.
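The weighted stopping criterion could look something like the sketch below — a patience-based rule over a user-weighted combination of per-task dev scores. The `should_stop` helper and the `pos`/`lemma` task names are illustrative assumptions, not Pandora's API:

```python
def should_stop(history, weights, patience=3):
    """history: one dict of per-task dev scores per epoch (higher is
    better), e.g. {"pos": 0.82, "lemma": 0.86}. Stop once the weighted
    combined score has not improved for `patience` consecutive epochs."""
    combined = [sum(weights[task] * scores[task] for task in weights)
                for scores in history]
    best_epoch = max(range(len(combined)), key=combined.__getitem__)
    return (len(combined) - 1 - best_epoch) >= patience

# A user who cares mostly about POS tagging weights it higher:
weights = {"pos": 0.7, "lemma": 0.3}
history = [
    {"pos": 0.80, "lemma": 0.85},
    {"pos": 0.82, "lemma": 0.86},  # best weighted score so far
    {"pos": 0.81, "lemma": 0.84},
    {"pos": 0.80, "lemma": 0.83},
]
```

With `patience=2`, training would stop after the two epochs of decline following the peak; a lemma-focused user would simply flip the weights.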