hipster-philology / pandora

A Tagger-Lemmatizer for Natural Languages
MIT License

Probable bug with morph dev evaluation with single label? #77

Open Jean-Baptiste-Camps opened 7 years ago

Jean-Baptiste-Camps commented 7 years ago

I've encountered a likely bug in morph evaluation with a single label (it does not seem to happen with multilabel, as far as I can see), using Keras and convolutions. Here is the evaluation output for one epoch:

-> Epoch  8 ...
Epoch 1/1
2548321/2548321 [==============================] - 7185s - loss: 3.3718 - pos_out_loss: 0.3315 - morph_out_loss: 3.0402
::: Train scores (pos) :::
+       all acc: 0.9746774444820727
+       kno acc: 0.9746774444820727
+       unk acc: 0.0
::: Dev scores (pos) :::
+       all acc: 0.9631442375958935
+       kno acc: 0.9661207248768655
+       unk acc: 0.8395028731792062
::: Test scores (pos) :::
+       all acc: 0.9623036616339998
+       kno acc: 0.9650712303839143
+       unk acc: 0.8473149879775581
::: Train scores (morph) :::
+       all acc: 0.5053849181480669
+       kno acc: 0.5053849181480669
+       unk acc: 0.0
::: Dev scores (morph) :::
+       all acc: 0.07532624198138992
+       kno acc: 0.07647367286601746
+       unk acc: 0.02766270212481625
::: Test scores (morph) :::
+       all acc: 0.4985762230824535
+       kno acc: 0.5029948590664013
+       unk acc: 0.3149879775581085

I wonder what causes this strange behaviour…
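For context, the all/kno/unk split in the log above distinguishes tokens seen in the training vocabulary ("kno") from out-of-vocabulary tokens ("unk"). A minimal sketch of how such a split is typically computed (the function name `split_accuracies` and its signature are my own illustration, not pandora's actual evaluation code):

```python
def split_accuracies(tokens, gold, pred, train_vocab):
    """Return all/known/unknown accuracy over aligned token and label lists.

    A token counts as "known" if it occurs in the training vocabulary,
    "unknown" otherwise; "all" covers every token.
    """
    hits = {"all": [0, 0], "kno": [0, 0], "unk": [0, 0]}
    for tok, g, p in zip(tokens, gold, pred):
        bucket = "kno" if tok in train_vocab else "unk"
        for key in ("all", bucket):
            hits[key][1] += 1              # total tokens in this bucket
            hits[key][0] += int(g == p)    # correct predictions
    return {k: (c / n if n else 0.0) for k, (c, n) in hits.items()}

# Toy example: two in-vocabulary tokens tagged correctly, one OOV token missed.
scores = split_accuracies(
    tokens=["rex", "regis", "xyzzy"],
    gold=["NOM", "GEN", "NOM"],
    pred=["NOM", "GEN", "ACC"],
    train_vocab={"rex", "regis"},
)
# scores["kno"] == 1.0, scores["unk"] == 0.0, scores["all"] == 2/3
```

On the train split every token is by definition in the training vocabulary, which is why the train "unk acc" above is 0.0 over an empty bucket.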

emanjavacas commented 6 years ago

What exactly is the issue (apart from the weird but possible distribution of morph scores)?

Jean-Baptiste-Camps commented 6 years ago

Maybe this needs further testing, but I think the issue is precisely that weird distribution of morph scores. I still see that kind of difference after 100 epochs, and on two different corpora.
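One purely hypothetical class of bug that produces exactly this pattern (fine train and test scores, near-chance dev scores) is decoding one split's predictions with a label-to-index mapping built in a different order. This sketch is an illustration of that failure mode, not a claim about pandora's code:

```python
# Label set as fitted on the training data.
train_order = ["CASE=nom", "CASE=gen", "CASE=acc"]
idx_train = {label: i for i, label in enumerate(train_order)}

# Hypothetical bug: the dev split's decoder was fitted separately,
# yielding the same labels in a different order.
dev_order = ["CASE=gen", "CASE=acc", "CASE=nom"]

gold = ["CASE=nom", "CASE=gen"]
# The model predicts perfectly, as indices in the *training* mapping...
pred_idx = [idx_train[label] for label in gold]
# ...but the indices are decoded with the dev mapping.
decoded = [dev_order[i] for i in pred_idx]

acc = sum(d == g for d, g in zip(decoded, gold)) / len(gold)
# acc == 0.0 despite perfect predictions
```

If something of this kind were the cause, accuracy on the affected split would stay near chance no matter how many epochs are run, which would match the behaviour after 100 epochs described above.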

Jean-Baptiste-Camps commented 6 years ago

Not impossible, but that would be improbable, no?

emanjavacas commented 6 years ago

Try comparing different parameter combinations, and also try the PyTorch model. If it's indeed a code bug, chances are high that you'd see similar behavior on other datasets as well. If you still see it, I'd ask for a reproducible example.