hipster-philology / pandora

A Tagger-Lemmatizer for Natural Languages
MIT License

Performance Drop between Keras and PyTorch on Medieval French #56

Closed PonteIneptique closed 7 years ago

PonteIneptique commented 7 years ago

Based on the data set of @Jean-Baptiste-Camps : https://docs.google.com/spreadsheets/d/1uSnLrkouxCkHIzZqr0nTR7-u75WuW2dtXs-Opuywl7A/edit?usp=sharing

Not all epochs have been run. Previously I only ran 100 epochs on Chrestien. Maybe I set max_lemma_len badly (?). But it clearly shows the superiority of the Keras model over the PyTorch one: Keras starts outperforming it at epoch 25.

emanjavacas commented 7 years ago

Thanks for the report. I can't understand the file very well, but it looks like an effect of using different optimizers and not doing any hyperparameter tuning (which you will have to do for the comparison to be meaningful). I also assume this is without using the generation approach to lemmatization, which is where the PyTorch model should shine. For now we should perhaps focus on debugging the other issues to rule out bugs in the code?
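
One quick way to rule the optimizer out would be to pin both backends to the same optimizer and learning rate before comparing curves. A minimal sketch with toy stand-in models (not Pandora's actual classes; Adam with lr = 0.001 is just an assumption about the Keras default):

import torch.nn as nn
import torch.optim as optim
from keras.models import Sequential
from keras.layers import Dense
from keras.optimizers import Adam

# Toy stand-ins for the two lemmatizer backends; only the optimizer setup matters here.
keras_model = Sequential([Dense(10, input_dim=100, activation='softmax')])
keras_model.compile(optimizer=Adam(lr=0.001), loss='categorical_crossentropy')

torch_model = nn.Linear(100, 10)
# Same optimizer family and learning rate on the PyTorch side, so that any
# remaining gap between the curves is not just an optimizer artifact.
torch_optimizer = optim.Adam(torch_model.parameters(), lr=0.001)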

PonteIneptique commented 7 years ago

Config file is here:

# Configuration file for the Pandora system
[global]
nb_encoding_layers = 2
nb_dense_dims = 1000
batch_size = 100
nb_left_tokens = 2
nb_right_tokens = 1
nb_embedding_dims = 100
model_dir = models/chrestien
postcorrect = False
include_token = True
include_context = True
include_lemma = label
include_pos = True
include_morph = False
include_dev = True
include_test = True
nb_filters = 150
min_token_freq_emb = 5
filter_length = 3
focus_repr = convolutions
dropout_level = 0.15
nb_epochs = 150
halve_lr_at = 75
max_token_len = 20
min_lem_cnt = 1
model = PyTorch
max_lemma_len = 32
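
A quick way to check whether max_lemma_len (or max_token_len) is set too low is to compare it with the longest lemma and token actually found in the training data. A rough sketch, assuming a tab-separated token/lemma file and hypothetical file names:

import configparser

# Hypothetical paths; adjust to the actual config and corpus files.
config = configparser.ConfigParser()
config.read('config_chrestien.txt')
max_token_len = config.getint('global', 'max_token_len')
max_lemma_len = config.getint('global', 'max_lemma_len')

longest_token, longest_lemma = 0, 0
with open('chrestien_train.tsv', encoding='utf-8') as corpus:
    for line in corpus:
        parts = line.rstrip('\n').split('\t')
        if len(parts) < 2:            # skip blank lines / sentence breaks
            continue
        longest_token = max(longest_token, len(parts[0]))
        longest_lemma = max(longest_lemma, len(parts[1]))

print('longest token: %d (max_token_len = %d)' % (longest_token, max_token_len))
print('longest lemma: %d (max_lemma_len = %d)' % (longest_lemma, max_lemma_len))
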
emanjavacas commented 7 years ago

Still the config file doesn't cover optimization method, learning rate, learning rate schedule, etc... Once we have solved the issues I can have a go and try to optimize the model on your data (if you want).
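
If those settings get exposed, a handful of config keys would be enough to drive the PyTorch side. A sketch of what that could look like (hypothetical keys, not current Pandora options; the scheduler halves the learning rate at halve_lr_at, which is presumably what that key already does on the Keras side):

import configparser
import torch.nn as nn
import torch.optim as optim

config = configparser.ConfigParser()
config.read_string("""
[global]
optimizer = adam
learning_rate = 0.001
halve_lr_at = 75
""")
section = config['global']

model = nn.Linear(100, 10)    # placeholder for the lemmatizer network

optimizers = {'adam': optim.Adam, 'sgd': optim.SGD, 'rmsprop': optim.RMSprop}
optimizer = optimizers[section.get('optimizer')](
    model.parameters(), lr=section.getfloat('learning_rate'))

# Halve the learning rate once at the configured epoch
# (scheduler.step() would be called once per epoch in the training loop).
scheduler = optim.lr_scheduler.MultiStepLR(
    optimizer, milestones=[section.getint('halve_lr_at')], gamma=0.5)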

mikekestemont commented 7 years ago

perhaps, to compare (and also for unit testing?), we should add a small open source corpus that we can ship with the repo?

PonteIneptique commented 7 years ago

I think the Chrestien corpus is open source, but you'll have to ask Mr. @Jean-Baptiste-Camps.

PonteIneptique commented 7 years ago

One of the things we could do is have a repository for demo corpora...

mikekestemont commented 7 years ago

Yes, and especially for trained models, so that it becomes usable for people without machines to do the heavy lifting during training: a "model zoo", as they call it.

Jean-Baptiste-Camps commented 7 years ago

Hi everyone, I had started to write an issue on this exact same topic yesterday, because I noticed the performance drop as well on another corpus I have (Old Provençal), but did not find time to finish writing it.

It actually isn't just a problem of performance, but also of training behaviour: with PyTorch, instead of the steady improvement we get with Keras, it happens that after some number of epochs the evolution reverses, each model becomes less accurate than the previous one, and more overtraining seems to happen (see the attached curve, montferrand_03).

I see now that it is too early to benchmark, as I suspected, but here it is anyway.
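
Until the difference is tracked down, one way to cope with the reversal is to checkpoint on dev accuracy and stop once it stalls, instead of keeping the last epoch. A generic sketch (the training and evaluation functions are placeholders, not Pandora code):

import copy
import random
import torch.nn as nn

model = nn.Linear(100, 10)           # placeholder for the lemmatizer network
nb_epochs, patience = 100, 10

def train_one_epoch(m):              # placeholder for one training epoch
    pass

def evaluate_dev(m):                 # placeholder: would return dev "all acc"
    return random.random()

best_dev_acc, best_state, bad_epochs = 0.0, None, 0
for epoch in range(nb_epochs):
    train_one_epoch(model)
    dev_acc = evaluate_dev(model)
    if dev_acc > best_dev_acc:       # dev accuracy improved: checkpoint it
        best_dev_acc = dev_acc
        best_state = copy.deepcopy(model.state_dict())
        bad_epochs = 0
    else:
        bad_epochs += 1
        if bad_epochs >= patience:   # stop once dev accuracy has stalled
            break

if best_state is not None:
    model.load_state_dict(best_state)    # keep the best epoch, not the last one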

Keras

With Keras, scores almost always improved from one epoch to the next, and rarely went the other way.

-> epoch  100 ...
Epoch 1/1
25845/25845 [==============================] - 19s - loss: 1.2243
::: Train scores (lemmas) :::
+       all acc: 0.9291545753530663
+       kno acc: 0.9291545753530663
+       unk acc: 0.0
::: Dev scores (lemmas) :::
+       all acc: 0.8922600619195047
+       kno acc: 0.9444070080862533
+       unk acc: 0.3015267175572519
::: Test scores (lemmas) :::
+       all acc: 0.8510835913312693
+       kno acc: 0.9246004169562196
+       unk acc: 0.25 

PyTorch

-> epoch  100 ...
25850/25850 [==============================] - 239s - loss: 0.5838 - lemma_out_loss: 0.5838
::: Train Scores (lemma) :::
+       all acc: 0.8945637454053008
+       kno acc: 0.8945637454053008
+       unk acc: 0.0
::: Dev Scores (lemma) :::
+       all acc: 0.7934984520123839
+       kno acc: 0.8517520215633423
+       unk acc: 0.13358778625954199
::: Test scores (lemma) :::
+       all acc: 0.7631578947368421
+       kno acc: 0.8366921473245309
+       unk acc: 0.16193181818181818
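
For reference, my reading of the kno/unk split in these reports is: accuracy over tokens whose form was seen in training vs. tokens that were not (which is why train unk acc is always 0.0). Roughly, with a hypothetical helper rather than Pandora's actual evaluation code:

def lemma_scores(pred_lemmas, gold_lemmas, tokens, train_tokens):
    # Split items into "known" (token form seen in training) and "unknown"
    # (unseen form), then score each subset separately.
    pairs = list(zip(pred_lemmas, gold_lemmas, tokens))
    known = [(p, g) for p, g, t in pairs if t in train_tokens]
    unknown = [(p, g) for p, g, t in pairs if t not in train_tokens]

    def acc(subset):
        return sum(p == g for p, g in subset) / len(subset) if subset else 0.0

    return {'all': acc([(p, g) for p, g, _ in pairs]),
            'kno': acc(known),
            'unk': acc(unknown)}

# Tiny example: one known token lemmatized correctly, one unknown token missed.
print(lemma_scores(['roi', 'estre'], ['roi', 'aler'], ['rois', 'ala'], {'rois'}))
# -> {'all': 0.5, 'kno': 1.0, 'unk': 0.0}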

Config

# Parameter file

[global]
nb_encoding_layers = 2
nb_dense_dims = 1000
batch_size = 50
nb_left_tokens = 2
nb_right_tokens = 1
nb_embedding_dims = 100
model_dir = models/montferrand
postcorrect = False
nb_filters = 100
filter_length = 3
focus_repr = convolutions
dropout_level = 0.15
include_token = True
include_context = True
include_lemma = label
include_pos = False
include_morph = False
include_dev = True
include_test = True
nb_epochs = 100
halve_lr_at = 75
max_token_len = 24
max_lemma_len = 19
min_token_freq_emb = 3
min_lem_cnt = 1
char_embed_dim = 50

PonteIneptique commented 7 years ago

Closing this, as it is clear the hyper-parametrisation is very different between the two implementations. The interesting thing is that, for label (not generate), PyTorch seems to reach a local minimum much faster (though at a good accuracy percentage).
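
For context on label vs. generate: label treats lemmatization as classification over the closed lemma vocabulary, while generate spells the lemma out character by character. Schematically, in generic PyTorch (illustrative sizes, not Pandora's actual modules):

import torch.nn as nn

hidden_dim, nb_lemmas, nb_chars = 1000, 20000, 80    # illustrative sizes

# "label": a single softmax over the known lemma vocabulary. It tends to reach
# a (possibly local) minimum quickly, but it can never emit an unseen lemma.
label_head = nn.Linear(hidden_dim, nb_lemmas)

# "generate": a character-level decoder that spells the lemma out, so unseen
# lemmas are reachable in principle, at the cost of a harder optimization.
generate_decoder = nn.LSTM(nb_chars, hidden_dim, batch_first=True)
generate_projection = nn.Linear(hidden_dim, nb_chars)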