hipster-philology / pandora

A Tagger-Lemmatizer for Natural Languages
MIT License

Generation broken? #99

Open mikekestemont opened 6 years ago

mikekestemont commented 6 years ago

A breaking change seems to have crept into the generate option for lemmatization somewhere along the way, which prevents me from helping out with #93. Basically, the lemmatization accuracy stays at zero without postcorrect (which isn't normal), and saturates at ~66% with postcorrect (which isn't normal either). Any suggestions? This might be related to the fix by @PonteIneptique that took care of not explicitly testing on the train data during training. Does this fix require any changes to config files?
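To make the symptom concrete, here is a minimal sketch of how the two accuracy figures relate. It assumes that post-correction snaps each generated string that is not a known lemma to the closest lemma seen in training; that is an assumption for illustration, not Pandora's actual code. The names `accuracy` and `postcorrect` and the toy data are hypothetical.

```python
import difflib


def accuracy(predicted, gold):
    """Fraction of positions where the predicted lemma equals the gold lemma."""
    if len(predicted) != len(gold):
        raise ValueError("predicted and gold must have the same length")
    if not gold:
        return 0.0
    return sum(p == g for p, g in zip(predicted, gold)) / len(gold)


def postcorrect(predicted, known_lemmas):
    """Hypothetical post-correction: replace each generated string that is
    not a known lemma with the most similar known lemma (difflib ratio)."""
    known = sorted(set(known_lemmas))
    known_set = set(known)
    corrected = []
    for p in predicted:
        if p in known_set or not known:
            corrected.append(p)
        else:
            # cutoff=0.0 guarantees a match whenever `known` is non-empty
            best = difflib.get_close_matches(p, known, n=1, cutoff=0.0)[0]
            corrected.append(best)
    return corrected


# Toy data (made up): one exact hit, two near misses, one garbage output
# whose gold lemma was never seen in training.
gold = ["amare", "rex", "esse", "videre"]
generated = ["amaer", "rex", "esze", "xx"]
known = ["amare", "rex", "esse", "habere"]

raw_acc = accuracy(generated, gold)
fixed_acc = accuracy(postcorrect(generated, known), gold)
print(f"without postcorrect: {raw_acc:.2f}, with postcorrect: {fixed_acc:.2f}")
```

Under this assumption, post-correction can only rescue outputs that are already close to a known lemma. A raw accuracy of exactly zero would therefore suggest the generator is not producing even a single correct string, rather than being slightly off.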

PonteIneptique commented 6 years ago

This means we currently have two (potentially unrelated) broken things:

I am not sure which of my changes you are talking about. If this relates to the external use of the "Logger" ( https://github.com/hipster-philology/pandora/pull/24/files ), then seeing test results, even if they are equal to 0, means testing is being run. Plus, it's an opt-out parameter, which means it should default to testing...

mikekestemont commented 6 years ago

I think these issues come from the same problem.

PonteIneptique commented 6 years ago

You did not specify which "model" you are using. PyTorch or TensorFlow?

mikekestemont commented 6 years ago

By default I am talking about TensorFlow, because PyTorch isn't ready for testing yet. (What is the recommended way to set the model type/backend in the config right now? Perhaps this is the issue.)

PonteIneptique commented 6 years ago

If you did not specify it, I think it defaults to tensorflow. But otherwise, here is an example : https://github.com/hipster-philology/pandora/blob/master/config_example/config_12c_pytorch.txt#L30

mikekestemont commented 6 years ago

Ok. Thanks. I was planning to do a major revision of the base soon, to avoid having to load the entire dataset in memory: the error would probably surface if I do this revision. Are there any major PRs on the way that I should wait for?

PonteIneptique commented 6 years ago

Not that I know of.

emanjavacas commented 6 years ago

The pytorch branch does generate lemmas and works (accuracy increases over training).

Jean-Baptiste-Camps commented 6 years ago

Indeed, from the tests I have made on a small test corpus, the behaviour differs between PyTorch (where it works) and Keras (where there seems to be a problem). Also, epoch time is a lot longer with Keras (roughly 5 to 6 times), but that might be something else.

PonteIneptique commented 6 years ago

Hey @mikekestemont. Any update on your major revision? I saw your new Pie repository, is it related to Pandora? :)

mikekestemont commented 6 years ago

Yes, I am working with Enrique on a new approach to the tagging (at the level of complete sentences) in that new repository. However, the setup is so radically different from what we have now in Pandora that we first wanted to test the viability of the idea outside the main repository. We will keep you posted if anything comes out!