hplt-project / sacremoses

Python port of Moses tokenizer, truecaser and normalizer
MIT License

Possible to retrain/keep training an existing model? #102

Open petulla opened 4 years ago

petulla commented 4 years ago

Hi

Given a loaded model, is it possible to train it with more data?

alvations commented 4 years ago

Which model? Do you mean the truecasing model? Other than that, there's no real model training in sacremoses; it's mostly writing and testing regex rules =)

petulla commented 4 years ago

I meant a model already trained with sacremoses. In other words, can you load an existing model and keep training it (add more rules)?

alvations commented 4 years ago

May I ask which preprocessing task in sacremoses you are referring to? The truecaser?

For other tasks, there's no training involved and the rules are manually defined 😅

petulla commented 4 years ago

Yep, the truecaser.

To clarify:

Let's say I load some text into sacremoses and train a truecasing model.

Then, two days later, I have some new text and want to update the model.

I want to keep training the existing model with the new text rather than start from scratch.

alvations commented 4 years ago

P.S. I'm thinking about how to add this feature. It's not hard, but I have to think a little about how users would actually use it =)

I'm a little busy for the next couple of days, but please keep this issue open. I'll look into it because I think it's worth a try.
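Until something like this lands in the library, the workflow described above (train on some text, then fold in new text later) can be sketched in plain Python. The class below is a hypothetical helper, not part of the sacremoses API, and it does not read or write sacremoses model files. It assumes whitespace-tokenized input and a deliberately simple rule: the true casing of a word is its most frequent surface form, ignoring sentence-initial tokens. Because it stores raw counts rather than only the winning form, later updates can change earlier decisions.

```python
import json
from collections import Counter, defaultdict


class IncrementalCasingCounts:
    """Hypothetical helper (NOT part of sacremoses): keeps raw casing counts
    so that more text can be folded in later without starting from scratch."""

    def __init__(self):
        # lowercased token -> Counter of the surface forms seen for it
        self.counts = defaultdict(Counter)

    def update(self, sentences):
        """Add counts from an iterable of whitespace-tokenized sentences."""
        for sentence in sentences:
            tokens = sentence.split()
            # Assumption: skip the sentence-initial token, since its
            # capitalization is positional rather than lexical.
            for token in tokens[1:]:
                self.counts[token.lower()][token] += 1

    def best_casing(self, token):
        """Return the most frequent surface form, or the token itself if unseen."""
        forms = self.counts.get(token.lower())
        if not forms:
            return token
        return forms.most_common(1)[0][0]

    def truecase(self, sentence):
        return " ".join(self.best_casing(t) for t in sentence.split())

    def save(self, path):
        # Counter and defaultdict are dict subclasses, so json can dump them.
        with open(path, "w", encoding="utf8") as fh:
            json.dump(self.counts, fh)

    @classmethod
    def load(cls, path):
        model = cls()
        with open(path, encoding="utf8") as fh:
            for key, forms in json.load(fh).items():
                model.counts[key] = Counter(forms)
        return model


# Day 1: initial training data (made-up example sentences).
model = IncrementalCasingCounts()
model.update(["The weather in paris is nice", "We visited paris today"])
before = model.best_casing("PARIS")

# Two days later: keep training the same model on new text.
model.update(["I love Paris", "They flew to Paris", "She moved to Paris"])
after = model.best_casing("PARIS")
```

In this made-up example, `before` is `"paris"` (two lowercase sightings) and `after` is `"Paris"` once the three new capitalized sightings outnumber them. Saving with `save()` and calling `load()` followed by `update()` gives the "load an existing model and keep training" behaviour asked about here, under the assumptions stated above.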