ipipan / spacy-pl

GNU General Public License v3.0

Tuning models #9

Open maciejbiesek opened 4 years ago

maciejbiesek commented 4 years ago

Is there any option to tune the models you provided (NER, POS) on our own corpora?

ryszardtuora commented 4 years ago

You can train the existing models further on your own data using the spaCy CLI train command. Just provide the model's name, or a link to it, as the value of the --base-model parameter. You will first need to convert your data to spaCy's JSON format using the convert command.

This should work, with the exception of the POS tagger in the morfeusz-based version, which is not a spaCy component.
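For illustration, here is a minimal sketch of how those two commands could be assembled. It assumes the spaCy 2.x CLI (positional lang, output, train and dev arguments for train; input file and output directory for convert). All file names and the model name are placeholders, and build_convert_cmd and build_train_cmd are hypothetical helpers, not part of spaCy or spacy-pl.

```python
import sys


def build_convert_cmd(input_file, output_dir):
    """Command that converts an annotated corpus to spaCy's JSON format.

    Assumes the spaCy 2.x form: spacy convert <input_file> <output_dir>.
    """
    return [sys.executable, "-m", "spacy", "convert", input_file, output_dir]


def build_train_cmd(lang, output_dir, train_json, dev_json, base_model, pipeline):
    """Command that continues training an existing model.

    Assumes the spaCy 2.x form:
    spacy train <lang> <output_dir> <train> <dev> --base-model ... --pipeline ...
    """
    return [
        sys.executable, "-m", "spacy", "train",
        lang, output_dir, train_json, dev_json,
        "--base-model", base_model,         # model name, link, or path
        "--pipeline", ",".join(pipeline),   # components to update, e.g. ner
    ]


if __name__ == "__main__":
    # Placeholder paths and model name; replace them with your own.
    convert_cmd = build_convert_cmd("my_corpus.conllu", "converted")
    train_cmd = build_train_cmd(
        lang="pl",
        output_dir="tuned_model",
        train_json="converted/train.json",
        dev_json="converted/dev.json",
        base_model="name_or_path_of_model",
        pipeline=["ner"],
    )
    # Print the commands for inspection; run them with subprocess.run(cmd, check=True).
    print(" ".join(convert_cmd))
    print(" ".join(train_cmd))
```

Restricting --pipeline to ner leaves the other components as they are, which fits the case where only the NER model should be tuned.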

maciejbiesek commented 4 years ago

So, to sum up, we can tune e.g. the NER model and the POS tagger in the simplest form, but we cannot adapt the morfeusz-based models to our specific data?

ryszardtuora commented 4 years ago

You can tune NER in the morfeusz version, but you cannot do so for its POS tagger.

If it is the morfeusz tokenization that you're after, I suppose you could retrain the basic tagger, and then use it as a component in the pipeline.

Adding the ability to retrain the morfeusz-version tagger would require more work, but we will consider it.

maciejbiesek commented 4 years ago

Ok, I see, thank you :)