natasha / natasha-spacy

spaCy official Russian model proposal
MIT License

Extending NER #1

Open sskorol opened 3 years ago

sskorol commented 3 years ago

Hi Alex, thanks for the model! Just a quick question regarding its extension.

I have some code based on the official NER training guide. Would it be enough to just run it against your model, update meta.json, and run spacy package to create an extended version that can be installed on any system via pip? I'm asking because the "Training" section of the README lists lots of additional steps, so I'm a bit confused about whether I need to follow all of them to extend your model with custom data.
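For the "update meta.json" step, here is a minimal plain-Python sketch. It assumes the model has already been saved to a directory (e.g. with `nlp.to_disk`) and that its meta.json uses spaCy 2.x-style `name`, `version` and `labels` → `ner` fields. Those field names, and the helpers `extend_meta` and `rewrite_meta_file`, are assumptions for illustration, so check them against the file your model actually writes.

```python
import json
from pathlib import Path


def extend_meta(meta, new_name, new_version, new_labels):
    """Return an updated copy of a model meta dict (assumed spaCy 2.x layout).

    Assumes the "name", "version" and "labels" -> "ner" fields; adjust the
    keys if your meta.json is laid out differently.
    """
    updated = json.loads(json.dumps(meta))  # deep copy of a JSON-compatible dict
    updated["name"] = new_name
    updated["version"] = new_version
    labels = updated.setdefault("labels", {})
    ner_labels = labels.setdefault("ner", [])
    for label in new_labels:
        if label not in ner_labels:  # keep existing labels, append only new ones
            ner_labels.append(label)
    return updated


def rewrite_meta_file(model_dir, new_name, new_version, new_labels):
    """Load model_dir/meta.json, update it and write it back in place."""
    path = Path(model_dir) / "meta.json"
    meta = json.loads(path.read_text(encoding="utf-8"))
    meta = extend_meta(meta, new_name, new_version, new_labels)
    path.write_text(json.dumps(meta, ensure_ascii=False, indent=2), encoding="utf-8")
    return meta
```

After the meta is rewritten, the spaCy 2.x packaging route is `python -m spacy package model_dir packages_dir`, then `python setup.py sdist` inside the generated package directory; the resulting archive can be installed with pip.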

Thanks, Sergey


kuk commented 3 years ago

Yes, I think it should work. Could you please report whether it actually improves the accuracy, in case you try?

sskorol commented 3 years ago

It works for me: I can package the model and link it in spaCy as the default ru model (tried on spaCy 2.3.1). And of course, my entities are recognized correctly as well.

But for some reason, I don't see results for my custom entities in the output meta.json. The entities are present in the ner section, but missing from accuracy.ents_*. Should I pass some additional metadata when saving the updated model? Or do these stats appear only when training via the CLI? Or do I need to retrain everything from scratch to get the overall accuracy?
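For context, the accuracy.ents_* numbers are entity scores computed on an evaluation set, where a predicted entity counts as correct only if both its boundaries and its label match the gold annotation. Below is a minimal plain-Python sketch of that exact-match computation, assuming gold and predicted entities are given as (start_char, end_char, label) tuples. It illustrates the idea only and is not spaCy's own `Scorer`; the key names mirror the meta.json fields, while `score_entities` and `prf` are hypothetical helpers.

```python
from collections import defaultdict


def prf(tp, fp, fn):
    """Precision, recall and F1 from raw counts (0.0 when undefined)."""
    p = tp / (tp + fp) if tp + fp else 0.0
    r = tp / (tp + fn) if tp + fn else 0.0
    f = 2 * p * r / (p + r) if p + r else 0.0
    return p, r, f


def score_entities(examples):
    """Exact-match entity scores, overall and per label.

    `examples` is a list of (gold, predicted) pairs, each a collection of
    (start_char, end_char, label) tuples for one text.
    """
    counts = defaultdict(lambda: [0, 0, 0])  # label -> [tp, fp, fn]
    for gold, pred in examples:
        gold, pred = set(gold), set(pred)
        for _, _, label in pred & gold:  # same span and same label
            counts[label][0] += 1
        for _, _, label in pred - gold:  # predicted but not in gold
            counts[label][1] += 1
        for _, _, label in gold - pred:  # gold entity that was missed
            counts[label][2] += 1
    per_type = {}
    for label, (tp, fp, fn) in sorted(counts.items()):
        p, r, f = prf(tp, fp, fn)
        per_type[label] = {"p": p, "r": r, "f": f}
    # Micro-averaged totals over all labels.
    tp, fp, fn = [sum(c[i] for c in counts.values()) for i in range(3)]
    p, r, f = prf(tp, fp, fn)
    return {"ents_p": p, "ents_r": r, "ents_f": f, "ents_per_type": per_type}
```

Running the extended model over a held-out annotated set and passing its output through a function like this gives per-label scores for the new entity types, however the model was trained. The sketch returns fractions in [0, 1]. If the values in your meta.json are percentages, multiply by 100 to compare them.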

kuk commented 3 years ago

Do you have a validation dataset? Do you pass both the train and validation datasets to the training procedure? We train ru_core_news_md via the CLI, with spacy train. Actually, it may be better to ask about this in the spaCy repo issues.
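The spacy train CLI in spaCy 2.x takes separate training and development data paths, so a held-out split is needed either way. Here is a minimal sketch of a reproducible split, assuming examples in the (text, {"entities": [(start, end, label), ...]}) style used by the spaCy 2.x training guide. Converting them into the format your training command expects is not shown. `train_val_split` and `label_counts` are hypothetical helpers.

```python
import random


def train_val_split(examples, val_fraction=0.2, seed=0):
    """Shuffle annotated examples and split them into (train, validation).

    `examples` is assumed to be a list of
    (text, {"entities": [(start, end, label), ...]}) pairs.
    """
    if not 0.0 < val_fraction < 1.0:
        raise ValueError("val_fraction must be between 0 and 1")
    shuffled = list(examples)
    random.Random(seed).shuffle(shuffled)  # local RNG: reproducible, no global state
    n_val = max(1, int(len(shuffled) * val_fraction))
    return shuffled[n_val:], shuffled[:n_val]


def label_counts(examples):
    """Count gold entities per label, e.g. to check the validation set."""
    counts = {}
    for _, annotations in examples:
        for _, _, label in annotations.get("entities", []):
            counts[label] = counts.get(label, 0) + 1
    return counts
```

Checking label_counts(val) is a quick way to confirm that each custom label actually occurs in the validation set. A label with no gold examples there cannot get a meaningful recall score.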

> whether it actually improves the accuracy, in case you try?

Did you notice any difference in accuracy between training from scratch and using ru_core_news_md?

kuk commented 3 years ago

Could you share the commands or source code you used for fine-tuning on your data? That way this issue can be resolved.