Closed riccardopinosio closed 1 year ago
Hi @riccardopinosio ,
I am also interested in using Coreferee with my custom spaCy NER model. My problem is that its entities are quite different from those of the default spaCy model, so using the pretrained Coreferee model or neuralcoref would not be a good choice, right?
So my solution would be to annotate a small dataset and fine-tune the existing model on it, right? Any suggestions?
Named-entity labels are one of many feature types that Coreferee uses as neural-network inputs, with one neuron per type and a fixed mapping from labels to neurons. This means that if you are using a custom NER model, however close this model is to the published standard model, it will make sense to train a custom Coreferee model to ensure you are achieving the best accuracy possible.
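To make the "one neuron per type" point concrete, here is a minimal sketch of a fixed label-to-neuron mapping. It is illustrative only and is not Coreferee's actual code: `KNOWN_LABELS`, `LABEL_TO_NEURON` and `encode_entity_label` are hypothetical names, and the label list is an assumed subset.

```python
# Illustrative sketch only; not Coreferee's implementation.
# Assumed: the label set is fixed when the model is trained.
KNOWN_LABELS = ["PERSON", "ORG", "GPE", "DATE"]
LABEL_TO_NEURON = {label: index for index, label in enumerate(KNOWN_LABELS)}


def encode_entity_label(label):
    """Return a one-hot vector with one position (neuron) per known label."""
    vector = [0] * len(KNOWN_LABELS)
    index = LABEL_TO_NEURON.get(label)
    if index is not None:
        vector[index] = 1
    return vector


print(encode_entity_label("PERSON"))  # a label the mapping knows
print(encode_entity_label("ACT"))     # a custom label with no neuron
```

In this sketch a label outside the fixed mapping activates no input at all, which is the kind of mismatch that retraining against the custom NER model avoids.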
For English this should be both straightforward and quick:
- [...] `sh/download_corpora.sh` within the repo; on other operating systems, you may need to write your own script to download and convert the same data.
- [...] `config.cfg`.
- [...] `train` command in point 9 of the "new language" instructions.

@Tanmay98, you don't specify whether your custom spaCy NER model is for English or for another language. If English, please follow the above instructions; if for another language, please get back to me.
Hi @richardpaulhudson , Thanks for your response. I am a little confused.
So, I have a custom spacy model for NER in English language.
My only question is this: the LitBank dataset has entities like person, organization, etc. for NER, whereas my spaCy model has different entities. So wouldn't I want to fine-tune the Coreferee model on a newly annotated dataset that matches my NER entities?
Also, since then I have tried annotating a small dataset for coreference using the brat software mentioned by the LitBank authors, but its documentation is not very clear on how to use it.
So I used Label Studio to annotate, wrote code that converts the JSON to brat format, and finally used the LitBank loader class for training.
My last question: according to the guidelines in the Coreferee repo, am I fine-tuning with the `train` command or training a whole new model?
Thanks again in advance !
Unfortunately this is all quite confusing.
The `train` command is designed to train a new model rather than to fine-tune an existing one, but I would recommend using the existing/standard training corpora together with any new training corpora. This should have a similar overall effect to fine-tuning.
Two other important points I forgot to mention earlier:
- [...] `PERSON` as one of several ways of identifying human referents (which are referred to using different pronouns from non-animate referents). If this is relevant to your use case, it is important to use the label `PERSON` in your entity model or to modify the code in your codebase in https://github.com/explosion/coreferee/blob/master/coreferee/lang/en/language_specific_rules.py to fit your entity model (just search for "PERSON").

@richardpaulhudson Thanks for the wonderful explanation. Now it makes sense to me. Since I have already annotated some training data, I will combine it with the existing LitBank corpora and try to train on the whole.
One thing, my custom spacy model just has one entity "ACT", as in the legal acts, for NER. So, I just want the coreferences of the act to be detected.
For example,
Bail under the Narcotic Drugs and Psychotropic Substances Act 1985 (‘NDPS Act’). The court while considering the application for bail with reference to Section 37 of the Act is not called upon to record a finding of not guilty.
So here my spaCy model currently detects [ACT] Narcotic Drugs and Psychotropic Substances Act 1985 [ACT], but I also want "Section 37 of the Act" to be detected as part of the parent entity.
This looks doable by combining the LitBank corpus with my sample annotated dataset, I think.
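Combining the standard corpus with a newly annotated one for the `train` command could be sketched as follows. This only assembles a command line and does not run it. The flag names (`--lang`, `--loader`, `--data`, `--log`), the comma-separated pairing of loaders and directories, and the loader name `LitBankANNLoader` are as I recall them from the Coreferee README and should be checked against it; the directory paths and the helper `build_train_command` are hypothetical.

```python
import shlex


def build_train_command(lang, corpora, log_dir):
    """Assemble a train command from (loader_class_name, data_directory) pairs.

    Flag names are assumptions recalled from the Coreferee README.
    """
    loaders = ",".join(loader for loader, _ in corpora)
    data_dirs = ",".join(path for _, path in corpora)
    return [
        "python3", "-m", "coreferee", "train",
        "--lang", lang,
        "--loader", loaders,
        "--data", data_dirs,
        "--log", log_dir,
    ]


corpora = [
    ("LitBankANNLoader", "corpora/litbank_ann"),     # standard corpus (hypothetical path)
    ("LitBankANNLoader", "corpora/my_annotations"),  # newly annotated data (hypothetical path)
]
command = build_train_command("en", corpora, "logs")
print(shlex.join(command))
```

Keeping the loader and directory lists paired in one structure makes it harder for the two comma-separated arguments to drift out of step.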
Yes, what will be important for this use case will be defining `act` and any other relevant nouns as mapping to `ACT` in the entity noun dictionary.
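As a rough sketch of what such an entry might look like, assuming the entity noun dictionary maps each entity label to a list of nouns: the dictionary shape, the `PERSON` nouns, and the extra nouns `statute` and `law` are illustrative assumptions, and `labels_for_noun` is a hypothetical helper, not part of Coreferee.

```python
# Hypothetical excerpt; the real dictionary lives in the language-specific
# rules file and its exact shape should be checked there.
entity_noun_dictionary = {
    "PERSON": ["person", "individual", "man", "woman"],  # illustrative entries
    "ACT": ["act", "statute", "law"],  # new entry for the custom label
}


def labels_for_noun(lemma):
    """Return the entity labels whose noun list contains the given lemma."""
    lowered = lemma.lower()
    return [
        label
        for label, nouns in entity_noun_dictionary.items()
        if lowered in nouns
    ]


print(labels_for_noun("Act"))
```

With an entry like this, a noun phrase headed by "Act" can be associated with the `ACT` label, which is what lets a later mention such as "the Act" be linked to the full statute name.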
@richardpaulhudson Thanks, I updated the config with my custom spaCy model and also updated the entity noun dictionary. After training the model, I am now running the command to install the new Coreferee model, but I am facing the issue below even after running it from the root coreferee directory where the models dir is present.
Hi, it's working now. I just had to run `python3 -m coreferee install en` and it updated to the new Coreferee model, so everything seems to work.
Thanks!!
Hello,
I would like to use coreferee with a custom spacy model that is a slight variation of the en_core_web_lg version 3.4.1 (it's basically the same model that has been trained to recognize one additional entity type using the standard spacy training process).
Trying to add coreferee to the trained pipeline with .add_pipe fails with a model version not supported error. In the readme it says that I'm supposed to train a new coreferee model for this custom model, however I would like to essentially use the same model for en_core_web_lg as my custom model is very similar. Is there any way to just lift that coreferee model for use with a custom spacy model?