How to load newly trained models in spaCy? (huggingface/neuralcoref #79)

valdicarlo commented 6 years ago

Hi,

I successfully trained a new Italian coreference model following these instructions. The only output of the training process seems to be the checkpoint models, in particular the three files *_best_modeltoppairs, *_best_modelranking and *_best_modelallpairs.
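A minimal sketch of how one might collect those checkpoints. It assumes they all sit in one output directory and that the three suffixes correspond to training stages run in the order allpairs, toppairs, ranking. That order is an assumption about the training schedule, not something stated in this thread. `find_checkpoints` and `final_checkpoint` are hypothetical helpers: they only locate files and do not load or convert anything.

```python
from pathlib import Path

# Checkpoint suffixes mentioned in the issue. The stage ORDER below
# (allpairs -> toppairs -> ranking) is an assumption, not confirmed here.
STAGES = ("allpairs", "toppairs", "ranking")


def find_checkpoints(train_dir):
    """Map each stage to the files in train_dir whose names end in _best_model<stage>."""
    root = Path(train_dir)
    return {stage: sorted(root.glob(f"*_best_model{stage}")) for stage in STAGES}


def final_checkpoint(train_dir):
    """Return a checkpoint from the last stage that produced one, or None.

    If several prefixes exist for that stage, the lexicographically last file is returned.
    """
    found = find_checkpoints(train_dir)
    for stage in reversed(STAGES):
        if found[stage]:
            return found[stage][-1]
    return None
```

For example, `final_checkpoint("checkpoints/")` (a hypothetical output directory) would return the *_best_modelranking file once that stage has written one.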

Now I would like to use my newly trained model with spaCy. Here I found the instructions for loading NeuralCoref as a spaCy pipeline component, but the input files it expects (which can be downloaded from here: https://github.com/huggingface/neuralcoref-models/releases/download/bare_weights-3.0.0/neuralcoref.tar.gz) are two spaCy Vectors, "tuned_vectors" and "static_vectors", and two Thinc models, "pairs_model" and "single_model".
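A quick way to see what the spaCy component expects is to list the downloaded archive. The sketch below uses Python's standard `tarfile` module and assumes, based only on the description above, that the four inputs appear as entries named `static_vectors`, `tuned_vectors`, `pairs_model` and `single_model` somewhere in the archive's paths. `missing_components` is a hypothetical helper name.

```python
import tarfile
from pathlib import PurePosixPath

# Names taken from the issue text; assumed (not verified) to be entry names in the archive.
EXPECTED_COMPONENTS = ("static_vectors", "tuned_vectors", "pairs_model", "single_model")


def missing_components(archive_path):
    """Return the expected component names that appear in no member path of a .tar.gz."""
    with tarfile.open(archive_path, "r:gz") as tar:
        parts = set()
        for name in tar.getnames():
            # Collect every path component, e.g. "neuralcoref/pairs_model" -> both parts.
            parts.update(PurePosixPath(name).parts)
    return [c for c in EXPECTED_COMPONENTS if c not in parts]
```

Calling `missing_components("neuralcoref.tar.gz")` on the downloaded file should return an empty list if the naming assumption holds.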

How can I convert the output files of the training process into the input files expected by the spaCy coreference component?

I must be missing something...

Thanks!

jdliu18 commented 6 years ago

Hi,

I have the same problem here. Thanks.

msalameh83 commented 6 years ago

Same here, I have the same problem. I'd appreciate your help.

RodSernaPerez commented 5 years ago

Any solution to this problem?

kaushik88 commented 5 years ago

Any solution to this?

thomwolf commented 5 years ago

This should be fixed in the new release (4.0) with spaCy 2.1+ (which is also on PyPI now). NeuralCoref should now be compatible with any (English) spaCy model. Please open a new issue if you still run into problems.

KalidindiMounika commented 5 years ago

Can anyone please share the Italian CoNLL dataset?

KalidindiMounika commented 5 years ago

@valdicarlo, can you please paste a link to the Italian corpus you used to create the coref model?

Thanks in advance

valedica commented 5 years ago

Sorry, it’s a proprietary dataset.

KalidindiMounika commented 5 years ago

@valdicarlo, were you able to build coreference resolution for Italian, or for any other language (except English, Chinese and Arabic), using neuralcoref?

johanelkjaer commented 4 years ago

@valedica, did you end up finding a solution? I am in the same pickle: I have built a custom Danish model, but I am unsure how to use it with spaCy. Any help would be greatly appreciated!

EricLe-dev commented 4 years ago

@thomwolf I'm training the neuralcoref model for Dutch using the SoNaR corpus. First, I used this script to convert the MMAX format to CoNLL format. Then I trained a word2vec model to prepare the static_word_embedding files. I have a few questions that I could not answer myself and could not find answers to anywhere else.
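For the word2vec step, here is a minimal sketch of reading the standard word2vec text format (a `vocab_size dim` header line, then one `word v1 ... vdim` line per word) into a vocabulary list and a list of vectors. The exact file layout neuralcoref expects for its static embeddings (for example a vocabulary text file plus a NumPy matrix) is an assumption here and should be checked against the files shipped with the training code. `read_word2vec_text` is a hypothetical helper.

```python
def read_word2vec_text(stream):
    """Parse word2vec text format from an open text stream.

    Returns (words, rows): a list of words and a parallel list of float vectors.
    """
    header = stream.readline().split()
    vocab_size, dim = int(header[0]), int(header[1])
    words, rows = [], []
    for line in stream:
        # rstrip() also drops the trailing space some word2vec writers emit.
        parts = line.rstrip().split(" ")
        if len(parts) < dim + 1:
            continue  # skip blank or malformed lines
        words.append(parts[0])
        rows.append([float(v) for v in parts[1:dim + 1]])
    if len(words) != vocab_size:
        raise ValueError(f"header announced {vocab_size} words, found {len(words)}")
    return words, rows
```

Usage would look like `with open("vectors.txt", encoding="utf-8") as f: words, rows = read_word2vec_text(f)`, after which the rows could be saved with `numpy.save` and the words written one per line under whatever file names the training code looks for.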

I don't know what the tuned_word_embedding files are. Whenever I ran conllparser.py, it complained that those files were missing. Looking more closely at the original tuned_word_embedding files, I could see that they are similar to the static_word_embeddings; however, some words appear in both the static and tuned word embeddings, and some appear only in the tuned_word_embeddings. So I simply used exactly the same word embeddings file for both static and tuned. It seemed to work (at least it didn't raise any errors, but I'm not sure whether it actually works).
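One plausible reading of the two tables, sketched under stated assumptions: the tuned table holds vectors that were updated during training, and a lookup falls back from tuned to static to an unknown-word vector. The fallback order and the `<unk>` key are illustrative assumptions, not documented neuralcoref behaviour. Under this reading, using the same file for both tables makes the fallback collapse to a single lookup. `lookup` and `vocabulary_overlap` are hypothetical helpers; the second reproduces the comparison described above.

```python
def lookup(word, tuned, static, unk_key="<unk>"):
    """Return a vector for `word`, preferring tuned, then static, then the UNK vector.

    `tuned` and `static` are dicts mapping words to vectors. The fallback order
    and the "<unk>" key are assumptions for illustration only.
    """
    if word in tuned:
        return tuned[word]
    if word in static:
        return static[word]
    return static[unk_key]


def vocabulary_overlap(tuned, static):
    """Split the vocabularies into (in both, only in tuned, only in static)."""
    t, s = set(tuned), set(static)
    return t & s, t - s, s - t
```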
I have no idea how you constructed the MISSING and UNK tokens in those static/tuned word embeddings.
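A sketch of one common convention for such special rows, which may or may not match how the released neuralcoref embeddings were built: MISSING gets a zero vector (presumably for feature slots with no word, such as the word before a mention at the start of a document) and UNK gets the mean of all existing vectors. The token strings `<missing>` and `<unk>` and the helper name `add_special_tokens` are hypothetical.

```python
def add_special_tokens(words, rows, missing="<missing>", unk="<unk>"):
    """Append MISSING and UNK rows to a vocabulary and its embedding rows.

    Assumed convention (not taken from neuralcoref): MISSING is a zero vector,
    UNK is the element-wise mean of all existing vectors. Tokens already present
    in `words` are left untouched. Inputs are not modified; copies are returned.
    """
    if not rows:
        raise ValueError("need at least one vector")
    dim = len(rows[0])
    mean = [sum(r[i] for r in rows) / len(rows) for i in range(dim)]
    new_words = list(words)
    new_rows = [list(r) for r in rows]
    for token, vec in ((missing, [0.0] * dim), (unk, mean)):
        if token not in new_words:
            new_words.append(token)
            new_rows.append(vec)
    return new_words, new_rows
```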
When I ran the training code, it ran fine at first but then displayed this error (I think it comes from Perl):

I have read many related topics and posted questions in several threads, but I still haven't received any help or guidance. Thank you so much for any help any of you can provide.

With best regards, Eric

tidoe commented 4 years ago

I am working with version 4.0 and still have the same problem as @valdicarlo. After training a model for a new language, I only have files ending in *_best_modeltoppairs, *_best_modelranking and *_best_modelallpairs. But spacy.load("mymodel_best_modelranking") does not work. What should I do to load the model or add it to the pipeline of another model?
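Some context on why that call fails: in spaCy 2.x, `spacy.load` takes a shortcut link, an installed package name or a path to a model data directory, and loading from a path reads that directory's `meta.json`. A single checkpoint file such as `mymodel_best_modelranking` is not such a directory. The sketch below only illustrates that distinction; `classify` is a hypothetical helper and does not load or convert anything.

```python
from pathlib import Path

# Checkpoint suffixes reported in this thread.
CHECKPOINT_SUFFIXES = ("_best_modelallpairs", "_best_modeltoppairs", "_best_modelranking")


def classify(path):
    """Roughly tell a spaCy model directory apart from a neuralcoref training checkpoint."""
    p = Path(path)
    if p.is_dir() and (p / "meta.json").is_file():
        return "spacy-model-dir"  # the kind of path spacy.load can read
    if p.is_file() and p.name.endswith(CHECKPOINT_SUFFIXES):
        return "neuralcoref-checkpoint"  # raw training output, not a spaCy model
    return "unknown"
```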

Best, and thanks, Tillmann
