Closed: YanLiang1102 closed this issue 6 years ago
```
  ine 278, in Tok2Vec
    glove = StaticVectors(pretrained_vectors, width, column=cols.index(ID))
  File "/home/yan/spacyOU/spacy-vir/lib/python3.5/site-packages/thinc/neural/_classes/static_vectors.py", line 41, in __init__
    vectors = self.get_vectors()
  File "/home/yan/spacyOU/spacy-vir/lib/python3.5/site-packages/thinc/neural/_classes/static_vectors.py", line 52, in get_vectors
    return get_vectors(self.ops, self.lang)
  File "/home/yan/spacyOU/spacy-vir/lib/python3.5/site-packages/thinc/extra/load_nlp.py", line 19, in get_vectors
    nlp = get_spacy(lang)
  File "/home/yan/spacyOU/spacy-vir/lib/python3.5/site-packages/thinc/extra/load_nlp.py", line 11, in get_spacy
    SPACY_MODELS[lang] = spacy.load(lang, **kwargs)
  File "/home/yan/spacyOU/spacy-vir/lib/python3.5/site-packages/spacy/__init__.py", line 15, in load
    return util.load_model(name, **overrides)
  File "/home/yan/spacyOU/spacy-vir/lib/python3.5/site-packages/spacy/util.py", line 119, in load_model
    raise IOError(Errors.E050.format(name=name))
OSError: [E050] Can't find model 'ar_model.vectors'. It doesn't seem to be a shortcut link, a Python package or a valid path to a data directory.
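For context on the E050 error: spaCy's `util.load_model` only accepts a shortcut link, an installed model package, or a path to a model directory, and `'ar_model.vectors'` is none of those. The sketch below is a hypothetical, simplified mock of that resolution logic (it is not spaCy's actual code, and it skips the shortcut-link lookup), just to show why the name fails:

```python
from pathlib import Path

def resolve_model(name):
    # Hypothetical sketch of spacy.util.load_model's name resolution:
    # the real code tries a shortcut link, then an installed package,
    # then a filesystem path, and raises E050 when everything fails.
    if Path(name).exists():          # valid path to a model data directory
        return ("path", str(name))
    try:
        __import__(name)             # installed model package
        return ("package", name)
    except ImportError:
        pass
    raise IOError(
        "[E050] Can't find model '{}'. It doesn't seem to be a shortcut "
        "link, a Python package or a valid path to a data directory.".format(name)
    )
```

So the fix is to make `'ar_model.vectors'` resolvable, e.g. by passing the full path to the vectors model directory instead of the bare name.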
Just to remind myself: all of these experiments are done on Manchester.
@ahalterman it looks like the pretrained NER model does not help at all; see the comments above. A blank language model with no pretrained NER model performs better than a model that has an NER model in it. I also updated the steps in the README. The interesting thing is that the model with NER works much better than the empty language model at first, but the empty language model catches up later....
```
python3 -m prodigy ner.batch-train arabicner /home/yan/arabicner/Arabic-NER/xx_raw_fasttext_model_1 --eval-split 0.2
```
So I used the pruned trained model from the LDC data and then trained directly on the Prodigy-labelled data without the rehearsal step. (I wonder if something is wrong with the rehearsal code, maybe?) Why does it score higher, @ahalterman?
```
python3 -m prodigy ner.batch-train arabicner /home/yan/arabicner/Arabic-NER/xx_raw_fasttext_model_1 --eval-split 0.2
```
Trained an empty NER model with pretrained vectors, using only the Prodigy-labelled data:
```
python3 -m prodigy ner.batch-train augmented_for_training_2 /home/yan/arabicner/Arabic-NER/xx_raw_fasttext_model --eval-split 0.2
```
Empty model trained with only the LDC data.
Training result on all OntoNotes tags, about 40k tokens, with Prodigy (previously we only used 4,000 for rehearsal): our accuracy gets to 72.1%!!!
The screenshot uploaded today is from the Chinese training data....
Two things to try: 1. Prune the vectors before using spaCy to train, and use the output vectors (the pruning outputs a language model; copy and paste the vectors into the model). Got an error when using this model to train on the mixed-in data. --failed
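For reference on the pruning step: spaCy's `Vocab.prune_vectors(nr_row)` keeps only the rows for the most frequent words and remaps every pruned word to the closest kept vector, so lookups still return a (shared) vector. The toy implementation below is a conceptual sketch of that behavior with plain Python lists, not spaCy's actual code; the function names and data layout are hypothetical:

```python
from math import sqrt

def cosine(u, v):
    # Cosine similarity between two equal-length vectors.
    num = sum(a * b for a, b in zip(u, v))
    den = sqrt(sum(a * a for a in u)) * sqrt(sum(b * b for b in v))
    return num / den if den else 0.0

def prune_vectors(vectors, freq_order, n_keep):
    # Keep vectors for the n_keep most frequent words; remap each pruned
    # word to the kept word whose vector is most similar by cosine.
    kept = freq_order[:n_keep]
    remap = {}
    for word in freq_order[n_keep:]:
        remap[word] = max(kept, key=lambda k: cosine(vectors[word], vectors[k]))
    return {w: vectors[w] for w in kept}, remap
```

One thing worth checking for the failed run above: after copying pruned vectors into a model, the model's vector name and dimensions must still match what training expects, or loading will fail.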