Not loading the right pre trained model

maheshmad commented 4 years ago

I followed the steps for setting up the project, but I get the error below when trying to run a prediction.

Any hint?

2020-02-05 18:42:39,044 - INFO - allennlp.nn.initializers -    qp_matrix_attention._bias
2020-02-05 18:42:39,044 - INFO - allennlp.nn.initializers -    qp_matrix_attention._weight_vector
2020-02-05 18:42:41,383 - INFO - allennlp.common.from_params - instantiating class <class 'allennlp.data.dataset_readers.dataset_reader.DatasetReader'> from params {'lazy': False, 'pretrained_model': 'bert-base-uncased', 'question_length_limit': 50, 'skip_due_to_gold_programs': False, 'skip_instances': False, 'token_indexers': {'tokens': {'pretrained_model': 'bert-base-uncased', 'type': 'bert-drop'}}, 'type': 'drop_reader_bert'} and extras set()
2020-02-05 18:42:41,384 - INFO - allennlp.common.params - validation_dataset_reader.type = drop_reader_bert
2020-02-05 18:42:41,384 - INFO - allennlp.common.from_params - instantiating class <class 'semqa.data.dataset_readers.drop_reader_bert.DROPReaderNew'> from params {'lazy': False, 'pretrained_model': 'bert-base-uncased', 'question_length_limit': 50, 'skip_due_to_gold_programs': False, 'skip_instances': False, 'token_indexers': {'tokens': {'pretrained_model': 'bert-base-uncased', 'type': 'bert-drop'}}} and extras set()
2020-02-05 18:42:41,385 - INFO - allennlp.common.params - validation_dataset_reader.lazy = False
2020-02-05 18:42:41,385 - INFO - allennlp.common.params - validation_dataset_reader.pretrained_model = bert-base-uncased
2020-02-05 18:42:41,386 - INFO - allennlp.common.from_params - instantiating class allennlp.data.token_indexers.token_indexer.TokenIndexer from params {'pretrained_model': 'bert-base-uncased', 'type': 'bert-drop'} and extras set()
2020-02-05 18:42:41,386 - INFO - allennlp.common.params - validation_dataset_reader.token_indexers.tokens.type = bert-drop
2020-02-05 18:42:41,386 - INFO - allennlp.common.from_params - instantiating class semqa.data.dataset_readers.drop_reader_bert.BertDropTokenIndexer from params {'pretrained_model': 'bert-base-uncased'} and extras set()
2020-02-05 18:42:41,387 - INFO - allennlp.common.params - validation_dataset_reader.token_indexers.tokens.pretrained_model = bert-base-uncased
2020-02-05 18:42:41,387 - INFO - allennlp.common.params - validation_dataset_reader.token_indexers.tokens.max_pieces = 512
2020-02-05 18:42:41,614 - INFO - pytorch_pretrained_bert.tokenization - loading vocabulary file https://s3.amazonaws.com/models.huggingface.co/bert/bert-base-uncased-vocab.txt from cache at /root/.pytorch_pretrained_bert/26bc1ad6c0ac742e9b52263248f6d0f00068293b33709fae12320c0e35ccfbbb.542ce4285a40d23a559526243235df47c5f75c197f04f37d1a0c124c32c9a084
2020-02-05 18:42:41,718 - INFO - allennlp.common.params - validation_dataset_reader.relaxed_span_match = True
2020-02-05 18:42:41,719 - INFO - allennlp.common.params - validation_dataset_reader.do_augmentation = True
2020-02-05 18:42:41,719 - INFO - allennlp.common.params - validation_dataset_reader.question_length_limit = 50
2020-02-05 18:42:41,719 - INFO - allennlp.common.params - validation_dataset_reader.only_strongly_supervised = False
2020-02-05 18:42:41,719 - INFO - allennlp.common.params - validation_dataset_reader.skip_instances = False
2020-02-05 18:42:41,719 - INFO - allennlp.common.params - validation_dataset_reader.skip_due_to_gold_programs = False
2020-02-05 18:42:41,720 - INFO - allennlp.common.params - validation_dataset_reader.convert_spananswer_to_num = False
2020-02-05 18:42:42,003 - INFO - pytorch_pretrained_bert.tokenization - loading vocabulary file https://s3.amazonaws.com/models.huggingface.co/bert/bert-base-uncased-vocab.txt from cache at /root/.pytorch_pretrained_bert/26bc1ad6c0ac742e9b52263248f6d0f00068293b33709fae12320c0e35ccfbbb.542ce4285a40d23a559526243235df47c5f75c197f04f37d1a0c124c32c9a084
2020-02-05 18:42:42,094 - INFO - allennlp.common.registrable - instantiating registered subclass drop_demo_predictor of <class 'allennlp.predictors.predictor.Predictor'>
Traceback (most recent call last):
  File "/root/anaconda3/envs/py3/bin/allennlp", line 10, in <module>
    sys.exit(run())
  File "/root/anaconda3/envs/py3/lib/python3.6/site-packages/allennlp/run.py", line 18, in run
    main(prog="allennlp")
  File "/root/anaconda3/envs/py3/lib/python3.6/site-packages/allennlp/commands/__init__.py", line 102, in main
    args.func(args)
  File "/root/anaconda3/envs/py3/lib/python3.6/site-packages/allennlp/commands/predict.py", line 227, in _predict
    manager.run()
  File "/root/anaconda3/envs/py3/lib/python3.6/site-packages/allennlp/commands/predict.py", line 206, in run
    for model_input_json, result in zip(batch_json, self._predict_json(batch_json)):
  File "/root/anaconda3/envs/py3/lib/python3.6/site-packages/allennlp/commands/predict.py", line 151, in _predict_json
    results = [self._predictor.predict_json(batch_data[0])]
  File "./semqa/predictors/demo_predictor.py", line 180, in predict_json
    instance = self._json_to_instance(inputs)
  File "./semqa/predictors/demo_predictor.py", line 100, in _json_to_instance
    passage_spacydoc = spacyutils.getSpacyDoc(cleaned_passage_text, spacy_nlp)
  File "./utils/spacyutils.py", line 38, in getSpacyDoc
    return nlp(sent)
  File "/root/anaconda3/envs/py3/lib/python3.6/site-packages/spacy/language.py", line 435, in __call__
    doc = proc(doc, **component_cfg.get(name, {}))
  File "pipes.pyx", line 397, in spacy.pipeline.pipes.Tagger.__call__
  File "pipes.pyx", line 442, in spacy.pipeline.pipes.Tagger.set_annotations
  File "morphology.pyx", line 312, in spacy.morphology.Morphology.assign_tag_id
  File "morphology.pyx", line 200, in spacy.morphology.Morphology.add
ValueError: [E167] Unknown morphological feature: 'ConjType' (9141427322507498425). This can happen if the tagger was trained with a different set of morphological features. If you're using a pretrained model, make sure that your models are up to date:
python -m spacy validate
2020-02-05 18:43:25,401 - INFO - allennlp.models.archival - removing temporary unarchived model dir at /tmp/tmp1kf0l594

The input file looks like this:

{"passage":" Hoping to snap a two-game losing streak, the Falcons went home for a Week 9 duel with the Washington Redskins.  Atlanta would take flight in the first quarter as quarterback Matt Ryan completed a 2-yard touchdown pass to tight end Tony Gonzalez, followed by cornerback Tye Hill returning an interception 62 yards for a touchdown.  The Redskins would answer in the second quarter as kicker Shaun Suisham nailed a 48-yard field goal, yet the Falcons kept their attack on as running back Michael Turner got a 30-yard touchdown run, followed by kicker Jason Elam booting a 33-yard field goal. Washington began to rally in the third quarter with a 1-yard touchdown run from running back Ladell Betts.  The Redskins would come closer in the fourth quarter as quarterback Jason Campbell hooked up with tight end Todd Yoder on a 3-yard touchdown pass, yet Atlanta closed out the game with Turner's 58-yard touchdown run.","question":"How many yards was the shortest touchdown pass?"}

nitishgupta commented 4 years ago

Seems like an error caused by spaCy trying to process the passage text. Make sure the file you're running prediction on is a JSON-lines formatted file, where each line is a JSON object with the keys "question" and "passage". Could you please post the file or a snippet of it here?
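
A quick way to rule out the input format is to check each line before running prediction. The following is a minimal sketch, not part of the repository: the helper name `check_jsonl_lines` is hypothetical, and it only assumes what is stated above, namely that every non-blank line must be a JSON object containing the keys "question" and "passage".

```python
import json

# Keys each input line must contain, as described in the comment above.
REQUIRED_KEYS = {"question", "passage"}


def check_jsonl_lines(lines):
    """Return (line_number, problem) pairs for lines that are not valid inputs.

    Hypothetical helper for illustration; it is not part of nmn-drop or AllenNLP.
    """
    problems = []
    for lineno, line in enumerate(lines, start=1):
        if not line.strip():
            continue  # blank lines are ignored rather than reported
        try:
            obj = json.loads(line)
        except json.JSONDecodeError as exc:
            problems.append((lineno, "invalid JSON: " + exc.msg))
            continue
        if not isinstance(obj, dict):
            problems.append((lineno, "not a JSON object"))
            continue
        missing = REQUIRED_KEYS - obj.keys()
        if missing:
            problems.append((lineno, "missing keys: " + ", ".join(sorted(missing))))
    return problems


# Example: the first line is well formed, the second lacks "question".
sample = [
    '{"passage": "Some passage text.", "question": "How many yards?"}',
    '{"passage": "A passage without a question."}',
]
print(check_jsonl_lines(sample))
```

To check a real file, the lines can be passed in directly, for example `check_jsonl_lines(open(path, encoding="utf-8"))`, where `path` is whatever file is being sent to the predict command. An empty result means the format is fine and the problem lies elsewhere.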

maheshmad commented 4 years ago

Updated the original comment with the JSONL data.

nitishgupta commented 4 years ago

Runs fine on my machine. Could you verify that spaCy is installed properly? I am using version 2.1.8, though it shouldn't really matter. Did you try running the command python -m spacy validate, as suggested in the error log?

maheshmadhusudanan commented 4 years ago

@nitishgupta thanks for looking. I had to update en-core-web-lg to the latest version, and that fixed it.

====================== Installed models (spaCy v2.2.3) ======================
ℹ spaCy installation:
/root/anaconda3/envs/py3/lib/python3.6/site-packages/spacy

TYPE      NAME             MODEL            VERSION
package   en-core-web-lg   en_core_web_lg   2.1.0   --> 2.2.5
============================== Install updates ==============================
python -m spacy download en_core_web_lg
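
The mismatch reported above (spaCy v2.2.3 with an en_core_web_lg 2.1.0 model) can also be caught in code before the pipeline is run. This is a minimal sketch under one stated assumption: that a spaCy v2 model is compatible when its major and minor version numbers match those of the installed library, which is what the `2.1.0 --> 2.2.5` suggestion in the validate output implies. The function name `model_matches_spacy` is hypothetical.

```python
def model_matches_spacy(spacy_version, model_version):
    """Return True when the major.minor parts of both version strings agree.

    Assumption (not taken from the thread): spaCy v2 models follow the
    library's major.minor numbering, so 2.2.x models pair with spaCy 2.2.x.
    """
    return spacy_version.split(".")[:2] == model_version.split(".")[:2]


# Versions taken from the validate output above.
print(model_matches_spacy("2.2.3", "2.1.0"))  # installed model before the update
print(model_matches_spacy("2.2.3", "2.2.5"))  # model after running the download
```

In a live environment the two strings would typically come from `spacy.__version__` and from the `meta["version"]` entry of the loaded pipeline; `python -m spacy validate` remains the authoritative check.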

nitishgupta commented 4 years ago

Great. Usually reading through the error log helps!

nitishgupta / nmn-drop

Not loading the right pre trained model #1