jeniyat / StackOverflowNER

Source Code and Data for Software Domain NER
MIT License

KeyError in `/code/BERT_NER/utils_fine_tune/labels_seg.txt` #5

Open cuevasclemente opened 3 years ago

cuevasclemente commented 3 years ago

Hi,

I'm trying to run E2E_SoftNER.py. I think I have resolved the references to the locations of most of the models and files associated with the repo, but I'm now getting an error. Here's the traceback:

Exception has occurred: KeyError
8
  File "/Users/clemente/src/python/github/StackOverflowNER/code/BERT_NER/softner_segmenter_preditct_from_file.py", line 298, in evaluate
    preds_list[i].append(label_map[preds[i][j]])
  File "/Users/clemente/src/python/github/StackOverflowNER/code/BERT_NER/softner_segmenter_preditct_from_file.py", line 638, in predict_segments
    result, predictions = evaluate(args, model, tokenizer, labels, pad_token_label_id, mode="", path=input_file)
  File "/Users/clemente/src/python/github/StackOverflowNER/code/BERT_NER/E2E_SoftNER.py", line 186, in Extract_NER
    softner_segmenter_preditct_from_file.predict_segments(segmenter_input_file, segmenter_output_file)
  File "/Users/clemente/src/python/github/StackOverflowNER/code/BERT_NER/E2E_SoftNER.py", line 206, in <module>
    Extract_NER(input_file)

It looks like something is off with the format this code expects for './utils_fine_tune/labels_seg.txt'. Looking at label_map here, it is a dictionary with no key for 8:

> label_map
{0: 'B-Name', 1: 'O', 2: 'CTC_PRED:0', 3: 'CTC_PRED:1', 4: 'md_label:O', 5: 'md_label:Name'}

whereas preds is an array whose values go well beyond the keys in label_map:

> preds
array([[ 0,  8, 13, ..., 10,  1,  0],
       [ 4, 13,  1, ...,  7,  3,  9],
       [ 9,  2,  0, ...,  9,  1,  9],
       ...,
       [ 0,  2, 13, ...,  0, 12,  0],
       [ 4,  2,  5, ..., 10,  5,  1],
       [ 4,  2,  6, ...,  9,  9,  9]])
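The mismatch can be confirmed before the lookup fails. The following is a minimal sketch, not code from the repository: the helper name check_label_coverage is hypothetical, and the preds rows are illustrative, built only from the entries visible in the output above. It lists every predicted class id that has no entry in label_map.

```python
def check_label_coverage(preds, label_map):
    """Return the sorted prediction ids that have no entry in label_map.

    Hypothetical helper for debugging; it accepts any 2-D sequence of
    integer class ids (nested lists or a NumPy array).
    """
    seen = {int(p) for row in preds for p in row}
    return sorted(i for i in seen if i not in label_map)


# label_map exactly as printed in the debugger above.
label_map = {
    0: "B-Name",
    1: "O",
    2: "CTC_PRED:0",
    3: "CTC_PRED:1",
    4: "md_label:O",
    5: "md_label:Name",
}

# Illustrative rows only: the real array is larger and partly elided.
preds = [
    [0, 8, 13, 10, 1, 0],
    [4, 13, 1, 7, 3, 9],
]

missing = check_label_coverage(preds, label_map)
print(missing)  # -> [7, 8, 9, 10, 13]
```

A non-empty result suggests that the label file read at prediction time lists fewer labels than the model emits. Whether the cause is the file's contents or the way it is parsed cannot be determined from the output above.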

Everything in the utils_fine_tune directory came from the megaupload link you provided, so it's possible that something went wrong with either the archive or the data.

Thanks very much for contributing this code to the community. If you find the time to take a look at this issue, please let me know if there is anything else I can provide to help debug or better understand it. Hopefully it's just a misunderstanding on my end.

Ramya0694 commented 3 years ago

Hi Jeniyat,

Thank you for the contribution! I'm facing the same error as @cuevasclemente

BERT_NER/softner_segmenter_preditct_from_file.py", line 298, in evaluate
    preds_list[i].append(label_map[preds[i][j]])
KeyError: 11

Any help understanding this issue would be much appreciated!

EeshitaBiswas commented 3 years ago

@cuevasclemente Hi, I am also trying to run E2E_SoftNER.py. You mentioned that you were able to resolve the references to the locations of many models and files. I may be misunderstanding something, but I'm getting this error:

ValueError: /data/jeniya/STACKOVERFLOW_DATA/POST_PROCESSED/fasttext_model/fasttext.bin cannot be opened for loading!

The model is not present at that path when it is loaded in "StackOverflowNER-master/code/BERT_NER/utils_ctc/prediction_ctc.py", line 30:

fasttext_model = fasttext.load_model('/data/jeniya/STACKOVERFLOW_DATA/POST_PROCESSED/fasttext_model/fasttext.bin')

I tried to load the other models in the StackOverflowNER-master/resources/pretrained_word_vectors/ folder, but I get a "... wrong file format" error. Which models did you use, and how did you load them?

cuevasclemente commented 3 years ago

Hi, I think that error means you need to change the file locations this code looks for. You probably don't have a /data/jeniya/STACKOVERFLOW_DATA directory on your computer, so you would need to edit prediction_ctc.py so that those paths point to where the files live on your machine.
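One way to avoid editing hard-coded paths repeatedly is to resolve the model location once and fail with a clear message when the file is missing. This is only a sketch under assumptions: the helper resolve_model_path and the environment variable SOFTNER_FASTTEXT_MODEL are hypothetical names that do not exist in the repository, and the default path is a placeholder.

```python
import os
from pathlib import Path


def resolve_model_path(env_var="SOFTNER_FASTTEXT_MODEL", default="fasttext.bin"):
    """Return the model path from an environment variable or a default.

    Both the variable name and the default are hypothetical; adjust them
    to wherever the downloaded model actually lives on your machine.
    """
    path = Path(os.environ.get(env_var, default)).expanduser()
    if not path.is_file():
        raise FileNotFoundError(
            f"fastText model not found at {path}; set {env_var} to its location"
        )
    return path


try:
    model_path = resolve_model_path()
    print(f"Using model at {model_path}")
except FileNotFoundError as err:
    # Reported up front instead of failing later inside the loader.
    print(err)
```

The resolved path could then be passed to the existing call in prediction_ctc.py, for example fasttext.load_model(str(model_path)), in place of the literal /data/jeniya/... string.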