jeniyat / StackOverflowNER

Source Code and Data for Software Domain NER
MIT License

Crash from an unknown key of the BERT NER Model #11

Closed jyz-1201 closed 2 years ago

jyz-1201 commented 2 years ago

[Image: WeChat Image_20211111093940]

We're encountering the exception in the image above. Could anyone tell me how to fix it please?

We suspect it is caused by using the wrong version of the two files referenced below. There are two lines like

parameters_ctc['train_file']="/data/jeniya/STACKOVERFLOW_DATA/CTC/data/train_updated.tsv"
parameters_ctc['test_file']="/data/jeniya/STACKOVERFLOW_DATA/CTC/data/test_updated.tsv"

in the file "code/BERT_NET/utils_ctc/config_ctc.py"; however, these two updated files cannot be downloaded anywhere.

If we replace these files with "train.tsv" and "test_v2.tsv" respectively, both from the "data_ctc" folder that can be downloaded from the Google Drive URL in the README file, the above exception is thrown.
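For anyone reproducing this, a minimal sketch of a pre-flight check on the configured paths may help separate "file missing" from "file present but in an unexpected format". The `parameters_ctc` dictionary and its two keys are taken from the lines quoted above; the helper `find_missing_files` is hypothetical and not part of the repository.

```python
import os


def find_missing_files(parameters, keys=("train_file", "test_file")):
    """Return {key: path} for each configured path that is unset or not a file.

    Hypothetical helper for illustration; it is not part of StackOverflowNER.
    """
    missing = {}
    for key in keys:
        path = parameters.get(key)
        if path is None or not os.path.isfile(path):
            missing[key] = path
    return missing


# The two entries as quoted from code/BERT_NET/utils_ctc/config_ctc.py.
parameters_ctc = {}
parameters_ctc['train_file'] = "/data/jeniya/STACKOVERFLOW_DATA/CTC/data/train_updated.tsv"
parameters_ctc['test_file'] = "/data/jeniya/STACKOVERFLOW_DATA/CTC/data/test_updated.tsv"

# Report every configured file that cannot be found on this machine.
for key, path in find_missing_files(parameters_ctc).items():
    print(f"{key}: file not found: {path}")
```

If both paths resolve after pointing them at local copies and the exception still occurs, the substituted files themselves are the more likely cause, which is what we suspect here.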

LGDDouble commented 2 years ago

Hi @jeniyat, thank you for generously sharing the code and data. I think StackOverflowNER is excellent work that can be very helpful for NER in the software engineering domain. I am facing the same problem as @jyz-1201, and I would like to know whether it is because I replaced the two files "train_updated.tsv" and "test_updated.tsv" with "train.tsv" and "test_v2.tsv". Also, I can't find a download link for "train_updated.tsv" and "test_updated.tsv" in the Google Drive folder.

jeniyat commented 2 years ago

You can find these files after unzipping data_ctc.zip from here: https://drive.google.com/drive/folders/1iEEMr2DYofulK2F5pSErOPf5ggrEqtJt?usp=sharing

LGDDouble commented 2 years ago

Sorry, @jeniyat, I just downloaded data_ctc.zip from the link and cannot find "train_updated.tsv" or "test_updated.tsv". The contents of data_ctc.zip are as follows: [image]

jyz-1201 commented 2 years ago

I'm sorry to say that after downloading data_ctc.zip from your link, @jeniyat, I found only the same contents as @LGDDouble: there is no "train_updated.tsv" or "test_updated.tsv".