jeniyat / StackOverflowNER

Source Code and Data for Software Domain NER
MIT License

Question about "word_to_id.json" #10

Closed: Rvlis closed this issue 3 years ago

Rvlis commented 3 years ago

Hi @jeniyat, when I followed the README files, the first problem I ran into was that the file "word_to_id.json" does not exist. I don't know whether this file is generated automatically or should be placed there in advance.

To work around this, I noticed that E2ESofrNER.py contains the line `ctc_classifier, vocab_size, word_to_id, id_to_word, word_to_vec, features = train_ctc_model(train_file, test_file)`. It builds a word_to_id mapping but never saves it as a JSON file. I then tried dumping this mapping to JSON myself, but it does not contain the special tokens [CLS], [SEP], [UNK], or ***PADDING***, which utils_seg.py relies on (for example, `word_id_pad = word_to_id["***PADDING***"]`).

These were just my attempts, so I may have done something wrong. I have read through the source code and still cannot find where word_to_id.json is generated. Could you help me with this? Thank you very much.
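A minimal sketch of the workaround described above: dump the word_to_id mapping to word_to_id.json after adding the missing special tokens. It assumes word_to_id is a plain Python dict mapping tokens to integer ids, as the `word_to_id["***PADDING***"]` lookup suggests. The helper names (`add_special_tokens`, `save_word_to_id`, `load_word_to_id`) are hypothetical, and the ids given to the special tokens are a guess; the ids the repository actually expects are not stated in the issue.

```python
import json

# Tokens that, per the issue text, utils_seg.py expects to find in word_to_id.
# ASSUMPTION: their real ids in the original repo are unknown; we append them.
SPECIAL_TOKENS = ["***PADDING***", "[UNK]", "[CLS]", "[SEP]"]


def add_special_tokens(word_to_id):
    """Return a copy of word_to_id with any missing special tokens appended,
    each given a fresh id after the current maximum id."""
    vocab = dict(word_to_id)  # copy so the caller's dict is not mutated
    next_id = max(vocab.values(), default=-1) + 1
    for token in SPECIAL_TOKENS:
        if token not in vocab:
            vocab[token] = next_id
            next_id += 1
    return vocab


def save_word_to_id(word_to_id, path="word_to_id.json"):
    """Hypothetical helper: write the augmented vocabulary to a JSON file."""
    vocab = add_special_tokens(word_to_id)
    with open(path, "w", encoding="utf-8") as f:
        json.dump(vocab, f, ensure_ascii=False, indent=2)
    return vocab


def load_word_to_id(path="word_to_id.json"):
    """Hypothetical helper: read the vocabulary back as a dict."""
    with open(path, encoding="utf-8") as f:
        return json.load(f)
```

For example, `save_word_to_id(word_to_id)` right after the `train_ctc_model(...)` call would write the file. Appending the special tokens keeps every existing id unchanged, so anything built from the original word_to_id (such as word_to_vec or the trained classifier) still lines up. If the code instead expects, say, ***PADDING*** at id 0, this ordering would have to change.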