Hi @jeniyat, when I followed the README files, the first problem I encountered was that the "word_to_id.json" file doesn't exist. I don't know whether this file is generated automatically or has to be put here in advance. While trying to solve this, I noticed that E2ESofrNER.py contains the statement ctc_classifier, vocab_size, word_to_id, id_to_word, word_to_vec, features = train_ctc_model(train_file, test_file), which generates a word_to_id list but does not save it as a JSON file. I then tried to dump this list to JSON, but found that it has no [CLS], [SEP], or [UNK] entries, and no ***PADDING*** entry either, which utils_seg.py looks up with word_id_pad = word_to_id["***PADDING***"].
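A minimal sketch of the kind of dump and check described above, assuming word_to_id is a plain dict that maps tokens to integer ids. The helper dump_and_check and the toy vocabulary are hypothetical and are not code from the repository:

```python
import json
import os
import tempfile

# Special tokens reported above as missing from the dumped mapping.
REQUIRED_TOKENS = ["[CLS]", "[SEP]", "[UNK]", "***PADDING***"]


def dump_and_check(word_to_id, path):
    """Write word_to_id to `path` as JSON and return the required tokens it lacks."""
    with open(path, "w", encoding="utf-8") as f:
        json.dump(word_to_id, f, ensure_ascii=False, indent=2)
    return [tok for tok in REQUIRED_TOKENS if tok not in word_to_id]


# Toy stand-in for the mapping returned by train_ctc_model (hypothetical values).
toy_word_to_id = {"python": 0, "list": 1}

# Write into a temporary directory so no real file is overwritten.
out_path = os.path.join(tempfile.mkdtemp(), "word_to_id.json")
missing = dump_and_check(toy_word_to_id, out_path)
print("missing tokens:", missing)
```

With a mapping like the toy one, every required token is reported as missing, which matches what I saw and would explain a KeyError on word_to_id["***PADDING***"].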
These are just my own attempts, so I may have done something wrong. I read the source code and still cannot find where word_to_id.json is generated.
Could you help me? Thank you very much.