luopeixiang / named_entity_recognition

Chinese named entity recognition (with implementations of several models: HMM, CRF, BiLSTM, and BiLSTM+CRF)

potential fix in build_corpus #32

Open mikelty opened 3 years ago

mikelty commented 3 years ago

I changed an if-else block to a try-except block and it worked. Machine: Windows 10, Python 3.7. I also needed to install an extra package (sklearn) after installing requirements.txt. I think the problem comes from a syntactic difference between the BMES format and the way Windows reads the file, but I'm not sure.
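To illustrate the line-ending hypothesis, here is a minimal sketch. It assumes the lines reach the parser with Windows `\r\n` endings left untranslated. Python's `open()` in its default text mode normally converts `\r\n` to `\n`, so this is only one possible way a mismatch could arise. The sample text is made up, and the exact comparison against `"\n"` is a hypothetical blank-line check, since the original if-else is not shown here. The sketch shows that such a check misses a CRLF blank line, while `split()` on that line yields an empty list, so unpacking it into `(word, tag)` raises `ValueError`, which the try-except catches.

```python
# Made-up BMES content with Windows (CRLF) line endings:
# one "char tag" pair per line, blank line between sentences.
raw = "高 B-NAME\r\n勇 E-NAME\r\n\r\n"

# splitlines(keepends=True) keeps "\r\n" intact (no newline translation).
lines = raw.splitlines(keepends=True)
data_line, blank_line = lines[0], lines[2]

# A data line still parses: split() with no arguments treats "\r" as whitespace.
data_fields = data_line.strip("\n").split()
print(data_fields)

# A CRLF blank line is not equal to "\n", so an exact equality check misses it.
print(blank_line == "\n")

# Stripping only "\n" leaves "\r", which split() turns into an empty list.
blank_fields = blank_line.strip("\n").split()
print(repr(blank_line.strip("\n")), blank_fields)

# Unpacking an empty list into (word, tag) raises ValueError,
# which is the exception a try-except around the unpacking catches.
try:
    word, tag = blank_fields
except ValueError as err:
    print("ValueError:", err)
```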

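```python
from os.path import join


def build_corpus(split, make_vocab=True, data_dir="./ResumeNER"):
    """Read the data."""
    assert split in ['train', 'dev', 'test']

    word_lists = []
    tag_lists = []
    with open(join(data_dir, split+".char.bmes"), 'r', encoding='utf-8') as f:
        word_list = []
        tag_list = []
        for line in f.readlines():
            try:
                word, tag = line.strip('\n').split()
                word_list.append(word)
                tag_list.append(tag)
            except ValueError:
                # The line did not split into exactly (word, tag), e.g. a blank
                # separator line, so treat it as the end of a sentence.
                word_lists.append(word_list)
                tag_lists.append(tag_list)
                word_list = []
                tag_list = []
    # ... rest of build_corpus unchanged (not shown)
```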
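The try-except approach treats any line that does not split into exactly two fields as a sentence boundary, so a malformed line silently ends a sentence, consecutive blank lines append empty sentences, and in the snippet a sentence is only saved when a separator line is reached. Below is a hedged sketch of a stricter version, assuming sentences are separated by whitespace-only lines and every other line holds exactly one character and one tag. `parse_bmes_lines` is a hypothetical helper, not part of the repository.

```python
from typing import Iterable, List, Tuple


def parse_bmes_lines(lines: Iterable[str]) -> Tuple[List[List[str]], List[List[str]]]:
    """Hypothetical helper: group 'char tag' lines into sentences."""
    word_lists: List[List[str]] = []
    tag_lists: List[List[str]] = []
    word_list: List[str] = []
    tag_list: List[str] = []
    for lineno, line in enumerate(lines, start=1):
        fields = line.split()  # no-arg split() also discards "\r", "\n" and spaces
        if not fields:
            # Whitespace-only line ("\n", "\r\n", "  \n"): end of a sentence.
            if word_list:  # skip empty sentences from consecutive blank lines
                word_lists.append(word_list)
                tag_lists.append(tag_list)
                word_list, tag_list = [], []
            continue
        if len(fields) != 2:
            # Fail loudly instead of silently treating the line as a boundary.
            raise ValueError(f"line {lineno}: expected 'char tag', got {line!r}")
        word, tag = fields
        word_list.append(word)
        tag_list.append(tag)
    if word_list:  # the file may not end with a blank line
        word_lists.append(word_list)
        tag_lists.append(tag_list)
    return word_lists, tag_lists


if __name__ == "__main__":
    sample = "高 B-NAME\r\n勇 E-NAME\r\n\r\n男 O\n"
    words, tags = parse_bmes_lines(sample.splitlines(keepends=True))
    print(words)  # [['高', '勇'], ['男']]
    print(tags)   # [['B-NAME', 'E-NAME'], ['O']]
```

Because a file object iterates line by line, the body of the `with open(...)` block in `build_corpus` could then shrink to `word_lists, tag_lists = parse_bmes_lines(f)`. Raising on lines with the wrong number of fields is a design choice: it surfaces data problems instead of splitting sentences at unexpected places.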