liu-nlper / NER-LSTM-CRF

An easy-to-use named entity recognition (NER) toolkit that implements the Bi-LSTM+CRF model in TensorFlow.

Training with your data works fine, but when I use my own data I get the following error #2

linzhenpeng closed this issue 7 years ago

linzhenpeng commented 7 years ago

Data (the middle column carries no meaning):
【 O O 拉手 O O 】 O O 您 O O 好 O O , O O 黄记 O B-commodityname 煌 O I-commodityname 中华 O I-commodityname 店 O I-commodityname 0 O I-commodityname 人餐 O E-commodityname 券号 O O 000000000 O S-order_arr 等 O O 0 O B-consumequantity 张 O E-consumequantity 券 O O 已于 O O 00 O B-date 日 O E-date 00 O B-time 时 O E-time 消费 O O , O O 拉手 O O 客服 O O : O O 0000000000 O O

【 O O 拉手 O O 】 O O 您 O O 好 O O , O O 黄记 O B-commodityname 煌 O I-commodityname 中华 O I-commodityname 店 O I-commodityname 0 O I-commodityname 人餐 O E-commodityname 券号 O O 000000000 O S-order_arr 等 O O 0 O B-consumequantity 张 O E-consumequantity 券 O O 已于 O O 00 O B-date 日 O E-date 00 O B-time 时 O E-time 消费 O O , O O 拉手 O O 客服 O O : O O 0000000000 O O

您 O O 于 O O 00 O B-date

With only the first two samples, training runs normally, but as soon as I add the last sample it fails. I find this very puzzling.

Epoch 1 / 20: 0%| | 0/1 [00:00<?, ?it/s]
Traceback (most recent call last):
  File "D:\Anaconda\Anaconda2\envs\tensorflow\lib\site-packages\tensorflow\python\client\session.py", line 1139, in _do_call
    return fn(*args)
  File "D:\Anaconda\Anaconda2\envs\tensorflow\lib\site-packages\tensorflow\python\client\session.py", line 1121, in _run_fn
    status, run_metadata)
  File "D:\Anaconda\Anaconda2\envs\tensorflow\lib\contextlib.py", line 66, in __exit__
    next(self.gen)
  File "D:\Anaconda\Anaconda2\envs\tensorflow\lib\site-packages\tensorflow\python\framework\errors_impl.py", line 466, in raise_exception_on_not_ok_status
    pywrap_tensorflow.TF_GetCode(status))
tensorflow.python.framework.errors_impl.InvalidArgumentError: indices[1,8] = 155 is not in [0, 144)
  [[Node: Gather_1 = Gather[Tindices=DT_INT32, Tparams=DT_FLOAT, validate_indices=true, _device="/job:localhost/replica:0/task:0/cpu:0"](Reshape_3, add_3)]]

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "D:/PythonWorkSpace/nlp/lstm_crf/train.py", line 74, in <module>
    main()
  File "D:/PythonWorkSpace/nlp/lstm_crf/train.py", line 70, in main
    data_dict=data_dict, dev_size=config['model_params']['dev_size'])
  File "D:\PythonWorkSpace\nlp\lstm_crf\model.py", line 240, in fit
    _, loss = self.sess.run([self.train_op, self.loss], feed_dict=feed_dict)
  File "D:\Anaconda\Anaconda2\envs\tensorflow\lib\site-packages\tensorflow\python\client\session.py", line 789, in run
    run_metadata_ptr)
  File "D:\Anaconda\Anaconda2\envs\tensorflow\lib\site-packages\tensorflow\python\client\session.py", line 997, in _run
    feed_dict_string, options, run_metadata)
  File "D:\Anaconda\Anaconda2\envs\tensorflow\lib\site-packages\tensorflow\python\client\session.py", line 1132, in _do_run
    target_list, options, run_metadata)
  File "D:\Anaconda\Anaconda2\envs\tensorflow\lib\site-packages\tensorflow\python\client\session.py", line 1152, in _do_call
    raise type(e)(node_def, op, message)
tensorflow.python.framework.errors_impl.InvalidArgumentError: indices[1,8] = 155 is not in [0, 144)
  [[Node: Gather_1 = Gather[Tindices=DT_INT32, Tparams=DT_FLOAT, validate_indices=true, _device="/job:localhost/replica:0/task:0/cpu:0"](Reshape_3, add_3)]]

Caused by op 'Gather_1', defined at:
  File "D:/PythonWorkSpace/nlp/lstm_crf/train.py", line 74, in <module>
    main()
  File "D:/PythonWorkSpace/nlp/lstm_crf/train.py", line 67, in main
    path_model=config['model_params']['path_model'])
  File "D:\PythonWorkSpace\nlp\lstm_crf\model.py", line 73, in __init__
    self.build_model()
  File "D:\PythonWorkSpace\nlp\lstm_crf\model.py", line 160, in build_model
    self.loss = self.compute_loss()
  File "D:\PythonWorkSpace\nlp\lstm_crf\model.py", line 371, in compute_loss
    self.logits, self.input_label_ph, self.sequence_actual_length)
  File "D:\Anaconda\Anaconda2\envs\tensorflow\lib\site-packages\tensorflow\contrib\crf\python\ops\crf.py", line 155, in crf_log_likelihood
    transition_params)
  File "D:\Anaconda\Anaconda2\envs\tensorflow\lib\site-packages\tensorflow\contrib\crf\python\ops\crf.py", line 93, in crf_sequence_score
    transition_params)
  File "D:\Anaconda\Anaconda2\envs\tensorflow\lib\site-packages\tensorflow\contrib\crf\python\ops\crf.py", line 220, in crf_binary_score
    flattened_transition_indices)
  File "D:\Anaconda\Anaconda2\envs\tensorflow\lib\site-packages\tensorflow\python\ops\gen_array_ops.py", line 1179, in gather
    validate_indices=validate_indices, name=name)
  File "D:\Anaconda\Anaconda2\envs\tensorflow\lib\site-packages\tensorflow\python\framework\op_def_library.py", line 767, in apply_op
    op_def=op_def)
  File "D:\Anaconda\Anaconda2\envs\tensorflow\lib\site-packages\tensorflow\python\framework\ops.py", line 2506, in create_op
    original_op=self._default_original_op, op_def=op_def)
  File "D:\Anaconda\Anaconda2\envs\tensorflow\lib\site-packages\tensorflow\python\framework\ops.py", line 1269, in __init__
    self._traceback = _extract_stack()

InvalidArgumentError (see above for traceback): indices[1,8] = 155 is not in [0, 144) [[Node: Gather_1 = Gather[Tindices=DT_INT32, Tparams=DT_FLOAT, validate_indices=true, _device="/job:localhost/replica:0/task:0/cpu:0"](Reshape_3, add_3)]]

liu-nlper commented 7 years ago

Hi, could it be that some of the shape parameters in the config file (including the number of labels) are not set correctly?
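The numbers in the error message seem consistent with a label-count mismatch. The sketch below assumes that `crf_binary_score` flattens the square transition matrix and looks up each label transition at `from_tag * num_tags + to_tag`; the traceback's `flattened_transition_indices` suggests this, but it is not confirmed here. The helper name `decode_flat_transition_index` is hypothetical and is not part of TensorFlow or this repository.

```python
import math


def decode_flat_transition_index(flat_index, flat_size):
    """Split a flattened transition index back into (num_tags, from_tag, to_tag).

    Assumes the lookup table is a num_tags x num_tags matrix flattened
    row by row, so flat_index = from_tag * num_tags + to_tag.
    """
    num_tags = math.isqrt(flat_size)
    if num_tags * num_tags != flat_size:
        raise ValueError("flat_size is not a perfect square")
    from_tag, to_tag = divmod(flat_index, num_tags)
    return num_tags, from_tag, to_tag


# Values taken from "indices[1,8] = 155 is not in [0, 144)".
num_tags, from_tag, to_tag = decode_flat_transition_index(155, 144)
print(num_tags, from_tag, to_tag)  # 12 12 11
```

Under that assumption the model was built for 12 label ids (0 to 11), while the data contains a label id of 12, which would point to a label count in the config that is one too small.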

linzhenpeng commented 7 years ago

Hi, it still doesn't seem to work.

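There is one spot that isn't handled correctly: in the preprocessing code, the build_vocabulary method should execute sequence_length_dict[sequence_length] += 1 once more after the while loop ends. Otherwise the last sample is never counted.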
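A minimal sketch of the fix described above, not the repository's actual `build_vocabulary` code. It assumes sentences are separated by blank lines and that the file may not end with one; the function name `count_sequence_lengths` is hypothetical, and a `for` loop stands in for the original `while` loop.

```python
from collections import defaultdict


def count_sequence_lengths(lines):
    """Count how many sentences there are of each length (in tokens)."""
    sequence_length_dict = defaultdict(int)
    sequence_length = 0
    for line in lines:
        if line.strip():
            # Non-empty line: one more token in the current sentence.
            sequence_length += 1
        elif sequence_length > 0:
            # Blank line: the current sentence is complete.
            sequence_length_dict[sequence_length] += 1
            sequence_length = 0
    # Flush the final sentence when the input does not end with a blank line.
    if sequence_length > 0:
        sequence_length_dict[sequence_length] += 1
    return dict(sequence_length_dict)


lines = ["您 O O", "好 O O", "", "于 O O"]
print(count_sequence_lengths(lines))  # {2: 1, 1: 1}
```

Without the final flush, the one-token sentence at the end of the example would be missing from the counts.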

liu-nlper commented 7 years ago

Hi, you're right, that wasn't handled properly. Thanks for pointing it out.

linzhenpeng commented 7 years ago

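Sorry, I didn't look carefully: nb_classes wasn't set correctly. It works now. Having to change it by hand doesn't feel ideal, though. Thank you very much for taking the time to answer my question.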

liu-nlper commented 7 years ago

Thanks for the suggestion. I'll find time to have the config file generated automatically.
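One way to derive nb_classes from the training data rather than setting it by hand is sketched below. This is not the toolkit's implementation: the function name `build_label_vocab` is hypothetical, and it assumes the label is the last whitespace-separated field on each line and that id 0 is reserved for padding.

```python
def build_label_vocab(lines, pad_id=0):
    """Map each distinct label to an id and return (label_to_id, nb_classes)."""
    labels = []
    seen = set()
    for line in lines:
        fields = line.split()
        if not fields:
            continue  # blank line separating sentences
        label = fields[-1]  # assumption: the label is the last column
        if label not in seen:
            seen.add(label)
            labels.append(label)
    # Assumption: ids start right after the padding id.
    label_to_id = {label: i for i, label in enumerate(labels, start=pad_id + 1)}
    # One slot per label plus one for the padding id.
    nb_classes = len(label_to_id) + 1
    return label_to_id, nb_classes


lines = ["您 O O", "于 O O", "00 O B-date"]
label_to_id, nb_classes = build_label_vocab(lines)
print(label_to_id, nb_classes)  # {'O': 1, 'B-date': 2} 3
```

A value computed this way could be written into the config before the model is built, so the label count always matches the data.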