jiesutd / LatticeLSTM

Chinese NER using Lattice LSTM. Code for ACL 2018 paper.

Why do digits become "0" in the output? #21

Closed: lwy1111111 closed this issue 6 years ago

lwy1111111 commented 6 years ago

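Here is the sentence.

Input: 在全国高等医药教材建设研究会和卫生部教材办公室的指导和组织下,在第6版的基础上,经过编委们的精心修改、编撰,完成了本教材的第7版。

After running decode with the trained model file (xxx.model), the output is:

Output: 在全国高等医药教材建设研究会和卫生部教材办公室的指导和组织下,在第0版的基础上,经过编委们的精心修改、编撰,完成了本教材的第0版。

I can confirm that both the word embeddings and the dictionary contain vectors for digits. So why do the digits turn into "0" in the output?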

jiesutd commented 6 years ago

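Hi. Yes, this is a common preprocessing step in NLP models: every digit is normalized to 0, which usually gives some improvement. If you don't want this behavior, you can change the setting here: https://github.com/jiesutd/LatticeLSTM/blob/7c1bf5be8828a097697ab4b4fade8cdb21a8a388/utils/data.py#L23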
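A minimal sketch of what digit normalization does, assuming it simply maps every digit character to "0". The helper names `normalize_digits` and `preprocess` are hypothetical and this is not the repository's actual code; the `number_normalized` argument only mirrors the idea of the flag in data.py. Mapping all digits to one symbol means every number shares a single embedding, which reduces sparsity.

```python
def normalize_digits(text: str) -> str:
    """Replace every character that Python's str.isdigit() accepts with "0"."""
    return "".join("0" if ch.isdigit() else ch for ch in text)


def preprocess(text: str, number_normalized: bool = True) -> str:
    # Hypothetical switch mirroring the idea of the number_normalized flag
    return normalize_digits(text) if number_normalized else text


sentence = "在第6版的基础上,完成了本教材的第7版。"
print(preprocess(sentence))                           # 在第0版的基础上,完成了本教材的第0版。
print(preprocess(sentence, number_normalized=False))  # original digits are kept
```

Because the output is written from the preprocessed characters, a model run with normalization on will print "0" wherever the input had a digit, which matches what the issue reports.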

Biaocsu commented 5 years ago

Hi @jiesutd, I have run into the same problem. I tried what you suggested and set self.number_normalized = False, but it did not help; the result is the same. Is there anything else I need to change? Here is my result on the resume data:

个 O 人 O 简 O 历 O

个 O 人 O 信 O 息 O

求 O 职 O 意 O 向 O : O J B-ORG a M-ORG v M-ORG a E-ORG

方 O 向 O

姓 O 名 O : O 张 O 盼 O

电 B-NAME 话 E-NAME : O 0 O 0 O 0 O

性 O 别 O : O 男 O

邮 O 箱 O : O 0 O 0 O 0 O 0 O 0 O 0 O 0 O 0 O 0 O 0 O @ O q B-ORG q M-ORG . M-ORG c M-ORG o M-ORG m M-ORG

出 O 生 O 年 O 月 O : O 0 O 0 O 0 O 0 O / O 0 O / O 0 O

Biaocsu commented 5 years ago

Is it because I trained the model before making this change? I will retrain the model.

lwy1111111 commented 5 years ago

Yes. You need to change self.number_normalized = True in the data.py file (to False) and then retrain the model.

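One likely reason the flag must be set before training (an assumption, not confirmed in this thread): the character vocabulary is built from the training data, so a model trained with normalization on has only ever seen "0". The sketch below is a generic illustration with hypothetical names (`build_vocab`, `encode`), not LatticeLSTM's actual alphabet code.

```python
def normalize_digits(text: str) -> str:
    # Hypothetical helper: every digit becomes "0"
    return "".join("0" if ch.isdigit() else ch for ch in text)


def build_vocab(sentences, number_normalized):
    # Character vocabulary built once, at training time; index 0 is reserved for unknowns
    vocab = {"<unk>": 0}
    for s in sentences:
        text = normalize_digits(s) if number_normalized else s
        for ch in text:
            vocab.setdefault(ch, len(vocab))
    return vocab


def encode(text, vocab, number_normalized):
    # Characters missing from the training vocabulary fall back to <unk>
    text = normalize_digits(text) if number_normalized else text
    return [vocab.get(ch, vocab["<unk>"]) for ch in text]


train = ["电话:13800000000", "第6版"]
vocab = build_vocab(train, number_normalized=True)

print("6" in vocab)                                     # False: only "0" was ever seen
print(encode("第7版", vocab, number_normalized=True))   # [5, 4, 6]: "7" is read as "0"
print(encode("第7版", vocab, number_normalized=False))  # [5, 0, 6]: "7" becomes <unk>
```

Turning normalization off only at decode time therefore does not give the model real digits; rebuilding the vocabulary and retraining with the flag set to False does.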