Problem with user-defined new part-of-speech tags

liuyunfanng commented 8 years ago

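I've found that the demo for adding a user-defined part-of-speech tag shipped with 1.2.10 does not work with NLPTokenizer.segment(text). I haven't found the cause yet. Could you take a look? Thanks!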

hankcs commented 8 years ago

Please post the code that triggers it.

liuyunfanng commented 8 years ago

In DemoCustomNature.java:

    // We can add one dynamically
    pcNature = Nature.create("np");
    System.out.println(pcNature);
    // It can be assigned to a word
    LexiconUtility.setAttribute("苹果电脑", pcNature);
    // Or
    LexiconUtility.setAttribute("苹果电脑", "np 1000");
    // They will take effect in the segmentation result
    List<Term> termList = HanLP.segment("苹果电脑可以运行开源阿尔法狗代码吗");

The code above works. But if I change the last call to

    termList = NLPTokenizer.segment("苹果电脑可以运行开源阿尔法狗代码吗");

it throws an error:

    Exception in thread "main" java.lang.ArrayIndexOutOfBoundsException: 148
        at com.hankcs.hanlp.algoritm.Viterbi.compute(Viterbi.java:121)
        at com.hankcs.hanlp.seg.WordBasedGenerativeModelSegment.speechTagging(WordBasedGenerativeModelSegment.java:531)
        at com.hankcs.hanlp.seg.Viterbi.ViterbiSegment.segSentence(ViterbiSegment.java:118)
        at com.hankcs.hanlp.seg.Segment.seg(Segment.java:454)
        at com.hankcs.hanlp.tokenizer.NLPTokenizer.segment(NLPTokenizer.java:37)
        at com.hankcs.demo.DemoCustomNature.main(DemoCustomNature.java:50)

I can't make sense of this. Is it caused by the generated matrix?

hankcs commented 8 years ago

    [苹果电脑/np, 可以/v, 运行/vn, 开源/v, 阿尔法/nrf, 狗/n, 代码/n, 吗/y]

This has been fixed. Please use the latest code from the repository.
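For readers wondering how a newly created tag could produce an out-of-bounds index during Viterbi tagging, here is a minimal sketch of one plausible failure mode. It is not HanLP's code, and the thread does not confirm the actual cause: it only assumes that tags are numbered by ordinal and that a transition matrix is sized once for the built-in tags. The names `TagSet`, `transition_cost`, and `extend` are hypothetical.

```python
class TagSet:
    """Hypothetical tag registry: each tag gets the next integer ordinal."""

    def __init__(self, names):
        self.ordinal = {name: i for i, name in enumerate(names)}

    def create(self, name):
        # Register a tag at runtime (if new) and return its ordinal.
        return self.ordinal.setdefault(name, len(self.ordinal))


def transition_cost(matrix, prev_tag, next_tag):
    # Look up the cost of moving from one tag ordinal to another.
    return matrix[prev_tag][next_tag]


def extend(matrix, size, default=1.0):
    # Grow a square matrix to size x size, filling new cells with a default cost.
    for row in matrix:
        row.extend([default] * (size - len(row)))
    while len(matrix) < size:
        matrix.append([default] * size)
    return matrix


tags = TagSet(["n", "v", "y"])
matrix = [[0.5] * 3 for _ in range(3)]  # sized once for the built-in tags

np_tag = tags.create("np")  # ordinal 3, outside the 3x3 matrix
try:
    transition_cost(matrix, np_tag, tags.ordinal["v"])
    failed = False
except IndexError:
    # Python's counterpart of Java's ArrayIndexOutOfBoundsException.
    failed = True

# Growing the matrix to cover the new ordinal makes the lookup valid again.
extend(matrix, len(tags.ordinal))
cost_after_fix = transition_cost(matrix, np_tag, tags.ordinal["v"])
```

Under this assumption, a segmenter that skips part-of-speech tagging would never touch the matrix, which would be consistent with HanLP.segment working while NLPTokenizer.segment fails.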

cwj commented 5 years ago

    from pyhanlp import *

    def add_dictionary():
        CustomDictionary = JClass("com.hankcs.hanlp.dictionary.CustomDictionary")
        CustomDictionary.add("攻城狮")

    def keyword_extract():
        NLPTokenizer = JClass("com.hankcs.hanlp.tokenizer.NLPTokenizer")
        # NLP tagging
        print(NLPTokenizer.segment("攻城狮逆袭单身狗，迎娶白富美，走上人生巅峰"))

    if __name__ == "__main__":
        add_dictionary()
        keyword_extract()

Result:

    [攻城/ns, 狮/Ng, 逆袭/v, 单身/n, 狗/n, ，/w, 迎娶/v, 白富美/nr, ，/w, 走上/v, 人生/n, 巅峰/nr]

The custom word has no effect in the Python version.

hankcs commented 5 years ago

Try running this: https://github.com/hankcs/pyhanlp/blob/7f9c58731aa786005776458a8232855e19ec7cb1/tests/demos/demo_custom_dictionary.py#L21

cwj commented 5 years ago

I tried that and it works. The custom word only fails to take effect when calling NLPTokenizer.segment.

hankcs commented 5 years ago

Thanks for the feedback. This has been fixed; please see the commit above. If you still run into problems, feel free to reopen the issue.

cwj commented 5 years ago

Sorry to bother you, but did this pass your tests? It still doesn't work on my side. Is there something else I need to change?

hankcs commented 5 years ago

You need to wait for the next release, or build the jar yourself and replace the jar in pyhanlp.
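The jar swap could be scripted roughly as follows. This is only a sketch under stated assumptions: that the installed pyhanlp keeps its jar in a single directory under a name matching `hanlp-*.jar`, and that reusing the old file name is enough for pyhanlp to pick up the new build. The function name `replace_jar` and both paths are hypothetical, so check your own installation before running it.

```python
import shutil
from pathlib import Path


def replace_jar(static_dir, new_jar):
    """Back up every hanlp-*.jar in static_dir, then copy new_jar in its place."""
    static_dir, new_jar = Path(static_dir), Path(new_jar)
    old_jars = sorted(static_dir.glob("hanlp-*.jar"))
    if not old_jars:
        raise FileNotFoundError(f"no hanlp-*.jar found in {static_dir}")
    for old in old_jars:
        # Keep the released jar around so the swap can be undone.
        old.rename(old.with_name(old.name + ".bak"))
    # Assumption: reusing the old file name keeps any recorded jar path valid.
    target = old_jars[-1]
    shutil.copyfile(new_jar, target)
    return target
```

A call would look like `replace_jar(path_to_pyhanlp_jar_dir, path_to_self_built_jar)`. Restart the Python process afterwards, since a JVM that is already running will not reload the jar.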

cwj commented 5 years ago

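OK, thanks! I did notice that this file has changed quite a lot between the last release and now. Roughly when do you expect to publish the next version?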

hankcs / HanLP

Problem with user-defined new part-of-speech tags #271