hankcs / HanLP

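Chinese word segmentation, part-of-speech tagging, named entity recognition, dependency parsing, constituency parsing, semantic dependency parsing, semantic role labeling, coreference resolution, style transfer, semantic similarity, new word discovery, keyword and phrase extraction, automatic summarization, text classification and clustering, pinyin and Simplified/Traditional Chinese conversion, natural language processing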
https://hanlp.hankcs.com/
Apache License 2.0
33.84k stars 10.12k forks

IndexError: index out of range in self #1906

Closed 7777fsq closed 1 month ago

7777fsq commented 2 months ago

Describe the bug A clear and concise description of what the bug is.

Code to reproduce the issue Provide a reproducible test case that is the bare minimum necessary to generate the problem.

import hanlp

# Build an input of exactly 513 identical tokens
sen_tok = ['哈哈' for _ in range(513)]
ner = hanlp.load(hanlp.pretrained.ner.MSRA_NER_ELECTRA_SMALL_ZH)
ner_tok = ner(sen_tok, tasks='ner*')

Describe the current behavior A clear and concise description of what happened.

When running

ner = hanlp.load(hanlp.pretrained.ner.MSRA_NER_ELECTRA_SMALL_ZH)
ner_tok = ner(sen_tok, tasks='ner*')  # produce the NER result

with exactly 513 tokens, the following error is raised:

File "C:\Users\xxxxxx\.conda\envs\NER\Lib\site-packages\hanlp\layers\transformers\relative_transformer.py", line 94, in forward
    embed = self.weights.index_select(0, positions.long()).detach()
            ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
IndexError: index out of range in self

I suspect the cause is the boundary check on line 73 of that file, in the code that extends the embedding:

if max_pos > self.origin_shift:

The comparison does not include equality. After changing it to >=, the call returns normally, and other token counts also work.

Expected behavior A clear and concise description of what you expected to happen.

Normal output.
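The off-by-one described above can be illustrated with a toy lookup table. This is only a sketch and not HanLP's implementation: the class `ToyRelativeTable`, the table size of `2 * shift - 1` rows, and the default `shift=513` are assumptions chosen so that the failure shows up at exactly 513 tokens, as in the report. It compares a strict `>` growth check with an inclusive `>=` one.

```python
class ToyRelativeTable:
    """Hypothetical stand-in for a relative position table (not HanLP code)."""

    def __init__(self, shift=513, inclusive=False):
        # inclusive=False mimics a strict `>` check, inclusive=True uses `>=`
        self.inclusive = inclusive
        self._build(shift)

    def _build(self, shift):
        # Assumed layout: 2 * shift - 1 rows, offsets are shifted by `shift`
        self.shift = shift
        self.rows = list(range(2 * shift - 1))

    def lookup(self, seq_len):
        if self.inclusive:
            needs_growth = seq_len >= self.shift
        else:
            needs_growth = seq_len > self.shift
        if needs_growth:
            # Rebuild a table large enough for this sequence length
            self._build(seq_len + 1)
        # Relative offsets -seq_len .. seq_len - 1, shifted to be non-negative
        positions = [p + self.shift for p in range(-seq_len, seq_len)]
        return [self.rows[i] for i in positions]


if __name__ == "__main__":
    for inclusive in (False, True):
        table = ToyRelativeTable(inclusive=inclusive)
        try:
            table.lookup(513)
            print("inclusive =", inclusive, "-> ok")
        except IndexError as err:
            print("inclusive =", inclusive, "-> IndexError:", err)
```

In this toy, a sequence length equal to `shift` needs one more row than the table holds, so the strict check skips the rebuild and the lookup fails, while lengths just below or above the boundary work either way.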

System information

Other info / logs Include any logs or source code that would be helpful to diagnose the problem. If including tracebacks, please include the full traceback. Large logs and files should be attached.

hankcs commented 1 month ago

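Thanks for the report. This has been fixed; please check whether the commit above resolves the problem. If anything is still wrong, feel free to reopen the issue.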