hankcs / HanLP

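Chinese word segmentation, part-of-speech tagging, named entity recognition, dependency parsing, constituency parsing, semantic dependency parsing, semantic role labeling, coreference resolution, style transfer, semantic similarity, new word discovery, keyword and phrase extraction, automatic summarization, text classification and clustering, pinyin and Simplified/Traditional Chinese conversion, natural language processing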
https://hanlp.hankcs.com/
Apache License 2.0
33.97k stars 10.18k forks

Import error caused by phraseTree #1886

Closed oasis-0927 closed 8 months ago

oasis-0927 commented 8 months ago

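Describe the bug cgi.escape is no longer available in Python 3.9+ and has been replaced by html.escape. Recent versions of nltk have already made this change, but this project depends on phraseTree, which has not, so calling pretty_print in a Python 3.9+ environment raises an error.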

Would it be possible to replace phraseTree with nltk.tree throughout to resolve this?
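For readers hitting the same error, the kind of change described above can be sketched as a guarded import. This is only an illustration under stated assumptions, not necessarily how phraseTree or HanLP fixed it: `escape_label` is a hypothetical helper name, and the fallback branch only matters on interpreters that predate `html.escape`. Note that `html.escape` escapes quotes by default while the old `cgi.escape` did not, so `quote=False` is passed to keep the old behavior.

```python
# Compatibility sketch: prefer html.escape, fall back to cgi.escape
# only on very old interpreters where html.escape does not exist.
try:
    from html import escape as _escape  # available since Python 3.2
except ImportError:
    from cgi import escape as _escape  # legacy location


def escape_label(text: str) -> str:
    """Hypothetical helper: HTML-escape a tree label.

    quote=False mirrors the old cgi.escape default, which left
    double quotes untouched.
    """
    return _escape(text, quote=False)


if __name__ == "__main__":
    print(escape_label('<NP "x">'))
```

Code that previously called `cgi.escape(label)` could call such a helper instead and behave the same on old and new Python versions.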

Code to reproduce the issue

import hanlp
from hanlp_common.document import Document

def merge_pos_into_con(doc: Document):
    flat = isinstance(doc['pos'][0], str)
    if flat:
        doc = Document((k, [v]) for k, v in doc.items())
    for tree, tags in zip(doc['con'], doc['pos']):
        offset = 0
        for subtree in tree.subtrees(lambda t: t.height() == 2):
            tag = subtree.label()
            if tag == '_':
                subtree.set_label(tags[offset])
            offset += 1
    if flat:
        doc = doc.squeeze()
    return doc

con = hanlp.load('CTB9_CON_FULL_TAG_ELECTRA_SMALL')
tok = hanlp.load(hanlp.pretrained.tok.COARSE_ELECTRA_SMALL_ZH)
pos = hanlp.load(hanlp.pretrained.pos.CTB9_POS_ELECTRA_SMALL)
nlp = hanlp.pipeline().append(pos, input_key='tok', output_key='pos') \
    .append(con, input_key='tok', output_key='con')
doc = nlp(tok=["2021年", "HanLPv2.1", "带来", "最", "先进", "的", "多", "语种", "NLP", "技术", "。"])['con']
doc.pretty_print()

Describe the current behavior A clear and concise description of what happened.

Expected behavior A clear and concise description of what you expected to happen.

System information

Other info / logs Include any logs or source code that would be helpful to diagnose the problem. If including tracebacks, please include the full traceback. Large logs and files should be attached.

hankcs commented 8 months ago

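Thanks for the report. This has been fixed; please check whether the commit above resolves the problem. If there are still problems, feel free to reopen the issue.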

phrasetree supports serialization and is more lightweight.

oasis-0927 commented 8 months ago

Tested and confirmed fixed, thanks.