hankcs / HanLP

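Chinese word segmentation, part-of-speech tagging, named entity recognition, dependency parsing, constituency parsing, semantic dependency parsing, semantic role labeling, coreference resolution, style transfer, semantic similarity, new word discovery, keyword and phrase extraction, automatic summarization, text classification and clustering, pinyin and simplified/traditional Chinese conversion, natural language processing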
https://hanlp.hankcs.com/
Apache License 2.0

hanlp.load(SIGHAN2005_MSR_CONVSEG) hangs #1842

Closed wencan closed 1 year ago

wencan commented 1 year ago

Describe the bug The first time I import hanlp, hanlp.load(SIGHAN2005_MSR_CONVSEG) hangs.

Code to reproduce the issue

import hanlp
from hanlp.utils.rules import split_sentence
from hanlp.pretrained.tok import SIGHAN2005_MSR_CONVSEG

tok = hanlp.load(SIGHAN2005_MSR_CONVSEG)

Output

2023-09-08 17:42:46.670747: I tensorflow/tsl/cuda/cudart_stub.cc:28] Could not find cuda drivers on your machine, GPU will not be used.
2023-09-08 17:42:46.716386: I tensorflow/tsl/cuda/cudart_stub.cc:28] Could not find cuda drivers on your machine, GPU will not be used.
2023-09-08 17:42:46.716815: I tensorflow/core/platform/cpu_feature_guard.cc:182] This TensorFlow binary is optimized to use available CPU instructions in performance-critical operations.
To enable the following instructions: AVX2 FMA, in other operations, rebuild TensorFlow with the appropriate compiler flags.
2023-09-08 17:42:47.808801: W tensorflow/compiler/tf2tensorrt/utils/py_utils.cc:38] TF-TRT Warning: Could not find TensorRT

Judging from the earlier output, the download had already completed.

Pressing Ctrl-D then prints:

terminate called after throwing an instance of 'std::runtime_error'
  what():  random_device could not be read

I tried another model, tok.SIGHAN2005_PKU_CONVSEG, and it hangs the same way. COARSE_ELECTRA_SMALL_ZH works fine.
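For anyone hitting a similar hang, one way to see where the process is stuck is Python's standard faulthandler module, which can dump every thread's stack after a timeout. The following is only a minimal sketch: the helper name run_with_hang_dump and the 60-second default are illustrative assumptions, not part of HanLP, and the demo uses a stand-in function instead of a real model load.

```python
import faulthandler
import sys
import time


def run_with_hang_dump(fn, timeout_s=60.0, file=sys.stderr):
    """Call fn(). If it has not returned after timeout_s seconds, write the
    stack of every thread to `file`; the process itself keeps running.

    Hypothetical helper for illustration; `file` must expose fileno().
    """
    faulthandler.dump_traceback_later(timeout_s, exit=False, file=file)
    try:
        return fn()
    finally:
        # Always cancel the pending dump, whether fn() returned or raised.
        faulthandler.cancel_dump_traceback_later()


def slow_stand_in():
    # Stand-in for a slow call such as a model load (assumption for the demo).
    time.sleep(0.01)
    return "loaded"


if __name__ == "__main__":
    # Returns well within the timeout, so no traceback is dumped here.
    print(run_with_hang_dump(slow_stand_in, timeout_s=5.0))
```

In the situation reported here, the wrapped call would be something like lambda: hanlp.load(SIGHAN2005_MSR_CONVSEG). The dumped traceback shows Python frames only, so it indicates which import or call is blocking rather than the native cause.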

System information
os: Fedora Linux 38 (Workstation Edition)
kernel: 6.4.14
python: 3.11.4
hanlp: 2.1.0b50

hankcs commented 1 year ago

Cannot reproduce: https://colab.research.google.com/drive/1ghZToCWKzStRZV_a6npuzLHALIYVTBB6#scrollTo=rkVVUh9jeX5S&line=2&uniqifier=1

TensorFlow may not support your hardware; I suggest reporting this to their community.

wencan commented 1 year ago

Confirmed: this is not a HanLP bug.

It is most likely this bug: https://github.com/pytorch/pytorch/issues/102360