hankcs / HanLP

Natural Language Processing for the next decade. Tokenization, Part-of-Speech Tagging, Named Entity Recognition, Syntactic & Semantic Dependency Parsing, Document Classification
https://hanlp.hankcs.com/en/
Apache License 2.0
33.51k stars 9.99k forks

Running open_small.py raises 'utf-8' codec can't decode byte 0xb4 in position 0: invalid start byte #1863

Closed hiking-coder closed 9 months ago

hiking-coder commented 9 months ago

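Dear author, I'm a beginner trying to run the training demo with HanLP 2.1, and I keep getting stuck at the data import step, `from hanlp.datasets.ner.msra import`. I searched your forum for a solution and applied the settings shown here (image), but running the script still fails with the following error (image).

What could be causing this? Thanks in advance.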

hankcs commented 9 months ago

Hi, please refer to the doc: https://github.com/hankcs/HanLP/blob/2ab077001c7a08d7a9ac51e77d9b8603fdb16fa1/docs/api/hanlp/datasets/index.md#L12
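
For readers hitting the same message, here is a minimal sketch of what the error itself means, independent of HanLP. The byte 0xb4 is a UTF-8 continuation byte, so it can never begin a character; this typically indicates that the data being read is not UTF-8. The helper `try_decode` and the GBK candidate encoding are assumptions for illustration only; the thread does not state which file or encoding was actually involved.

```python
def try_decode(data: bytes, encodings=("utf-8", "gbk")):
    """Hypothetical helper: return (encoding, text) for the first
    candidate encoding that decodes `data`, or None if none does."""
    for enc in encodings:
        try:
            return enc, data.decode(enc)
        except UnicodeDecodeError:
            continue
    return None


# Two bytes starting with 0xb4. As UTF-8 this is invalid at position 0,
# because 0xb4 (0b10110100) is a continuation byte, not a start byte.
# GBK is only an assumed alternative encoding for this demonstration.
sample = b"\xb4\xf3"

try:
    sample.decode("utf-8")
except UnicodeDecodeError as err:
    # Same kind of message as in the issue title.
    print(err)

print(try_decode(sample))
```

If a local file turns out to be in a non-UTF-8 encoding, passing an explicit `encoding=` argument to `open()` is the usual way to read it; whether that applies here depends on the documentation linked above.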