hankcs / HanLP

Natural Language Processing for the next decade. Tokenization, Part-of-Speech Tagging, Named Entity Recognition, Syntactic & Semantic Dependency Parsing, Document Classification
https://hanlp.hankcs.com/en/
Apache License 2.0

Error loading the English dependency parser! #1900

Closed: zsrainbow closed this issue 1 month ago

zsrainbow commented 2 months ago

Describe the bug
When running the example https://github.com/hankcs/HanLP/blob/doc-zh/plugins/hanlp_demo/hanlp_demo/zh/dep_stl.ipynb, replacing the load statement with dep = hanlp.load(hanlp.pretrained.dep.PTB_BIAFFINE_DEP_EN) produces the following error:

Loading word2vec from cache ...Failed to load https://file.hankcs.com/hanlp/dep/ptb_dep_biaffine_20200101_174624.zip
If the problem still persists, please submit an issue to https://github.com/hankcs/HanLP/issues
When reporting an issue, make sure to paste the FULL ERROR LOG below.
================================ERROR LOG BEGINS================================
OS: Windows-10-10.0.22631-SP0
Python: 3.8.19
PyTorch: 2.1.2+cpu
TensorFlow: 2.13.0
HanLP: 2.1.0-beta.58
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "C:\Users\Lenovo\.conda\envs\HanLP\lib\site-packages\hanlp\__init__.py", line 43, in load
    return load_from_meta_file(save_dir, 'meta.json', verbose=verbose, **kwargs)
  File "C:\Users\Lenovo\.conda\envs\HanLP\lib\site-packages\hanlp\utils\component_util.py", line 186, in load_from_meta_file
    raise e from None
  File "C:\Users\Lenovo\.conda\envs\HanLP\lib\site-packages\hanlp\utils\component_util.py", line 106, in load_from_meta_file
    obj.load(save_dir, verbose=verbose, **kwargs)
  File "C:\Users\Lenovo\.conda\envs\HanLP\lib\site-packages\hanlp\common\keras_component.py", line 215, in load
    self.build(**merge_dict(self.config, training=False, logger=logger, **kwargs, overwrite=True, inplace=True))
  File "C:\Users\Lenovo\.conda\envs\HanLP\lib\site-packages\hanlp\common\keras_component.py", line 225, in build
    self.model = self.build_model(**merge_dict(self.config, training=kwargs.get('training', None),
  File "C:\Users\Lenovo\.conda\envs\HanLP\lib\site-packages\hanlp\components\parsers\biaffine_parser_tf.py", line 42, in build_model
    pretrained: tf.keras.layers.Embedding = build_embedding(pretrained_embed, self.transform.form_vocab,
  File "C:\Users\Lenovo\.conda\envs\HanLP\lib\site-packages\hanlp\layers\embeddings\util_tf.py", line 44, in build_embedding
    layer: tf.keras.layers.Embedding = tf.keras.utils.deserialize_keras_object(embeddings)
  File "C:\Users\Lenovo\.conda\envs\HanLP\lib\site-packages\keras\src\saving\serialization_lib.py", line 704, in deserialize_keras_object
    instance = cls.from_config(inner_config)
  File "C:\Users\Lenovo\.conda\envs\HanLP\lib\site-packages\keras\src\engine\base_layer.py", line 870, in from_config
    raise TypeError(
TypeError: Error when deserializing class 'Word2VecEmbeddingTF' using config={'trainable': False, 'embeddings_initializer': 'zero', 'filepath': 'https://nlp.stanford.edu/data/glove.6B.zip', 'expand_vocab': True, 'lowercase': False, 'unk': 'unk', 'normalize': True, 'name': 'glove.6B.100d', 'vocab': <hanlp.common.vocab_tf.VocabTF object at 0x0000025860082F10>}.

Exception encountered: C:\Users\Lenovo\AppData\Roaming\hanlp\thirdparty\nlp.stanford.edu\data/glove.6B
=================================ERROR LOG ENDS=================================

Code to reproduce the issue
import hanlp
dep = hanlp.load(hanlp.pretrained.dep.PTB_BIAFFINE_DEP_EN)

Describe the current behavior
The English dependency parsing model cannot be loaded.

Expected behavior
The English dependency parsing model should load normally.

System information

Other info / logs

Note: a preliminary analysis suggests that the serialized pretrained PTB_BIAFFINE_DEP_EN model fails to deserialize when loaded through TensorFlow.

The statement that triggers the error is at line 44 of hanlp\layers\embeddings\util_tf.py:
layer: tf.keras.layers.Embedding = tf.keras.utils.deserialize_keras_object(embeddings)
For this deserialization to resolve a custom class, the class must be named and registered in the format Keras requires for custom objects; see https://www.tensorflow.org/api_docs/python/tf/keras/utils/deserialize_keras_object
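As an illustration of that requirement (a minimal sketch, not HanLP's actual code; the class and package names below are made up), this is the kind of registration tf.keras.utils.deserialize_keras_object relies on to rebuild a custom class from its config:

import tensorflow as tf

# Registering the class under a package name lets Keras map the serialized
# class name back to the Python class during deserialization.
@tf.keras.utils.register_keras_serializable(package="demo")
class MyEmbedding(tf.keras.layers.Embedding):
    pass

config = tf.keras.utils.serialize_keras_object(MyEmbedding(input_dim=10, output_dim=8))
layer = tf.keras.utils.deserialize_keras_object(config)
print(type(layer).__name__)  # MyEmbedding

Without such a registration (or an explicit custom_objects mapping), Keras only sees the class name string and cannot resolve it back to a Python class.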

hankcs commented 1 month ago
  1. https://github.com/hankcs/HanLP/blob/ca70784a1eab992ad72d5027380b1d0e34dd8afb/docs/install.md?plain=1#L132
  2. https://github.com/hankcs/HanLP/blob/dfd8d5eb7428f1097f68a2a70b555e65ec7b8f76/setup.py#L20
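For anyone hitting the same error, a quick sanity check (just a suggestion, not an official HanLP snippet) is to compare the installed versions against the requirements referenced in the two links above:

from importlib.metadata import version
import tensorflow as tf

# The error log above shows TensorFlow 2.13.0; compare it with the TensorFlow
# version pinned in setup.py (second link) and the install notes (first link).
print("hanlp:", version("hanlp"))
print("tensorflow:", tf.__version__)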