nikitakit / self-attentive-parser

High-accuracy NLP parser with models for 11 languages.
https://parser.kitaev.io/
MIT License
870 stars 153 forks

Full dependency list for spacy? (a problem that might be caused by pydantic) #108

Open LuoXiaoxi-cxq opened 1 month ago

LuoXiaoxi-cxq commented 1 month ago

I wanted to train a Chinese model and ran the following command from EXPERIMENTS.md:

python src/main.py train \
    --train-path "data/ctb_5.1/ctb.train" \
    --dev-path "data/ctb_5.1/ctb.dev" \
    --text-processing "chinese" \
    --use-pretrained --pretrained-model "bert-base-chinese" \
    --predict-tags \
    --model-path-base models/Chinese_bert_base_chinese

However, I encountered this error:

Traceback (most recent call last):
  File "D:\postgraduate\research\parsing\self-attentive-parser\src\main.py", line 11, in <module>
    from benepar import char_lstm
  File "D:\postgraduate\research\parsing\self-attentive-parser\src\benepar\__init__.py", line 20, in <module>
    from .integrations.spacy_plugin import BeneparComponent, NonConstituentException
  File "D:\postgraduate\research\parsing\self-attentive-parser\src\benepar\integrations\spacy_plugin.py", line 5, in <module>
    from .spacy_extensions import ConstituentData, NonConstituentException
  File "D:\postgraduate\research\parsing\self-attentive-parser\src\benepar\integrations\spacy_extensions.py", line 177, in <module>
    install_spacy_extensions()
  File "D:\postgraduate\research\parsing\self-attentive-parser\src\benepar\integrations\spacy_extensions.py", line 153, in install_spacy_extensions
    from spacy.tokens import Doc, Span, Token
  File "D:\anaconda\lib\site-packages\spacy\__init__.py", line 14, in <module>
    from . import pipeline  # noqa: F401
  File "D:\anaconda\lib\site-packages\spacy\pipeline\__init__.py", line 1, in <module>
    from .attributeruler import AttributeRuler
  File "D:\anaconda\lib\site-packages\spacy\pipeline\attributeruler.py", line 6, in <module>
    from .pipe import Pipe
  File "spacy\pipeline\pipe.pyx", line 8, in init spacy.pipeline.pipe
  File "D:\anaconda\lib\site-packages\spacy\training\__init__.py", line 11, in <module>
    from .callbacks import create_copy_from_base_model  # noqa: F401
  File "D:\anaconda\lib\site-packages\spacy\training\callbacks.py", line 3, in <module>
    from ..language import Language
  File "D:\anaconda\lib\site-packages\spacy\language.py", line 25, in <module>
    from .training.initialize import init_vocab, init_tok2vec
  File "D:\anaconda\lib\site-packages\spacy\training\initialize.py", line 14, in <module>
    from .pretrain import get_tok2vec_ref
  File "D:\anaconda\lib\site-packages\spacy\training\pretrain.py", line 16, in <module>
    from ..schemas import ConfigSchemaPretrain
  File "D:\anaconda\lib\site-packages\spacy\schemas.py", line 216, in <module>
    class TokenPattern(BaseModel):
  File "pydantic\main.py", line 299, in pydantic.main.ModelMetaclass.__new__
  File "pydantic\fields.py", line 411, in pydantic.fields.ModelField.infer
  File "pydantic\fields.py", line 342, in pydantic.fields.ModelField.__init__
  File "pydantic\fields.py", line 451, in pydantic.fields.ModelField.prepare
  File "pydantic\fields.py", line 545, in pydantic.fields.ModelField._type_analysis
  File "pydantic\fields.py", line 550, in pydantic.fields.ModelField._type_analysis
  File "D:\anaconda\lib\typing.py", line 852, in __subclasscheck__
    return issubclass(cls, self.__origin__)
TypeError: issubclass() arg 1 must be a class

This issue says that upgrading the two packages chromadb and pydantic fixes the problem, so I ran:

python -m pip install -U pydantic spacy
python -m pip install -U chromadb spacy

Now, I have

pydantic == 2.9.2
pydantic-core == 2.23.4
spacy == 3.7.6
typing-extensions == 4.12.2
chromadb == 0.5.9

However, the problem persists. According to this issue, the error should only occur with pydantic v1.10.7 and earlier, in combination with the recently released typing_extensions v4.6.0. I installed newer versions of both, but that didn't resolve the error.
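One diagnostic that may help here: in a muddled environment, the version pip reports can differ from the distribution Python actually finds on its path. A minimal standard-library sketch to confirm what is really installed:

```python
# Print the installed version of each package involved in the conflict.
# Uses importlib.metadata (stdlib, Python 3.8+); missing packages are reported as such.
from importlib.metadata import version, PackageNotFoundError

for pkg in ("pydantic", "pydantic-core", "spacy", "typing_extensions", "chromadb"):
    try:
        print(pkg, version(pkg))
    except PackageNotFoundError:
        print(pkg, "not installed")
```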

nikitakit commented 1 month ago

If all you want to do is train a model, I recommend just disabling the spacy integration by commenting out the line

  File "D:\postgraduate\research\parsing\self-attentive-parser\src\benepar\__init__.py", line 20, in <module>
    from .integrations.spacy_plugin import BeneparComponent, NonConstituentException

The spaCy integration is only needed at inference time.
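Commenting out that import is the simplest fix. An alternative, for anyone who wants inference to keep working in environments where spaCy is healthy, is to guard the import so a failure is non-fatal. The following is a generic sketch of that pattern, not benepar's actual code; `some_optional_integration` is a hypothetical module name standing in for the integration module:

```python
# Sketch of an optional-import guard (illustrative, not benepar's actual code).
# The broad `except Exception` matters here: in the traceback above, the spaCy
# import fails with a TypeError raised inside pydantic, not a plain ImportError.
try:
    from some_optional_integration import SpacyComponent  # hypothetical module
    INTEGRATION_AVAILABLE = True
except Exception:
    SpacyComponent = None
    INTEGRATION_AVAILABLE = False

print(INTEGRATION_AVAILABLE)
```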

No idea what's going on with this error. It sounds like it's inside some libraries, and didn't exist with the versions of those libraries I used way back when releasing benepar. It's disappointing if the underlying cause is that the libraries have introduced bugs or broken backwards compatibility. Here are the versions I have for one of my archived working setups.

pydantic           1.7.3
spacy              3.0.1
spacy-legacy       3.0.1
typing-extensions  3.7.4.3
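For anyone trying to reproduce that archived setup, the pins above could go into a requirements file (a sketch, assuming a fresh virtual environment; newer benepar commits may need different versions):

```
pydantic==1.7.3
spacy==3.0.1
spacy-legacy==3.0.1
typing-extensions==3.7.4.3
```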

LuoXiaoxi-cxq commented 1 month ago

Thank you for your response. I previously tried downgrading spacy to 3.5.0 and encountered the following error:

ERROR: pip's dependency resolver does not currently take into account all the packages that are installed. This behaviour is the source of the following dependency conflicts.
en-core-web-md 3.7.1 requires spacy<3.8.0,>=3.7.2, but you have spacy 3.5.0 which is incompatible.

However, your README says:

The recommended way of using benepar is through integration with spaCy. If using spaCy, you should install a spaCy model for your language. For English, the installation command is:

$ python -m spacy download en_core_web_md

It seems that en_core_web_md requires a higher version of spacy. Is that right?
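That reading matches the pip error quoted above: en-core-web-md 3.7.1 declares `spacy<3.8.0,>=3.7.2`, so spacy 3.5.0 falls outside the range while 3.7.6 satisfies it. A small sketch with the `packaging` library (assumed available; it commonly ships as a dependency of pip and setuptools) makes the check explicit:

```python
# Check candidate spaCy versions against the model's declared requirement,
# quoted from the pip error message above.
from packaging.specifiers import SpecifierSet

requirement = SpecifierSet(">=3.7.2,<3.8.0")  # en-core-web-md 3.7.1's spaCy range
print("3.5.0" in requirement)  # the downgraded spaCy is too old
print("3.7.6" in requirement)  # the currently installed spaCy fits
```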