segment-any-text / wtpsplit

Toolkit to segment text into sentences or other semantic units in a robust, efficient and adaptable way.
MIT License

Error when installing the requirements #122

Status: Closed (RacheleSprugnoli closed this 1 week ago)

RacheleSprugnoli commented 1 week ago

Hello! When installing the requirements (pip install -r requirements.txt), I get the following error (the first of a long list):

Downloading spacy-3.0.6.tar.gz (7.1 MB)
━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 7.1/7.1 MB 10.7 MB/s eta 0:00:00
Installing build dependencies ... done
Getting requirements to build wheel ... error
Error compiling Cython file:
------------------------------------------------------------
...
int length
cdef class Vocab:
cdef Pool mem
cpdef readonly StringStore strings
------------------------------------------------------------
spacy/vocab.pxd:28:10: Variables cannot be declared with 'cpdef'. Use 'cdef' instead.

Any suggestions on how to solve it? Thanks in advance, Rachele

bminixhofer commented 1 week ago

FYI, you only need to install the packages in requirements.txt if you're trying to reproduce the baselines from the paper. Otherwise, the requirements in setup.py are enough: https://github.com/segment-any-text/wtpsplit/blob/cfd5e24411d8c658b3000af0c62b6602a6c955ca/setup.py#L10-L21

If that is what you are trying to do, this looks like an issue with spaCy. You could try, e.g., the solution here, or upgrade spaCy.

markus583 commented 1 week ago

I double-checked installing wtpsplit and then requirements.txt (which, as Benjamin noted, is only necessary when reproducing the baselines or adapting to target domains). There was an issue with one package that is not really needed. It was removed in 2.0.5; please upgrade. I tried this in a fresh conda environment: pip install wtpsplit, then pip install -r requirements.txt. I got no pip errors (you can ignore the one related to adapters).
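
To confirm that an environment already has the fixed release mentioned above, a quick version check can help before re-running the install. The following is only a minimal sketch: the helper names (parse_release, is_at_least, check_wtpsplit) are made up for illustration and are not part of wtpsplit, and the comparison only looks at the leading numeric parts of the version string, so a pre-release such as 2.0.5rc1 is treated the same as 2.0.5.

```python
from importlib.metadata import PackageNotFoundError, version


def parse_release(v: str) -> tuple:
    """Turn '2.0.5' into (2, 0, 5); stops at the first non-numeric part."""
    parts = []
    for piece in v.split("."):
        digits = ""
        for ch in piece:
            if not ch.isdigit():
                break
            digits += ch
        if not digits:
            break
        parts.append(int(digits))
        if digits != piece:
            # Trailing suffix such as 'rc1': keep the number, ignore the rest.
            break
    return tuple(parts)


def is_at_least(installed: str, minimum: str) -> bool:
    """Compare two version strings by their numeric release parts only."""
    return parse_release(installed) >= parse_release(minimum)


def check_wtpsplit(minimum: str = "2.0.5") -> str:
    """Report whether the installed wtpsplit is at least `minimum`."""
    try:
        installed = version("wtpsplit")
    except PackageNotFoundError:
        return "wtpsplit is not installed"
    if is_at_least(installed, minimum):
        return f"wtpsplit {installed} is recent enough"
    return (
        f"wtpsplit {installed} is older than {minimum}; "
        "try: pip install --upgrade wtpsplit"
    )


if __name__ == "__main__":
    print(check_wtpsplit())
```

If the script reports an older version, upgrading wtpsplit first and then installing requirements.txt matches the order described in the comment above.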