segment-any-text / wtpsplit

Toolkit to segment text into sentences or other semantic units in a robust, efficient and adaptable way.
MIT License

ImportError with language and style_or_domain arguments and huggingface-hub 0.26 #135

Closed: carschno closed this issue 4 weeks ago

carschno commented 1 month ago

I get an ImportError when I pass the language and style_or_domain parameters while initializing a SaT model with the most recent huggingface-hub version (0.26.x), which pip installs by default:

% pip install -U wtpsplit
[...]
Successfully installed wtpsplit-2.1.0
>>> import wtpsplit
>>> wtpsplit.SaT("sat-3l-sm", language="en", style_or_domain="-")
[...]
ImportError: cannot import name 'url_to_filename' from 'huggingface_hub.file_download' (.../.venv/lib/python3.11/site-packages/huggingface_hub/file_download.py)
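To check whether an environment is affected without constructing a model, one can probe for the name the traceback says is missing. This is a minimal diagnostic sketch, not part of wtpsplit; the helper name `hub_exposes_url_to_filename` is hypothetical and it only assumes that the failing import is `huggingface_hub.file_download.url_to_filename`, as shown in the error above.

```python
import importlib


def hub_exposes_url_to_filename():
    """Return True/False if huggingface_hub is installed, or None if it is not."""
    try:
        file_download = importlib.import_module("huggingface_hub.file_download")
    except ImportError:
        # huggingface_hub itself is not installed in this environment
        return None
    # This is the name the failing import in the traceback tries to load
    return hasattr(file_download, "url_to_filename")


if __name__ == "__main__":
    result = hub_exposes_url_to_filename()
    if result is None:
        print("huggingface_hub is not installed")
    elif result:
        print("url_to_filename is available; the LoRA import should succeed")
    else:
        print("url_to_filename is missing; expect the ImportError above")
```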

The initialization does work without the additional arguments:

>>> wtpsplit.SaT("sat-3l-sm")
<wtpsplit.SaT at 0x103111b90>

The error does not appear with older huggingface-hub versions (<= 0.25):

% pip install huggingface-hub==0.25
[...]
Successfully installed huggingface-hub-0.25.0
>>> import wtpsplit
>>> wtpsplit.SaT("sat-3l-sm", language="en", style_or_domain="-")
LoRA -/en not found, using base model...
<wtpsplit.SaT at 0x174abc450>
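The working/broken boundary reported here (0.25 works, 0.26.x fails) can be expressed as a small version gate, e.g. for a CI sanity check. This is a sketch under that assumption only: the helper names are hypothetical, and it also assumes that later huggingface-hub releases do not restore `url_to_filename`.

```python
import re
from importlib import metadata

# Assumption taken from this report: 0.25 works, 0.26.x raises the ImportError.
FIRST_AFFECTED = (0, 26)


def major_minor(version):
    """Extract (major, minor) from a version string such as '0.26.2'."""
    match = re.match(r"(\d+)\.(\d+)", version)
    if match is None:
        raise ValueError(f"unrecognised version string: {version!r}")
    return int(match.group(1)), int(match.group(2))


def is_affected(version):
    """True if this huggingface-hub version falls in the range reported as broken."""
    return major_minor(version) >= FIRST_AFFECTED


def installed_hub_version():
    """Installed huggingface-hub version, or None if the distribution is absent."""
    try:
        return metadata.version("huggingface_hub")
    except metadata.PackageNotFoundError:
        return None


if __name__ == "__main__":
    version = installed_hub_version()
    if version is None:
        print("huggingface-hub is not installed")
    elif is_affected(version):
        print(f"huggingface-hub {version}: expect the url_to_filename ImportError")
    else:
        print(f"huggingface-hub {version}: LoRA loading should import cleanly")
```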

This is on macOS with Python 3.11, but it seems to occur across operating systems and Python versions.

% python --version
Python 3.11.2
carschno commented 1 month ago

See, e.g., this run for a failing build.

The fix in #136 makes the build pass again.

markus583 commented 4 weeks ago

Thanks for catching this! Indeed, I can reproduce this issue. It comes from the adapters library, which we use for loading LoRA modules, and there is also an issue about it in their repository: #750
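Since the failure only occurs on the LoRA-loading path, a user who cannot change their huggingface-hub version could, in principle, catch the ImportError and retry without the LoRA arguments. This is a hypothetical user-side workaround, not something wtpsplit provides; `load_with_lora_fallback` is an invented name, and it assumes that the base model (as in `wtpsplit.SaT("sat-3l-sm")` above) is acceptable. It hides the error rather than fixing it, so pinning huggingface-hub remains the cleaner option.

```python
import warnings


def load_with_lora_fallback(factory, model_name, **lora_kwargs):
    """Try factory(model_name, **lora_kwargs); on ImportError, retry without LoRA kwargs."""
    try:
        return factory(model_name, **lora_kwargs)
    except ImportError as exc:
        # Hypothetical workaround: fall back to the base model when the
        # LoRA-loading dependency chain fails to import.
        warnings.warn(f"LoRA loading failed ({exc}); falling back to the base model")
        return factory(model_name)
```

Usage would look like `load_with_lora_fallback(wtpsplit.SaT, "sat-3l-sm", language="en", style_or_domain="-")`.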

It may take some time until the fix lands in their PyPI release, so I will go ahead and merge your PR. Thanks for contributing! Once their fix is on PyPI, we should remove the huggingface-hub version pin again.