katanaml / sparrow

Data processing with ML, LLM and Vision LLM
https://katanaml.io
GNU General Public License v3.0

RuntimeError: Failed to import transformers.models.mpnet.modeling_mpnet #36

Closed: TobiasJu closed this issue 9 months ago

TobiasJu commented 9 months ago

Hi, I tried running your demo with:

./sparrow.sh ingest

But this resulted in this error:

...
│ /home/tobias/.local/lib/python3.10/site-packages/transformers/models/auto/auto_factory.py:698 in │
│ getattribute_from_module                                                                         │
│                                                                                                  │
│   695 │   │   return None                                                                        │
│   696 │   if isinstance(attr, tuple):                                                            │
│   697 │   │   return tuple(getattribute_from_module(module, a) for a in attr)                    │
│ ❱ 698 │   if hasattr(module, attr):                                                              │
│   699 │   │   return getattr(module, attr)                                                       │
│   700 │   # Some of the mappings have entries model_type -> object of another model type. In t   │
│   701 │   # object at the top level.                                                             │
│                                                                                                  │
│ ╭─────────────────────────────────────────── locals ───────────────────────────────────────────╮ │
│ │   attr = 'MPNetModel'                                                                        │ │
│ │ module = <module 'transformers.models.mpnet' from                                            │ │
│ │          '/home/tobias/.local/lib/python3.10/site-packages/transformers/models/mpnet/__init… │ │
│ ╰──────────────────────────────────────────────────────────────────────────────────────────────╯ │
│                                                                                                  │
│ /home/tobias/.local/lib/python3.10/site-packages/transformers/utils/import_utils.py:1354 in      │
│ __getattr__                                                                                      │
│                                                                                                  │
│   1351 │   │   if name in self._modules:                                                         │
│   1352 │   │   │   value = self._get_module(name)                                                │
│   1353 │   │   elif name in self._class_to_module.keys():                                        │
│ ❱ 1354 │   │   │   module = self._get_module(self._class_to_module[name])                        │
│   1355 │   │   │   value = getattr(module, name)                                                 │
│   1356 │   │   else:                                                                             │
│   1357 │   │   │   raise AttributeError(f"module {self.__name__} has no attribute {name}")       │
│                                                                                                  │
│ ╭─────────────────────────────────────────── locals ───────────────────────────────────────────╮ │
│ │ name = 'MPNetModel'                                                                          │ │
│ │ self = <module 'transformers.models.mpnet' from                                              │ │
│ │        '/home/tobias/.local/lib/python3.10/site-packages/transformers/models/mpnet/__init__… │ │
│ ╰──────────────────────────────────────────────────────────────────────────────────────────────╯ │
│                                                                                                  │
│ /home/tobias/.local/lib/python3.10/site-packages/transformers/utils/import_utils.py:1366 in      │
│ _get_module                                                                                      │
│                                                                                                  │
│   1363 │   │   try:                                                                              │
│   1364 │   │   │   return importlib.import_module("." + module_name, self.__name__)              │
│   1365 │   │   except Exception as e:                                                            │
│ ❱ 1366 │   │   │   raise RuntimeError(                                                           │
│   1367 │   │   │   │   f"Failed to import {self.__name__}.{module_name} because of the followin  │
│   1368 │   │   │   │   f" traceback):\n{e}"                                                      │
│   1369 │   │   │   ) from e                                                                      │
│                                                                                                  │
│ ╭─────────────────────────────────────────── locals ───────────────────────────────────────────╮ │
│ │ module_name = 'modeling_mpnet'                                                               │ │
│ │        self = <module 'transformers.models.mpnet' from                                       │ │
│ │               '/home/tobias/.local/lib/python3.10/site-packages/transformers/models/mpnet/_… │ │
│ ╰──────────────────────────────────────────────────────────────────────────────────────────────╯ │
╰──────────────────────────────────────────────────────────────────────────────────────────────────╯
RuntimeError: Failed to import transformers.models.mpnet.modeling_mpnet because of the following error (look up to see its traceback):
Failed to import transformers.generation.utils because of the following error (look up to see its traceback):
'FieldInfo' object has no attribute 'required'
/usr/lib/python3.10/tempfile.py:999: ResourceWarning: Implicitly cleaning up <TemporaryDirectory '/tmp/tmpkf5oi4te'>
  _warnings.warn(warn_message, ResourceWarning)
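The RuntimeError above wraps two earlier failures, and the innermost message ('FieldInfo' object has no attribute 'required') is the one worth chasing. A minimal sketch of walking the `__cause__` chain to reach it. It uses simulated exceptions rather than the real transformers import, and `root_cause` and `simulated_failure` are hypothetical names:

```python
def root_cause(exc: BaseException) -> BaseException:
    """Follow `raise ... from ...` links down to the innermost exception."""
    seen = set()
    while exc.__cause__ is not None and id(exc) not in seen:
        seen.add(id(exc))  # guard against a cyclic chain
        exc = exc.__cause__
    return exc


def simulated_failure() -> None:
    # Hypothetical stand-in for the nested import errors in the report above.
    try:
        try:
            raise AttributeError("'FieldInfo' object has no attribute 'required'")
        except Exception as inner:
            raise RuntimeError(
                "Failed to import transformers.generation.utils"
            ) from inner
    except Exception as middle:
        raise RuntimeError(
            "Failed to import transformers.models.mpnet.modeling_mpnet"
        ) from middle


try:
    simulated_failure()
except RuntimeError as err:
    cause = root_cause(err)
    print(f"{type(cause).__name__}: {cause}")
```

Wrapping the real failing call in the same `try`/`except` would show which exception sits at the bottom of the chain.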

Before that, I started Docker, ran the installation, and downloaded the model:

docker compose up -d
pip install -r requirements.txt
curl -fsSL https://ollama.com/install.sh | sh
wget https://huggingface.co/TheBloke/Starling-LM-7B-alpha-GGUF/resolve/main/starling-lm-7b-alpha.Q5_K_M.gguf?download=true -O starling-lm-7b-alpha.Q5_K_M.gguf

TobiasJu commented 9 months ago

Some more info about my setup: Windows 10 WSL2 running Ubuntu

> python3 --version
Python 3.10.12
> pip list
Package                    Version       Editable project location
-------------------------- ------------- -------------------------
accelerate                 0.21.0
aiofiles                   23.2.1
aiohttp                    3.9.3
aiosignal                  1.3.1
altair                     5.2.0
annotated-types            0.6.0
antlr4-python3-runtime     4.9.3
anyio                      4.2.0
appdirs                    1.4.4
argcomplete                1.8.1
async-timeout              4.0.3
attrs                      23.2.0
Authlib                    1.3.0
Babel                      2.8.0
backoff                    2.2.1
beautifulsoup4             4.12.3
bitsandbytes               0.41.0
blinker                    1.4
boilerpy3                  1.0.7
build                      1.0.3
CacheControl               0.13.1
cachy                      0.3.0
certifi                    2023.11.17
cffi                       1.16.0
chardet                    4.0.0
charset-normalizer         3.3.2
cleo                       2.1.0
click                      8.1.7
clikit                     0.6.2
cmake                      3.28.1
colorama                   0.4.6
coloredlogs                15.0.1
command-not-found          0.3
contourpy                  1.2.0
crashtest                  0.4.1
cryptography               42.0.2
cycler                     0.12.1
dataclasses-json           0.6.4
dataclasses-json-speakeasy 0.5.11
dbus-python                1.2.18
deepspeed                  0.9.5
Deprecated                 1.2.14
distlib                    0.3.8
distro                     1.7.0
distro-info                1.1+ubuntu0.2
docker-pycreds             0.4.0
dulwich                    0.21.7
effdet                     0.4.1
einops                     0.6.1
einops-exts                0.0.4
emoji                      2.10.1
et-xmlfile                 1.1.0
exceptiongroup             1.2.0
fastapi                    0.108.0
fastjsonschema             2.19.1
ffmpy                      0.3.1
filelock                   3.13.1
filetype                   1.2.0
flash-attn                 2.5.2
flatbuffers                23.5.26
fonttools                  4.47.2
frozenlist                 1.4.1
fsspec                     2023.12.2
gitdb                      4.0.11
GitPython                  3.1.41
gradio                     4.16.0
gradio_client              0.8.1
greenlet                   3.0.3
grpcio                     1.60.1
grpcio-health-checking     1.60.1
grpcio-tools               1.60.1
h11                        0.14.0
haystack-ai                2.0.0b5
haystack-bm25              1.0.2
hjson                      3.1.0
html5lib                   1.1
httpcore                   1.0.2
httplib2                   0.20.2
httptools                  0.6.1
httpx                      0.26.0
huggingface-hub            0.20.3
humanfriendly              10.0
idna                       3.6
importlib-metadata         7.0.1
importlib-resources        6.1.1
installer                  0.7.0
iopath                     0.1.10
jaraco.classes             3.3.1
jeepney                    0.7.1
Jinja2                     3.1.3
joblib                     1.3.2
jsonpatch                  1.33
jsonpath-python            1.0.6
jsonpointer                2.4
jsonschema                 4.21.1
jsonschema-specifications  2023.12.1
keyring                    24.3.0
kiwisolver                 1.4.5
langchain                  0.1.1
langchain-community        0.0.19
langchain-core             0.1.22
langdetect                 1.0.9
langsmith                  0.0.87
launchpadlib               1.10.16
layoutparser               0.3.4
lazr.restfulclient         0.14.4
lazr.uri                   1.0.6
lazy-imports               0.3.1
linkify-it-py              2.0.2
lit                        17.0.6
livereload                 2.6.3
llama-index                0.9.33
llava                      1.2.2.post1   /mnt/a/KI/LLaVA
lockfile                   0.12.2
lxml                       5.1.0
Markdown                   3.3.6
markdown-it-py             2.2.0
markdown2                  2.4.12
MarkupSafe                 2.1.4
marshmallow                3.20.2
matplotlib                 3.8.2
mdit-py-plugins            0.3.3
mdurl                      0.1.2
mkdocs                     1.1.2
monotonic                  1.6
more-itertools             8.10.0
mpmath                     1.3.0
msg-parser                 1.2.0
msgpack                    1.0.3
multidict                  6.0.4
mypy-extensions            1.0.0
nest-asyncio               1.6.0
netifaces                  0.11.0
networkx                   3.2.1
ninja                      1.11.1.1
nltk                       3.8.1
numpy                      1.26.3
nvidia-cublas-cu11         11.10.3.66
nvidia-cublas-cu12         12.1.3.1
nvidia-cuda-cupti-cu11     11.7.101
nvidia-cuda-cupti-cu12     12.1.105
nvidia-cuda-nvrtc-cu11     11.7.99
nvidia-cuda-nvrtc-cu12     12.1.105
nvidia-cuda-runtime-cu11   11.7.99
nvidia-cuda-runtime-cu12   12.1.105
nvidia-cudnn-cu11          8.5.0.96
nvidia-cudnn-cu12          8.9.2.26
nvidia-cufft-cu11          10.9.0.58
nvidia-cufft-cu12          11.0.2.54
nvidia-curand-cu11         10.2.10.91
nvidia-curand-cu12         10.3.2.106
nvidia-cusolver-cu11       11.4.0.1
nvidia-cusolver-cu12       11.4.5.107
nvidia-cusparse-cu11       11.7.4.91
nvidia-cusparse-cu12       12.1.0.106
nvidia-nccl-cu11           2.14.3
nvidia-nccl-cu12           2.18.1
nvidia-nvjitlink-cu12      12.3.101
nvidia-nvtx-cu11           11.7.91
nvidia-nvtx-cu12           12.1.105
oauthlib                   3.2.0
olefile                    0.47
ollama-haystack            0.0.3
omegaconf                  2.3.0
onnx                       1.15.0
onnxruntime                1.15.1
openai                     1.12.0
opencv-python              4.9.0.80
openpyxl                   3.1.2
orjson                     3.9.12
packaging                  23.2
pandas                     2.2.0
pastel                     0.2.1
pdf2image                  1.17.0
pdfminer.six               20221105
pdfplumber                 0.10.4
peft                       0.4.0
pexpect                    4.8.0
pikepdf                    8.12.0
pillow                     10.2.0
pillow_heif                0.15.0
pip                        24.0
pipx                       1.0.0
pkginfo                    1.9.6
platformdirs               3.11.0
poetry-core                1.8.1
poetry-plugin-export       1.6.0
portalocker                2.8.2
posthog                    3.4.0
protobuf                   4.25.2
psutil                     5.9.8
ptyprocess                 0.7.0
py-cpuinfo                 9.0.0
pycocotools                2.0.7
pycparser                  2.21
pydantic                   2.6.1
pydantic_core              2.16.2
pydub                      0.25.1
Pygments                   2.17.2
PyGObject                  3.42.1
pyinotify                  0.9.6
PyJWT                      2.3.0
pylev                      1.2.0
pynvml                     11.5.0
pypandoc                   1.12
pyparsing                  2.4.7
pypdf                      4.0.1
pypdfium2                  4.27.0
pyproject_hooks            1.0.0
pytesseract                0.3.10
python-apt                 2.4.0+ubuntu2
python-box                 7.1.1
python-dateutil            2.8.2
python-docx                1.1.0
python-dotenv              1.0.1
python-iso639              2024.2.7
python-magic               0.4.27
python-multipart           0.0.6
python-pptx                0.6.23
pytz                       2023.4
PyYAML                     5.4.1
rapidfuzz                  3.6.1
referencing                0.33.0
regex                      2023.12.25
requests                   2.31.0
requests-toolbelt          0.9.1
rich                       13.7.0
rpds-py                    0.17.1
ruff                       0.2.1
safetensors                0.4.2
scikit-learn               1.2.2
scipy                      1.12.0
SecretStorage              3.3.1
semantic-version           2.10.0
sentence-transformers      2.3.1
sentencepiece              0.1.99
sentry-sdk                 1.40.2
setproctitle               1.3.3
setuptools                 59.6.0
shellingham                1.5.4
shortuuid                  1.0.11
six                        1.16.0
smmap                      5.0.1
sniffio                    1.3.0
soupsieve                  2.5
SQLAlchemy                 2.0.26
starlette                  0.32.0.post1
svgwrite                   1.4.3
sympy                      1.12
systemd-python             234
tabulate                   0.9.0
tenacity                   8.2.3
threadpoolctl              3.2.0
tiktoken                   0.6.0
timm                       0.9.12
tokenizers                 0.15.1
tomli                      2.0.1
tomlkit                    0.12.0
toolz                      0.12.1
torch                      2.1.2
torchvision                0.16.2
tornado                    6.1
tqdm                       4.66.1
transformers               4.37.2
triton                     2.1.0
trove-classifiers          2024.1.31
typer                      0.9.0
typing_extensions          4.9.0
typing-inspect             0.9.0
tzdata                     2023.4
ubuntu-advantage-tools     8001
uc-micro-py                1.0.2
ufw                        0.36.1
unattended-upgrades        0.1
unstructured               0.12.4
unstructured-client        0.18.0
unstructured-inference     0.7.23
unstructured.pytesseract   0.3.12
urllib3                    2.2.0
userpath                   1.8.0
uvicorn                    0.27.0.post1
uvloop                     0.19.0
validators                 0.22.0
virtualenv                 20.25.0
wadllib                    1.3.6
wandb                      0.16.3
watchfiles                 0.21.0
wavedrom                   2.0.3.post3
weaviate-client            4.4.4
webencodings               0.5.1
websockets                 11.0.3
wheel                      0.37.1
wrapt                      1.16.0
xlrd                       2.0.1
XlsxWriter                 3.1.9
yarl                       1.9.4
zipp                       1.0.0

TobiasJu commented 9 months ago

This is the part of the code where the error originates:

        embeddings = LangchainEmbedding(
            HuggingFaceEmbeddings(model_name=model_name)
        )

I tried using the llama_index HuggingFaceEmbedding implementation, but this fails the same way. ❌

        from llama_index.embeddings import HuggingFaceEmbedding
        embeddings = HuggingFaceEmbedding(
            model_name=model_name
        )

Only the MockEmbedding works. ✔

        from llama_index import MockEmbedding
        embeddings = MockEmbedding(384)
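
Both failing variants go through transformers, while MockEmbedding loads no model, so it may help to import the failing module on its own, outside any embedding wrapper. A minimal sketch: `try_import` is a hypothetical helper, and the module name is taken from the traceback above.

```python
import importlib


def try_import(module_name: str):
    """Return None if the module imports cleanly, otherwise the exception raised."""
    try:
        importlib.import_module(module_name)
    except Exception as exc:  # import failures can surface as many exception types
        return exc
    return None


if __name__ == "__main__":
    # Module name copied from the error message in this issue.
    error = try_import("transformers.generation.utils")
    if error is None:
        print("import OK")
    else:
        print(f"{type(error).__name__}: {error}")
```

If this fails with the same message, the problem is in the installed packages rather than in the embedding code.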

abaranovskis-redsamurai commented 9 months ago

Hi,

It looks like a Hugging Face library issue on Ubuntu. I'm running it on macOS and haven't experienced this issue, so I will need to test on Ubuntu. I see you are running it inside Docker. What if you try installing it directly on your host? It could be a Docker-related issue; I need to test the install and run process within a Docker container.

I see you are downloading the LLM model from Hugging Face; I think you should download it from Ollama instead. But this is unrelated to the ingest step.

My environment:

> python3 --version
Python 3.10.4
> pip list
aiohttp 3.9.3
aiosignal 1.3.1
annotated-types 0.6.0
antlr4-python3-runtime 4.9.3
anyio 4.2.0
asgiref 3.7.2
async-timeout 4.0.3
attrs 23.2.0
Authlib 1.3.0
backoff 2.2.1
bcrypt 4.1.2
beautifulsoup4 4.12.3
boilerpy3 1.0.7
cachetools 5.3.2
certifi 2024.2.2
cffi 1.16.0
chardet 5.2.0
charset-normalizer 3.3.2
chroma-hnswlib 0.7.3
chromadb 0.4.19
click 8.1.7
colorama 0.4.6
coloredlogs 15.0.1
contourpy 1.2.0
cryptography 42.0.2
cycler 0.12.1
dataclasses-json 0.6.4
dataclasses-json-speakeasy 0.5.11
Deprecated 1.2.14
distro 1.9.0
effdet 0.4.1
emoji 2.10.1
et-xmlfile 1.1.0
exceptiongroup 1.2.0
fastapi 0.108.0
filelock 3.13.1
filetype 1.2.0
flatbuffers 23.5.26
fonttools 4.47.2
frozenlist 1.4.1
fsspec 2023.12.2
google-auth 2.27.0
googleapis-common-protos 1.62.0
greenlet 3.0.3
grpcio 1.60.1
grpcio-health-checking 1.60.1
grpcio-tools 1.60.1
h11 0.14.0
haystack-ai 2.0.0b5
haystack-bm25 1.0.2
httpcore 1.0.2
httptools 0.6.1
httpx 0.26.0
huggingface-hub 0.20.3
humanfriendly 10.0
idna 3.6
importlib-metadata 6.11.0
importlib-resources 6.1.1
iopath 0.1.10
Jinja2 3.1.3
joblib 1.3.2
jsonpatch 1.33
jsonpath-python 1.0.6
jsonpointer 2.4
kiwisolver 1.4.5
kubernetes 29.0.0
langchain 0.1.1
langchain-community 0.0.17
langchain-core 0.1.18
langdetect 1.0.9
langsmith 0.0.86
layoutparser 0.3.4
lazy-imports 0.3.1
llama-index 0.9.33
lxml 5.1.0
Markdown 3.5.2
markdown-it-py 3.0.0
MarkupSafe 2.1.5
marshmallow 3.20.2
matplotlib 3.8.2
mdurl 0.1.2
mmh3 4.1.0
monotonic 1.6
more-itertools 10.2.0
mpmath 1.3.0
msg-parser 1.2.0
multidict 6.0.5
mypy-extensions 1.0.0
nest-asyncio 1.6.0
networkx 3.2.1
nltk 3.8.1
numpy 1.26.3
oauthlib 3.2.2
olefile 0.47
ollama-haystack 0.0.3
omegaconf 2.3.0
onnx 1.15.0
onnxruntime 1.15.1
openai 1.10.0
opencv-python 4.9.0.80
openpyxl 3.1.2
opentelemetry-api 1.22.0
opentelemetry-exporter-otlp-proto-common 1.22.0
opentelemetry-exporter-otlp-proto-grpc 1.22.0
opentelemetry-instrumentation 0.43b0
opentelemetry-instrumentation-asgi 0.43b0
opentelemetry-instrumentation-fastapi 0.43b0
opentelemetry-proto 1.22.0
opentelemetry-sdk 1.22.0
opentelemetry-semantic-conventions 0.43b0
opentelemetry-util-http 0.43b0
overrides 7.7.0
packaging 23.2
pandas 2.2.0
pdf2image 1.17.0
pdfminer.six 20221105
pdfplumber 0.10.3
pikepdf 8.12.0
pillow 10.2.0
pip 23.2.1
portalocker 2.8.2
posthog 3.3.4
protobuf 4.25.2
pulsar-client 3.4.0
pyasn1 0.5.1
pyasn1-modules 0.3.0
pycocotools 2.0.7
pycparser 2.21
pydantic 2.6.0
pydantic_core 2.16.1
Pygments 2.17.2
pypandoc 1.12
pyparsing 3.1.1
pypdf 4.0.1
pypdfium2 4.26.0
PyPika 0.48.9
pytesseract 0.3.10
python-box 7.1.1
python-dateutil 2.8.2
python-docx 1.1.0
python-dotenv 1.0.1
python-iso639 2024.1.2
python-magic 0.4.27
python-multipart 0.0.6
python-pptx 0.6.23
pytz 2024.1
PyYAML 6.0.1
rapidfuzz 3.6.1
regex 2023.12.25
requests 2.31.0
requests-oauthlib 1.3.1
rich 13.7.0
rsa 4.9
safetensors 0.4.2
scikit-learn 1.4.0
scipy 1.12.0
sentence-transformers 2.3.1
sentencepiece 0.1.99
setuptools 68.2.0
shellingham 1.5.4
six 1.16.0
sniffio 1.3.0
soupsieve 2.5
SQLAlchemy 2.0.25
starlette 0.32.0.post1
sympy 1.12
tabulate 0.9.0
tenacity 8.2.3
threadpoolctl 3.2.0
tiktoken 0.5.2
timm 0.9.12
tokenizers 0.15.1
torch 2.2.0
torchvision 0.17.0
tqdm 4.66.1
transformers 4.37.2
typer 0.9.0
typing_extensions 4.9.0
typing-inspect 0.9.0
tzdata 2023.4
unstructured 0.12.3
unstructured-client 0.17.0
unstructured-inference 0.7.23
unstructured.pytesseract 0.3.12
urllib3 2.2.0
uvicorn 0.27.0.post1
uvloop 0.19.0
validators 0.22.0
watchfiles 0.21.0
weaviate-client 4.4.1
websocket-client 1.7.0
websockets 12.0
wheel 0.41.2
wrapt 1.16.0
xlrd 2.0.1
XlsxWriter 3.1.9
yarl 1.9.4
zipp 3.17.0
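
Comparing the two environments in this thread may help narrow things down. Both list transformers 4.37.2 and pydantic 2.6.x, but only the reporter's list includes extra packages such as deepspeed 0.9.5. Whether one of those extras triggers the 'FieldInfo' error is an assumption, not something established in this thread. A minimal sketch of the comparison follows, using short excerpts copied from the two lists; `parse_pip_list` and `compare` are hypothetical helper names.

```python
def parse_pip_list(text: str) -> dict:
    """Parse `pip list` output into {lowercased name: version}."""
    packages = {}
    for line in text.splitlines():
        parts = line.split()
        # Skip the header and separator rows: a version starts with a digit.
        if len(parts) >= 2 and parts[1][0].isdigit():
            packages[parts[0].lower()] = parts[1]
    return packages


def compare(reporter: dict, maintainer: dict):
    """Return packages only the reporter has, and shared packages whose versions differ."""
    only_reporter = sorted(set(reporter) - set(maintainer))
    changed = {
        name: (reporter[name], maintainer[name])
        for name in sorted(set(reporter) & set(maintainer))
        if reporter[name] != maintainer[name]
    }
    return only_reporter, changed


# Excerpts from the two lists in this issue (not the full output).
REPORTER = """\
Package      Version
------------ -------
deepspeed    0.9.5
pydantic     2.6.1
torch        2.1.2
transformers 4.37.2
"""
MAINTAINER = """\
pydantic 2.6.0
torch 2.2.0
transformers 4.37.2
"""

only_reporter, changed = compare(parse_pip_list(REPORTER), parse_pip_list(MAINTAINER))
print("only in reporter's environment:", only_reporter)
print("version differences:", changed)
```

Running it on the full lists would surface every package that differs, which gives a short list of candidates to uninstall or pin one at a time in a clean virtual environment.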