Closed TobiasJu closed 9 months ago
Some more info about my setup: Windows 10 WSL2 running Ubuntu
> python3 --version
Python 3.10.12
> pip list
Package Version Editable project location
-------------------------- ------------- -------------------------
accelerate 0.21.0
aiofiles 23.2.1
aiohttp 3.9.3
aiosignal 1.3.1
altair 5.2.0
annotated-types 0.6.0
antlr4-python3-runtime 4.9.3
anyio 4.2.0
appdirs 1.4.4
argcomplete 1.8.1
async-timeout 4.0.3
attrs 23.2.0
Authlib 1.3.0
Babel 2.8.0
backoff 2.2.1
beautifulsoup4 4.12.3
bitsandbytes 0.41.0
blinker 1.4
boilerpy3 1.0.7
build 1.0.3
CacheControl 0.13.1
cachy 0.3.0
certifi 2023.11.17
cffi 1.16.0
chardet 4.0.0
charset-normalizer 3.3.2
cleo 2.1.0
click 8.1.7
clikit 0.6.2
cmake 3.28.1
colorama 0.4.6
coloredlogs 15.0.1
command-not-found 0.3
contourpy 1.2.0
crashtest 0.4.1
cryptography 42.0.2
cycler 0.12.1
dataclasses-json 0.6.4
dataclasses-json-speakeasy 0.5.11
dbus-python 1.2.18
deepspeed 0.9.5
Deprecated 1.2.14
distlib 0.3.8
distro 1.7.0
distro-info 1.1+ubuntu0.2
docker-pycreds 0.4.0
dulwich 0.21.7
effdet 0.4.1
einops 0.6.1
einops-exts 0.0.4
emoji 2.10.1
et-xmlfile 1.1.0
exceptiongroup 1.2.0
fastapi 0.108.0
fastjsonschema 2.19.1
ffmpy 0.3.1
filelock 3.13.1
filetype 1.2.0
flash-attn 2.5.2
flatbuffers 23.5.26
fonttools 4.47.2
frozenlist 1.4.1
fsspec 2023.12.2
gitdb 4.0.11
GitPython 3.1.41
gradio 4.16.0
gradio_client 0.8.1
greenlet 3.0.3
grpcio 1.60.1
grpcio-health-checking 1.60.1
grpcio-tools 1.60.1
h11 0.14.0
haystack-ai 2.0.0b5
haystack-bm25 1.0.2
hjson 3.1.0
html5lib 1.1
httpcore 1.0.2
httplib2 0.20.2
httptools 0.6.1
httpx 0.26.0
huggingface-hub 0.20.3
humanfriendly 10.0
idna 3.6
importlib-metadata 7.0.1
importlib-resources 6.1.1
installer 0.7.0
iopath 0.1.10
jaraco.classes 3.3.1
jeepney 0.7.1
Jinja2 3.1.3
joblib 1.3.2
jsonpatch 1.33
jsonpath-python 1.0.6
jsonpointer 2.4
jsonschema 4.21.1
jsonschema-specifications 2023.12.1
keyring 24.3.0
kiwisolver 1.4.5
langchain 0.1.1
langchain-community 0.0.19
langchain-core 0.1.22
langdetect 1.0.9
langsmith 0.0.87
launchpadlib 1.10.16
layoutparser 0.3.4
lazr.restfulclient 0.14.4
lazr.uri 1.0.6
lazy-imports 0.3.1
linkify-it-py 2.0.2
lit 17.0.6
livereload 2.6.3
llama-index 0.9.33
llava 1.2.2.post1 /mnt/a/KI/LLaVA
lockfile 0.12.2
lxml 5.1.0
Markdown 3.3.6
markdown-it-py 2.2.0
markdown2 2.4.12
MarkupSafe 2.1.4
marshmallow 3.20.2
matplotlib 3.8.2
mdit-py-plugins 0.3.3
mdurl 0.1.2
mkdocs 1.1.2
monotonic 1.6
more-itertools 8.10.0
mpmath 1.3.0
msg-parser 1.2.0
msgpack 1.0.3
multidict 6.0.4
mypy-extensions 1.0.0
nest-asyncio 1.6.0
netifaces 0.11.0
networkx 3.2.1
ninja 1.11.1.1
nltk 3.8.1
numpy 1.26.3
nvidia-cublas-cu11 11.10.3.66
nvidia-cublas-cu12 12.1.3.1
nvidia-cuda-cupti-cu11 11.7.101
nvidia-cuda-cupti-cu12 12.1.105
nvidia-cuda-nvrtc-cu11 11.7.99
nvidia-cuda-nvrtc-cu12 12.1.105
nvidia-cuda-runtime-cu11 11.7.99
nvidia-cuda-runtime-cu12 12.1.105
nvidia-cudnn-cu11 8.5.0.96
nvidia-cudnn-cu12 8.9.2.26
nvidia-cufft-cu11 10.9.0.58
nvidia-cufft-cu12 11.0.2.54
nvidia-curand-cu11 10.2.10.91
nvidia-curand-cu12 10.3.2.106
nvidia-cusolver-cu11 11.4.0.1
nvidia-cusolver-cu12 11.4.5.107
nvidia-cusparse-cu11 11.7.4.91
nvidia-cusparse-cu12 12.1.0.106
nvidia-nccl-cu11 2.14.3
nvidia-nccl-cu12 2.18.1
nvidia-nvjitlink-cu12 12.3.101
nvidia-nvtx-cu11 11.7.91
nvidia-nvtx-cu12 12.1.105
oauthlib 3.2.0
olefile 0.47
ollama-haystack 0.0.3
omegaconf 2.3.0
onnx 1.15.0
onnxruntime 1.15.1
openai 1.12.0
opencv-python 4.9.0.80
openpyxl 3.1.2
orjson 3.9.12
packaging 23.2
pandas 2.2.0
pastel 0.2.1
pdf2image 1.17.0
pdfminer.six 20221105
pdfplumber 0.10.4
peft 0.4.0
pexpect 4.8.0
pikepdf 8.12.0
pillow 10.2.0
pillow_heif 0.15.0
pip 24.0
pipx 1.0.0
pkginfo 1.9.6
platformdirs 3.11.0
poetry-core 1.8.1
poetry-plugin-export 1.6.0
portalocker 2.8.2
posthog 3.4.0
protobuf 4.25.2
psutil 5.9.8
ptyprocess 0.7.0
py-cpuinfo 9.0.0
pycocotools 2.0.7
pycparser 2.21
pydantic 2.6.1
pydantic_core 2.16.2
pydub 0.25.1
Pygments 2.17.2
PyGObject 3.42.1
pyinotify 0.9.6
PyJWT 2.3.0
pylev 1.2.0
pynvml 11.5.0
pypandoc 1.12
pyparsing 2.4.7
pypdf 4.0.1
pypdfium2 4.27.0
pyproject_hooks 1.0.0
pytesseract 0.3.10
python-apt 2.4.0+ubuntu2
python-box 7.1.1
python-dateutil 2.8.2
python-docx 1.1.0
python-dotenv 1.0.1
python-iso639 2024.2.7
python-magic 0.4.27
python-multipart 0.0.6
python-pptx 0.6.23
pytz 2023.4
PyYAML 5.4.1
rapidfuzz 3.6.1
referencing 0.33.0
regex 2023.12.25
requests 2.31.0
requests-toolbelt 0.9.1
rich 13.7.0
rpds-py 0.17.1
ruff 0.2.1
safetensors 0.4.2
scikit-learn 1.2.2
scipy 1.12.0
SecretStorage 3.3.1
semantic-version 2.10.0
sentence-transformers 2.3.1
sentencepiece 0.1.99
sentry-sdk 1.40.2
setproctitle 1.3.3
setuptools 59.6.0
shellingham 1.5.4
shortuuid 1.0.11
six 1.16.0
smmap 5.0.1
sniffio 1.3.0
soupsieve 2.5
SQLAlchemy 2.0.26
starlette 0.32.0.post1
svgwrite 1.4.3
sympy 1.12
systemd-python 234
tabulate 0.9.0
tenacity 8.2.3
threadpoolctl 3.2.0
tiktoken 0.6.0
timm 0.9.12
tokenizers 0.15.1
tomli 2.0.1
tomlkit 0.12.0
toolz 0.12.1
torch 2.1.2
torchvision 0.16.2
tornado 6.1
tqdm 4.66.1
transformers 4.37.2
triton 2.1.0
trove-classifiers 2024.1.31
typer 0.9.0
typing_extensions 4.9.0
typing-inspect 0.9.0
tzdata 2023.4
ubuntu-advantage-tools 8001
uc-micro-py 1.0.2
ufw 0.36.1
unattended-upgrades 0.1
unstructured 0.12.4
unstructured-client 0.18.0
unstructured-inference 0.7.23
unstructured.pytesseract 0.3.12
urllib3 2.2.0
userpath 1.8.0
uvicorn 0.27.0.post1
uvloop 0.19.0
validators 0.22.0
virtualenv 20.25.0
wadllib 1.3.6
wandb 0.16.3
watchfiles 0.21.0
wavedrom 2.0.3.post3
weaviate-client 4.4.4
webencodings 0.5.1
websockets 11.0.3
wheel 0.37.1
wrapt 1.16.0
xlrd 2.0.1
XlsxWriter 3.1.9
yarl 1.9.4
zipp 1.0.0
This is the part of the code where the error originates:
embeddings = LangchainEmbedding(
    HuggingFaceEmbeddings(model_name=model_name)
)
I tried using the llama_index HuggingFaceEmbedding implementation, but this fails the same way. ❌
from llama_index.embeddings import HuggingFaceEmbedding
embeddings = HuggingFaceEmbedding(
    model_name=model_name
)
Only the MockEmbedding works. ✔
from llama_index import MockEmbedding
embeddings = MockEmbedding(384)
Hi,
It looks like a HuggingFace library issue on Ubuntu. I'm running it on macOS and didn't experience this issue, so I will need to test on Ubuntu. I see you are running it inside Docker. What if you install it directly on your host? It could be a Docker-related issue; I need to test the install and run process within a Docker container.
I see you are downloading the LLM from HuggingFace; I think you should download it from Ollama instead. But this is unrelated to the ingest step.
My environment:
> python3 --version
Python 3.10.4
> pip list
aiohttp 3.9.3
aiosignal 1.3.1
annotated-types 0.6.0
antlr4-python3-runtime 4.9.3
anyio 4.2.0
asgiref 3.7.2
async-timeout 4.0.3
attrs 23.2.0
Authlib 1.3.0
backoff 2.2.1
bcrypt 4.1.2
beautifulsoup4 4.12.3
boilerpy3 1.0.7
cachetools 5.3.2
certifi 2024.2.2
cffi 1.16.0
chardet 5.2.0
charset-normalizer 3.3.2
chroma-hnswlib 0.7.3
chromadb 0.4.19
click 8.1.7
colorama 0.4.6
coloredlogs 15.0.1
contourpy 1.2.0
cryptography 42.0.2
cycler 0.12.1
dataclasses-json 0.6.4
dataclasses-json-speakeasy 0.5.11
Deprecated 1.2.14
distro 1.9.0
effdet 0.4.1
emoji 2.10.1
et-xmlfile 1.1.0
exceptiongroup 1.2.0
fastapi 0.108.0
filelock 3.13.1
filetype 1.2.0
flatbuffers 23.5.26
fonttools 4.47.2
frozenlist 1.4.1
fsspec 2023.12.2
google-auth 2.27.0
googleapis-common-protos 1.62.0
greenlet 3.0.3
grpcio 1.60.1
grpcio-health-checking 1.60.1
grpcio-tools 1.60.1
h11 0.14.0
haystack-ai 2.0.0b5
haystack-bm25 1.0.2
httpcore 1.0.2
httptools 0.6.1
httpx 0.26.0
huggingface-hub 0.20.3
humanfriendly 10.0
idna 3.6
importlib-metadata 6.11.0
importlib-resources 6.1.1
iopath 0.1.10
Jinja2 3.1.3
joblib 1.3.2
jsonpatch 1.33
jsonpath-python 1.0.6
jsonpointer 2.4
kiwisolver 1.4.5
kubernetes 29.0.0
langchain 0.1.1
langchain-community 0.0.17
langchain-core 0.1.18
langdetect 1.0.9
langsmith 0.0.86
layoutparser 0.3.4
lazy-imports 0.3.1
llama-index 0.9.33
lxml 5.1.0
Markdown 3.5.2
markdown-it-py 3.0.0
MarkupSafe 2.1.5
marshmallow 3.20.2
matplotlib 3.8.2
mdurl 0.1.2
mmh3 4.1.0
monotonic 1.6
more-itertools 10.2.0
mpmath 1.3.0
msg-parser 1.2.0
multidict 6.0.5
mypy-extensions 1.0.0
nest-asyncio 1.6.0
networkx 3.2.1
nltk 3.8.1
numpy 1.26.3
oauthlib 3.2.2
olefile 0.47
ollama-haystack 0.0.3
omegaconf 2.3.0
onnx 1.15.0
onnxruntime 1.15.1
openai 1.10.0
opencv-python 4.9.0.80
openpyxl 3.1.2
opentelemetry-api 1.22.0
opentelemetry-exporter-otlp-proto-common 1.22.0
opentelemetry-exporter-otlp-proto-grpc 1.22.0
opentelemetry-instrumentation 0.43b0
opentelemetry-instrumentation-asgi 0.43b0
opentelemetry-instrumentation-fastapi 0.43b0
opentelemetry-proto 1.22.0
opentelemetry-sdk 1.22.0
opentelemetry-semantic-conventions 0.43b0
opentelemetry-util-http 0.43b0
overrides 7.7.0
packaging 23.2
pandas 2.2.0
pdf2image 1.17.0
pdfminer.six 20221105
pdfplumber 0.10.3
pikepdf 8.12.0
pillow 10.2.0
pip 23.2.1
portalocker 2.8.2
posthog 3.3.4
protobuf 4.25.2
pulsar-client 3.4.0
pyasn1 0.5.1
pyasn1-modules 0.3.0
pycocotools 2.0.7
pycparser 2.21
pydantic 2.6.0
pydantic_core 2.16.1
Pygments 2.17.2
pypandoc 1.12
pyparsing 3.1.1
pypdf 4.0.1
pypdfium2 4.26.0
PyPika 0.48.9
pytesseract 0.3.10
python-box 7.1.1
python-dateutil 2.8.2
python-docx 1.1.0
python-dotenv 1.0.1
python-iso639 2024.1.2
python-magic 0.4.27
python-multipart 0.0.6
python-pptx 0.6.23
pytz 2024.1
PyYAML 6.0.1
rapidfuzz 3.6.1
regex 2023.12.25
requests 2.31.0
requests-oauthlib 1.3.1
rich 13.7.0
rsa 4.9
safetensors 0.4.2
scikit-learn 1.4.0
scipy 1.12.0
sentence-transformers 2.3.1
sentencepiece 0.1.99
setuptools 68.2.0
shellingham 1.5.4
six 1.16.0
sniffio 1.3.0
soupsieve 2.5
SQLAlchemy 2.0.25
starlette 0.32.0.post1
sympy 1.12
tabulate 0.9.0
tenacity 8.2.3
threadpoolctl 3.2.0
tiktoken 0.5.2
timm 0.9.12
tokenizers 0.15.1
torch 2.2.0
torchvision 0.17.0
tqdm 4.66.1
transformers 4.37.2
typer 0.9.0
typing_extensions 4.9.0
typing-inspect 0.9.0
tzdata 2023.4
unstructured 0.12.3
unstructured-client 0.17.0
unstructured-inference 0.7.23
unstructured.pytesseract 0.3.12
urllib3 2.2.0
uvicorn 0.27.0.post1
uvloop 0.19.0
validators 0.22.0
watchfiles 0.21.0
weaviate-client 4.4.1
websocket-client 1.7.0
websockets 12.0
wheel 0.41.2
wrapt 1.16.0
xlrd 2.0.1
XlsxWriter 3.1.9
yarl 1.9.4
zipp 3.17.0
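Since both environments are posted in full, comparing them by eye is tedious. Below is a minimal sketch, using only the Python standard library, of how the two `pip list` outputs could be diffed to find packages that are missing or pinned to different versions. The function names `parse_pip_list` and `diff_environments` are hypothetical helpers made up for this illustration, not part of Sparrow or pip, and the sample input is just a handful of lines copied from the two lists in this thread. A version difference found this way is only a lead to investigate, not a diagnosis.

```python
def parse_pip_list(text):
    """Parse `pip list` output into a {lowercased package name: version} dict."""
    versions = {}
    for line in text.splitlines():
        parts = line.split()
        # Skip blank lines, the "Package Version" header, and the dashed rule.
        if len(parts) < 2 or parts[0] == "Package" or set(parts[0]) == {"-"}:
            continue
        # A third column (editable project location) is ignored if present.
        versions[parts[0].lower()] = parts[1]
    return versions


def diff_environments(a, b):
    """Return (only_in_a, only_in_b, changed) for two name -> version dicts."""
    only_a = sorted(set(a) - set(b))
    only_b = sorted(set(b) - set(a))
    changed = {
        name: (a[name], b[name])
        for name in sorted(set(a) & set(b))
        if a[name] != b[name]
    }
    return only_a, only_b, changed


# Sample input: a few lines copied from the two lists posted in this thread.
ubuntu_env = parse_pip_list("""\
Package Version Editable project location
-------------------------- ------------- -------------------------
flash-attn 2.5.2
llava 1.2.2.post1 /mnt/a/KI/LLaVA
torch 2.1.2
transformers 4.37.2
""")
macos_env = parse_pip_list("""\
chromadb 0.4.19
torch 2.2.0
transformers 4.37.2
""")

only_ubuntu, only_macos, changed = diff_environments(ubuntu_env, macos_env)
print("only in Ubuntu env:", only_ubuntu)
print("only in macOS env:", only_macos)
print("version differs:", changed)
```

To compare the complete environments, paste each full `pip list` output into a text file and pass the file contents to `parse_pip_list` instead of the short samples.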
Hi, I tried running your demo with:
./sparrow.sh ingest
But this resulted in this error:
Beforehand, I started Docker, ran the installation, and downloaded the model: