h2oai / h2ogpt

Private chat with local GPT with document, images, video, etc. 100% private, Apache 2.0. Supports oLLaMa, Mixtral, llama.cpp, and more. Demo: https://gpt.h2o.ai/ https://gpt-docs.h2o.ai/
http://h2o.ai
Apache License 2.0

Unable to import files on MacOS #1252

Open maxvamp12 opened 10 months ago

maxvamp12 commented 10 months ago

### Environment settings and versions

- macOS Sonoma 14.3
- Python 3.10.13
- conda 23.7.4
- h2ogpt-osx-m1-gpu: Nov 8, 2023 build
- 2021 M1 Max 16", 64GB memory
- Model: TheBloke/Llama-2-7b-chat-fp16
- Launch command: `./h2ogpt-osx-m1-gpu --user_path=/Volumes/Mac\ Development/AI/h2ogpt/h2ogpt-runtime/data/`
- chromadb v0.4.21
- Chrome Version 120.0.6099.129 (Official Build) (arm64)

| Package | Version |
| --- | --- |
| accelerate | 0.25.0 |
| annotated-types | 0.6.0 |
| anyio | 4.2.0 |
| asgiref | 3.7.2 |
| backoff | 2.2.1 |
| bcrypt | 4.1.2 |
| cachetools | 5.3.2 |
| certifi | 2023.11.17 |
| charset-normalizer | 3.3.2 |
| chroma-hnswlib | 0.7.3 |
| chromadb | 0.4.21 |
| click | 8.1.7 |
| coloredlogs | 15.0.1 |
| Deprecated | 1.2.14 |
| exceptiongroup | 1.2.0 |
| faiss-cpu | 1.7.4 |
| fastapi | 0.108.0 |
| filelock | 3.13.1 |
| flatbuffers | 23.5.26 |
| fsspec | 2023.12.2 |
| google-auth | 2.25.2 |
| googleapis-common-protos | 1.62.0 |
| grpcio | 1.60.0 |
| h11 | 0.14.0 |
| httptools | 0.6.1 |
| huggingface-hub | 0.20.1 |
| humanfriendly | 10.0 |
| idna | 3.6 |
| importlib-metadata | 6.11.0 |
| importlib-resources | 6.1.1 |
| Jinja2 | 3.1.2 |
| kubernetes | 28.1.0 |
| MarkupSafe | 2.1.3 |
| mmh3 | 4.0.1 |
| monotonic | 1.6 |
| mpmath | 1.3.0 |
| networkx | 3.2.1 |
| numpy | 1.26.2 |
| oauthlib | 3.2.2 |
| onnxruntime | 1.16.3 |
| opentelemetry-api | 1.22.0 |
| opentelemetry-exporter-otlp-proto-common | 1.22.0 |
| opentelemetry-exporter-otlp-proto-grpc | 1.22.0 |
| opentelemetry-instrumentation | 0.43b0 |
| opentelemetry-instrumentation-asgi | 0.43b0 |
| opentelemetry-instrumentation-fastapi | 0.43b0 |
| opentelemetry-proto | 1.22.0 |
| opentelemetry-sdk | 1.22.0 |
| opentelemetry-semantic-conventions | 0.43b0 |
| opentelemetry-util-http | 0.43b0 |
| overrides | 7.4.0 |
| packaging | 23.2 |
| pip | 23.3.1 |
| posthog | 3.1.0 |
| protobuf | 4.25.1 |
| psutil | 5.9.7 |
| pulsar-client | 3.3.0 |
| pyasn1 | 0.5.1 |
| pyasn1-modules | 0.3.0 |
| pydantic | 2.5.3 |
| pydantic_core | 2.14.6 |
| PyPika | 0.48.9 |
| python-dateutil | 2.8.2 |
| python-dotenv | 1.0.0 |
| PyYAML | 6.0.1 |
| requests | 2.31.0 |
| requests-oauthlib | 1.3.1 |
| rsa | 4.9 |
| safetensors | 0.4.1 |
| setuptools | 68.2.2 |
| six | 1.16.0 |
| sniffio | 1.3.0 |
| starlette | 0.32.0.post1 |
| sympy | 1.12 |
| tenacity | 8.2.3 |
| tokenizers | 0.15.0 |
| torch | 2.1.2 |
| tqdm | 4.66.1 |
| typer | 0.9.0 |
| typing_extensions | 4.9.0 |
| urllib3 | 1.26.18 |
| uvicorn | 0.25.0 |
| uvloop | 0.19.0 |
| watchfiles | 0.21.0 |
| websocket-client | 1.7.0 |
| websockets | 12.0 |
| wheel | 0.41.2 |
| wrapt | 1.16.0 |
| zipp | 3.17.0 |

PYTHON PATH:

- PYTHONPATH: `/var/folders/7n/9837s8hx6gg3h7q2pyct_sth0000gn/T/_MEIJReen8`
- Path_1: `/var/folders/7n/9837s8hx6gg3h7q2pyct_sth0000gn/T/_MEIJReen8`
- NLTK_DATA: `/var/folders/7n/9837s8hx6gg3h7q2pyct_sth0000gn/T/_MEIJReen8/nltk_data`
- PATH: `/Users/xxxx/anaconda3/envs/h20/bin:/Users/xxxxxxxx/anaconda3/condabin:/opt/homebrew/anaconda3/bin:/usr/local/anaconda3/bin:/opt/homebrew/bin:/opt/homebrew/sbin:/Library/Frameworks/Python.framework/Versions/3.10/bin:/usr/local/bin:/System/Cryptexes/App/usr/bin:/usr/bin:/bin:/usr/sbin:/sbin:/var/run/com.apple.security.cryptexd/codex.system/bootstrap/usr/local/bin:/var/run/com.apple.security.cryptexd/codex.system/bootstrap/usr/bin:/var/run/com.apple.security.cryptexd/codex.system/bootstrap/usr/appleinternal/bin:/Library/Apple/usr/bin:/Applications/VMware Fusion.app/Contents/Public:/usr/local/share/dotnet:~/.dotnet/tools:/Library/Frameworks/Mono.framework/Versions/Current/Commands:/Volumes/XBOX Work Space/Maven/apache-maven-3.8.3/bin:/Users/xxxxx/dotnet:/Users/xxxxxxx/Library/Application Support/JetBrains/Toolbox/scripts:/var/folders/7n/9837s8hx6gg3h7q2pyct_sth0000gn/T/_MEIJReen8/poppler/bin/:/var/folders/7n/9837s8hx6gg3h7q2pyct_sth0000gn/T/_MEIJReen8/poppler/lib/:/var/folders/7n/9837s8hx6gg3h7q2pyct_sth0000gn/T/_MEIJReen8/Tesseract-OCR`
- Path_3: `/var/folders/7n/9837s8hx6gg3h7q2pyct_sth0000gn/T/_MEIJReen8/h2ogpt/src`
- Path_3: `/var/folders/7n/9837s8hx6gg3h7q2pyct_sth0000gn/T/_MEIJReen8/h2ogpt/iterators`
- Path_3: `/var/folders/7n/9837s8hx6gg3h7q2pyct_sth0000gn/T/_MEIJReen8/h2ogpt/gradio_utils`
- Path_3: `/var/folders/7n/9837s8hx6gg3h7q2pyct_sth0000gn/T/_MEIJReen8/h2ogpt/metrics`
- Path_3: `/var/folders/7n/9837s8hx6gg3h7q2pyct_sth0000gn/T/_MEIJReen8/h2ogpt/models`
- Path_3: `/var/folders/7n/9837s8hx6gg3h7q2pyct_sth0000gn/T/_MEIJReen8/h2ogpt/.`

### Description of the issue and things tried

Any time I try to import a document, I get this error:

/Volumes/Mac Development/AI/h2ogpt/h2ogpt-runtime/data/Teaching of the Mystics.pdf Exception: [Errno 2] No such file or directory: '/private/var/folders/7n/9837s8hx6gg3h7q2pyct_sth0000gn/T/_MEIHdUO3A/unstructured/nlp/english-words.txt'

Sometimes I instead get an error saying the module chromadb.telemetry could not be loaded. The same happens with Faiss, even though both packages are installed:

No module named 'chromadb.telemetry.posthog'
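One way to narrow this down is to ask the interpreter directly whether it can locate the modules named in the errors. The snippet below is only a minimal sketch using the standard library's `importlib.util.find_spec`; the helper name `module_status` is hypothetical, and the check is only meaningful when run with the same interpreter and environment that the app itself uses.

```python
import importlib.util


def module_status(names):
    """Map each dotted module name to True if it can be located, else False."""
    status = {}
    for name in names:
        try:
            status[name] = importlib.util.find_spec(name) is not None
        except Exception:
            # find_spec imports the parent package first; a missing or
            # broken parent raises instead of returning None.
            status[name] = False
    return status


if __name__ == "__main__":
    # Module names taken from the errors reported above.
    names = ["chromadb.telemetry.posthog", "faiss"]
    for name, found in module_status(names).items():
        print(f"{name}: {'found' if found else 'MISSING'}")
```

If a package shows up in `pip list` but is reported as missing here, the running app is probably not using the environment that `pip list` describes.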

I have changed the user data directory to see whether any DB is ever created, and I have checked for the file english-words.txt, which does seem to be missing. The file permissions allow access throughout. No databases appear in db_dir_UserData; however, DB folders do appear to be created in the db_nonusers directory. There are no DB files under these db_nonusers subfolders.
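The `_MEI…` temp directory in the error looks like a PyInstaller one-file extraction directory (an assumption based on the path naming, not confirmed in this thread). Under that assumption, the sketch below checks whether an expected data file is present in the bundle. `sys._MEIPASS` is the attribute PyInstaller sets in a frozen app; the helper names `bundle_root` and `missing_resources` are hypothetical, and outside a frozen app the root directory has to be passed in by hand.

```python
import sys
from pathlib import Path


def bundle_root():
    """Return the PyInstaller extraction directory, or the current directory if not frozen."""
    return Path(getattr(sys, "_MEIPASS", Path.cwd()))


def missing_resources(root, relative_paths):
    """Return the relative paths that are not regular files under root."""
    root = Path(root)
    return [rel for rel in relative_paths if not (root / rel).is_file()]


if __name__ == "__main__":
    # Relative path taken from the error message above.
    expected = ["unstructured/nlp/english-words.txt"]
    root = bundle_root()
    for rel in missing_resources(root, expected):
        print(f"missing: {root / rel}")
```

If the file is absent from the extraction directory, the data file was most likely left out when the binary was built, which would not be fixable by changing permissions or `--user_path`.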

All of the actions are being performed in the Chrome GUI. I have not tried to use the CLI.

pseudotensor commented 10 months ago

@Mathanraj-Sharma please help.

maxvamp12 commented 10 months ago

I was a bit brain dead when writing the bug. YAY winter colds!!!! Here are the repro steps. It was pretty straightforward.

I did move my user data directory to a location that would not be affected by file system permissions, hence the additional CLI argument. I did not see any files created in the temp folders.
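To confirm whether anything is being written at all, it can help to count files under each subfolder rather than eyeballing the directories. This is a minimal sketch with a hypothetical helper `count_files`; the directory passed in (for example the db_nonusers folder or the temp folder) is whatever location is being inspected.

```python
from pathlib import Path


def count_files(root):
    """Map each immediate subfolder of root to the number of files beneath it."""
    counts = {}
    for child in sorted(Path(root).iterdir()):
        if child.is_dir():
            # rglob walks the whole subtree; only regular files are counted.
            counts[child.name] = sum(1 for p in child.rglob("*") if p.is_file())
    return counts
```

A result where every subfolder maps to 0 would match the symptom described: folders are created, but no database files are ever written into them.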

I am running everything from the CLI, with no Finder clicks, from the non-root volume folder "/Volumes/Mac Development/AI/h2ogpt/h2ogpt-runtime/".

Repro Steps:

Mathanraj-Sharma commented 8 months ago

@maxvamp12 could you please try with the latest artifacts https://github.com/h2oai/h2ogpt#macos-cpum1m2-with-full-document-qa-capability