Closed tomerjr closed 2 months ago
It's already documented here near bottom of this section:
https://github.com/h2oai/h2ogpt/blob/main/docs/README_LangChain.md#database-creation
So when using Docker, must I use Weaviate in order to change the user_path and the collection_name? The doc doesn't state how to run it with these arguments, either locally or with Docker. @pseudotensor
The bottom of the section has this:
mkdir -p ~/.cache
mkdir -p ~/save
mkdir -p ~/user_path
mkdir -p ~/db_dir_UserData
docker run \
--gpus all \
--runtime=nvidia \
--shm-size=2g \
--rm --init \
--network host \
-v /etc/passwd:/etc/passwd:ro \
-v /etc/group:/etc/group:ro \
-u `id -u`:`id -g` \
-v "${HOME}"/.cache:/workspace/.cache \
-v "${HOME}"/save:/workspace/save \
-v "${HOME}"/user_path:/workspace/user_path \
-v "${HOME}"/db_dir_UserData:/workspace/db_dir_UserData \
gcr.io/vorvan/h2oai/h2ogpt-runtime:0.2.1 /workspace/src/make_db.py --verbose --use_unstructured_pdf=False --enable_pdf_ocr=False --hf_embedding_model=BAAI/bge-small-en-v1.5 --cut_distance=10000
It doesn't use weaviate. Can you explain what you mean?
For example, I ran something like this:
export GRADIO_SERVER_PORT=7860
export OPENAI_SERVER_PORT=5000
sudo docker run \
--gpus all \
--runtime=nvidia \
--shm-size=1g \
-p $GRADIO_SERVER_PORT:$GRADIO_SERVER_PORT \
-p $OPENAI_SERVER_PORT:$OPENAI_SERVER_PORT \
--rm --init \
--network host \
-v /etc/passwd:/etc/passwd:ro \
-v /etc/group:/etc/group:ro \
-u `id -u`:`id -g` \
-v "${HOME}"/.cache/huggingface/hub/:/workspace/.cache/huggingface/hub \
-v "${HOME}"/.config:/workspace/.config/ \
-v "${HOME}"/.triton:/workspace/.triton/ \
-v "${HOME}"/save:/workspace/save \
-v "${HOME}"/user_path:/workspace/user_path \
-v "${HOME}"/db_dir_UserData:/workspace/ducks \
-v "${HOME}"/users:/workspace/users \
-v "${HOME}"/db_nonusers:/workspace/db_nonusers \
-v "${HOME}"/llamacpp_path:/workspace/llamacpp_path \
-v "${HOME}"/h2ogpt_auth:/workspace/h2ogpt_auth \
-e GRADIO_SERVER_PORT=$GRADIO_SERVER_PORT \
gcr.io/vorvan/h2oai/h2ogpt-runtime:0.2.1 /workspace/src/make_db.py \
--collection_name=Ducks --user_path=:/workspace/duck --langchain_type=personal --persist_directory=users/tomer/db_dir_duck
And then:
export GRADIO_SERVER_PORT=7860
export OPENAI_SERVER_PORT=5000
sudo docker run \
--gpus all \
--runtime=nvidia \
--shm-size=1g \
-p $GRADIO_SERVER_PORT:$GRADIO_SERVER_PORT \
-p $OPENAI_SERVER_PORT:$OPENAI_SERVER_PORT \
--rm --init \
--network host \
-v /etc/passwd:/etc/passwd:ro \
-v /etc/group:/etc/group:ro \
-u `id -u`:`id -g` \
-v "${HOME}"/.cache/huggingface/hub/:/workspace/.cache/huggingface/hub \
-v "${HOME}"/.config:/workspace/.config/ \
-v "${HOME}"/.triton:/workspace/.triton/ \
-v "${HOME}"/save:/workspace/save \
-v "${HOME}"/user_path:/workspace/user_path \
-v "${HOME}"/db_dir_UserData:/workspace/users/tomer/db_dir_duck \
-v "${HOME}"/users:/workspace/users \
-v "${HOME}"/db_nonusers:/workspace/db_nonusers \
-v "${HOME}"/llamacpp_path:/workspace/llamacpp_path \
-v "${HOME}"/h2ogpt_auth:/workspace/h2ogpt_auth \
-e GRADIO_SERVER_PORT=$GRADIO_SERVER_PORT \
gcr.io/vorvan/h2oai/h2ogpt-runtime:0.2.1 /workspace/generate.py \
--base_model=https://huggingface.co/TheBloke/zephyr-7B-beta-GGUF/resolve/main/zephyr-7b-beta.Q2_K.gguf \
--use_safetensors=True \
--prompt_type=zephyr \
--save_dir='/workspace/save/' \
--use_gpu_id=False \
--user_path=/workspace/user_path \
--langchain_mode="LLM" \
--langchain_modes="['UserData', 'LLM','tomer']" \
--score_model=None \
--max_max_new_tokens=2048 \
--max_new_tokens=1024 \
--openai_port=$OPENAI_SERVER_PORT
It does add the collection, and I can see it, but I can't see the documents in it when I run gen.py. I can only see the documents in the default user. What I want is to run make_db on my chosen folder, for my chosen user/db, and then to be able to load it with gen.py.
A few things:
1. If I try roughly what you did, I notice --user_path=:/workspace/duck with an odd ":" inside.
This is explained in the make_db docs:
This is also understood by the same make_db docs lines as above, where
You need to log in and/or use auth of some kind. Logging in is the simplest first thing.
Here's what I try, just to follow along:
python src/make_db.py --collection_name=Ducks --user_path=user_path_test --langchain_type=personal --persist_directory=users/tomer/db_dir_duck
python generate.py --base_model=https://huggingface.co/TheBloke/zephyr-7B-beta-GGUF/resolve/main/zephyr-7b-beta.Q2_K.gguf --use_safetensors=True --prompt_type=zephyr --save_dir='save2' --use_gpu_id=False --user_path=user_path_test --langchain_mode="LLM" --langchain_modes="['UserData', 'LLM','tomer']" --score_model=None --add_disk_models_to_ui=False
What I see is that you have (as you showed) a directory that is not the same as the database you created, i.e. it has a hash inside it. That's because personal directories are hashed like that by default. So when you just set langchain_mode as "tomer", it doesn't know where that db is located.
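To see the mismatch for yourself, it can help to list every database directory on disk and compare it with the --persist_directory you passed to make_db. The helper below is only a sketch: list_db_dirs is a hypothetical name, not part of h2oGPT, and it assumes the databases live under a users/ directory with names starting with db_dir_, as in the commands in this thread.

```shell
#!/usr/bin/env bash
# Hypothetical helper (not part of h2oGPT): list database directories
# below a root so they can be compared with --persist_directory.
# Assumes directory names start with "db_dir_", as in this thread.
list_db_dirs() {
  local root="${1:-users}"
  # Print matching directories in a stable, sorted order.
  find "$root" -type d -name 'db_dir_*' 2>/dev/null | sort
}
```

Running list_db_dirs users from the h2oGPT working directory should show both the directory you created by hand and any hashed one created by the app, if the layout assumption holds.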
I don't currently have a way to specify the database path for personal databases for such "temporary" users.
E.g. you should run:
python src/make_db.py --collection_name=duck --user_path=user_path_test --langchain_type=personal --persist_directory=users/tomer/db_dir_duck/
Then run generate.py without "tomer" in langchain_modes, since that database is only for a single user, named "tomer" above.
python generate.py --base_model=https://huggingface.co/TheBloke/zephyr-7B-beta-GGUF/resolve/main/zephyr-7b-beta.Q2_K.gguf --use_safetensors=True --prompt_type=zephyr --save_dir='save2' --use_gpu_id=False --user_path=user_path_test --langchain_mode="LLM" --langchain_modes="['UserData', 'LLM']" --score_model=None --add_disk_models_to_ui=False
Then login as user "tomer"
Then you'll see the "Directory" be correct:
and you'll see your docs when choosing the duck collection:
@pseudotensor How will this work for Docker? Whenever I try to change the user_path I get an error message saying "user_path != user_path" or something similar. I need instructions for Docker specifically, please. Thank you for all your help.
I'm not aware of any issue that should occur specifically for docker. The instructions I added should work. If you can share your error I'm happy to look.
@pseudotensor In addition, I just realized that you were referring to uploading documents and then creating the db for them but that is not what I want. I want to make the DB separately and then for the user to have it loaded upon logging in. Again, thank you so much for all your help so far.
I'd guess "user_path" has wrong permissions or is a file. Needs to be fixed.
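A quick way to rule this out is to test the path on the host as the same user that the container runs as (the docker commands above pass -u with the host uid and gid). This is a minimal sketch; check_user_path is a hypothetical helper name, not an h2oGPT command.

```shell
#!/usr/bin/env bash
# Hypothetical helper: report whether a path is a usable directory.
# Run it as the user whose uid/gid is passed to "docker run -u".
check_user_path() {
  local p="$1"
  if [ ! -e "$p" ]; then
    echo "missing: $p"; return 1
  fi
  if [ ! -d "$p" ]; then
    echo "not a directory: $p"; return 1
  fi
  # A directory needs read (list) and execute (enter) permission.
  if [ ! -r "$p" ] || [ ! -x "$p" ]; then
    echo "not readable: $p"; return 1
  fi
  echo "ok: $p"
}
```

For example, check_user_path "$HOME/user_path" checks the host side of the mount; ls -ld on the same path shows the owner and mode if the check fails.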
@pseudotensor I checked and it is not a file. Also, the permissions for the "duck" directory are the same as for the "user_path" directory. It seems like there is an assertion that the path should be the "user_path" directory for some reason.
With your volume mapping, you are mapping /home/user_path -> /workspace/duck inside docker. So h2oGPT inside docker won't be able to find that path unless you set --user_path=/workspace/duck.
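To make the mapping concrete, here is a small sketch that translates a host path into the path the container sees for a single -v HOST:CONTAINER bind mount. container_path is a hypothetical helper for illustration only, and the example paths are assumptions rather than values from your setup.

```shell
#!/usr/bin/env bash
# Hypothetical helper: given one "-v HOST:CONTAINER" bind mount,
# print the in-container path that corresponds to a host path.
container_path() {
  local host_root="${1%/}" container_root="${2%/}" path="$3"
  case "$path" in
    "$host_root")
      # The mount point itself.
      printf '%s\n' "$container_root" ;;
    "$host_root"/*)
      # Something below the mount point: swap the prefix.
      printf '%s\n' "$container_root${path#"$host_root"}" ;;
    *)
      printf 'not mounted: %s\n' "$path" >&2
      return 1 ;;
  esac
}
```

Whatever this prints for your documents folder is the value that --user_path needs inside the container. A path that is not under any mount simply does not exist from the container's point of view.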
Hi, I would like clarification on how to use make_db with a specific path as the data source and with a chosen collection name.
I've tried to use it like the normal instructions with --collection_name and --user_path, but it did not work, and I did not find an answer to this in other issues or the doc files.
For example, I tried:
export GRADIO_SERVER_PORT=7860
export OPENAI_SERVER_PORT=5000
sudo docker run \
--gpus all \
--runtime=nvidia \
--shm-size=1g \
-p $GRADIO_SERVER_PORT:$GRADIO_SERVER_PORT \
-p $OPENAI_SERVER_PORT:$OPENAI_SERVER_PORT \
--rm --init \
--network host \
-v /etc/passwd:/etc/passwd:ro \
-v /etc/group:/etc/group:ro \
-u `id -u`:`id -g` \
-v "${HOME}"/.cache/huggingface/hub/:/workspace/.cache/huggingface/hub \
-v "${HOME}"/.config:/workspace/.config/ \
-v "${HOME}"/.triton:/workspace/.triton/ \
-v "${HOME}"/save:/workspace/save \
-v "${HOME}"/user_path:/workspace/user_path \
-v "${HOME}"/db_dir_UserData:/workspace/db_dir_UserData \
-v "${HOME}"/users:/workspace/users \
-v "${HOME}"/db_nonusers:/workspace/db_nonusers \
-v "${HOME}"/llamacpp_path:/workspace/llamacpp_path \
-v "${HOME}"/h2ogpt_auth:/workspace/h2ogpt_auth \
-e GRADIO_SERVER_PORT=$GRADIO_SERVER_PORT \
gcr.io/vorvan/h2oai/h2ogpt-runtime:0.2.1 /workspace/src/make_db.py \
--collection_name=Duck
Thank you and have a good day.