h2oai / h2ogpt


Newly created Collection available for all the users when authentication is enabled #1867

Open llmwesee opened 2 weeks ago

llmwesee commented 2 weeks ago

When authentication is enabled, I want to create a shared collection, such as UserData, that remains accessible to all users at all times, regardless of who created it.

I encountered an issue when attempting to create a new collection (UserData2). Here’s the process I followed:

  1. I first created the collection using the UI.
  2. I then ran the following command:

python generate.py --base_model=meta-llama/Llama-2-13b-chat-hf --score_model=None --langchain_modes="['UserData',UserData2,LLM,MyData]" --user_path=user_path --use_auth_token=True --max_seq_len=4096 --max_max_new_tokens=2048 --max_new_tokens=2048 --min_new_tokens=128 --prompt_type=llama2 --enable_stt=False --enable_tts=False --auth_filename=$auth_filename --auth_access=open --guest_name=avser --auth="[(admin, admin)]"

The issue is that the newly created collection (UserData2) is only accessible to the user who created it, similar to how MyData functions. However, I would like this collection to behave like UserData, where it is available to all users globally, even when authentication is enabled.

However, when authentication is disabled during collection creation, the collection is accessible to all users as expected.

Could you please provide guidance on how to create a shared collection like UserData that remains accessible to all users, even when authentication is enabled?

pseudotensor commented 2 weeks ago

How did you create it in the UI? The box takes a few arguments, including the collection type (assumed to be personal if not passed).

llmwesee commented 2 weeks ago

Document Selection >> Add Collection, then I typed UserData2, shared, userpath. I then put UserData2 in --langchain_modes="['UserData',UserData2,LLM,MyData]"

llmwesee commented 1 week ago

I also created the collection with src/make_db.py. I added all the files to the folder user_path3, then ran:

python src/make_db.py --user_path=user_path3 --collection_name=UserData3 --langchain_type=shared

Then I started the server with:

python generate.py --base_model=meta-llama/Llama-2-13b-chat-hf --score_model=None --langchain_mode='UserData' --user_path=user_path --use_auth_token=True --max_seq_len=4096 --max_max_new_tokens=2048 --batch_size=16 --prompt_type=llama2 --langchain_modes=['UserData','UserData3','MyData','LLM'] --auth_filename=$auth_filename --auth_access=open --guest_name=avser --auth="[(admin, admin)]"

Even then, UserData3 did not show up in the collections for all users, although the embeddings are stored in the db_dir_UserData3 folder.

When I add the following options:

--langchain_modes=['UserData','UserData3'] --langchain_mode_paths={'UserData':'user_path','UserData3':'user_path3'} --langchain_mode_types={'UserData':'shared','UserData3':'shared'}

to the command, like this:

python generate.py --base_model=meta-llama/Llama-2-13b-chat-hf --score_model=None --langchain_mode='UserData' --user_path=user_path --use_auth_token=True --max_seq_len=4096 --max_max_new_tokens=2048 --batch_size=16 --prompt_type=llama2 --langchain_modes=['UserData','UserData3'] --langchain_mode_paths={'UserData':'user_path','UserData3':'user_path3'} --langchain_mode_types={'UserData':'shared','UserData3':'shared'} --auth_filename=$auth_filename --auth_access=open --guest_name=avser --auth="[(admin, admin)]"

it shows the following error:

  File "/home/xxxx/src/gen.py", line 1383, in main
    langchain_mode_paths = str_to_dict(langchain_mode_paths)
  File "/home/xxxx/src/utils.py", line 1863, in str_to_dict
    raise ValueError("Invalid str_to_dict for %s" % x)
ValueError: Invalid str_to_dict for UserData3:user_path3

Note: I set up the authentication server through LDAP. @pseudotensor, please help me with this!
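
One possible reading of the ValueError above, offered as a sketch rather than a confirmed diagnosis: the value printed in the error (UserData3:user_path3) has lost its braces and quotes, which suggests the shell rewrote the unquoted --langchain_mode_paths={...} argument before Python saw it. The snippet below uses Python's shlex module to approximate POSIX quote removal and compares the unquoted form with a double-quoted one. The helper parse_dict_arg is a hypothetical stand-in for a dict-literal parser, not h2oGPT's actual str_to_dict. Note that shlex does not model Bash brace expansion, which may additionally split the {a,b} form into separate words; that would be consistent with only the last key:value pair appearing in the error.

```python
import ast
import shlex


def parse_dict_arg(value: str) -> dict:
    """Hypothetical dict-literal parser (an assumption, not h2oGPT's str_to_dict)."""
    value = value.strip()
    if not value.startswith("{"):
        raise ValueError("Invalid dict argument: %s" % value)
    return ast.literal_eval(value)


def value_seen_by_program(command_fragment: str) -> str:
    """Return the option value after POSIX-style quote removal (approximated by shlex)."""
    token = shlex.split(command_fragment)[0]
    return token.split("=", 1)[1]


# The option as typed in the issue: no outer quotes around the dict.
unquoted = "--langchain_mode_paths={'UserData':'user_path','UserData3':'user_path3'}"
# The same option wrapped in double quotes so the inner single quotes survive.
quoted = "--langchain_mode_paths=\"{'UserData':'user_path','UserData3':'user_path3'}\""

unquoted_value = value_seen_by_program(unquoted)
quoted_value = value_seen_by_program(quoted)

print("unquoted ->", unquoted_value)
print("quoted   ->", quoted_value)

try:
    parse_dict_arg(unquoted_value)
    unquoted_ok = True
except (ValueError, SyntaxError):
    # Bare names such as UserData are not Python literals, so parsing fails.
    unquoted_ok = False

paths = parse_dict_arg(quoted_value)
print("unquoted parses:", unquoted_ok)
print("quoted parses to:", paths)
```

If this reading is right, wrapping each dict-valued or list-valued option in double quotes on the command line (as the first command in this thread already does for --langchain_modes) would be worth trying. Whether that also makes UserData3 visible to all users under authentication is a separate question that this sketch does not address.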