FYI, the issue isn't directly related to the size of the collection; a large collection by itself doesn't cause GPU OOM.
It's more likely caused by a context that is too large for your GPU to handle. You can try reducing several related settings (see the sketch after the list):
- `top_k_docs` (smaller than the default of 10)
- `max_input_tokens` (set to some amount, e.g. 2048, instead of the default, which is based on `top_k_docs` or the model limit)
- `max_total_input_tokens` (matters for summarization, while `max_input_tokens` applies per context use)
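As a minimal sketch, assuming the same `generate.py` entrypoint used in the run command later in this thread, the settings above could be passed like this (the values are illustrative starting points, not tuned recommendations):

```bash
# Sketch (illustrative values): tighten retrieval and context limits to reduce GPU memory use.
# top_k_docs: retrieve fewer document chunks than the default of 10.
# max_input_tokens: hard cap per context fill, well below the 16k model limit.
# max_total_input_tokens: overall cap, mainly relevant for summarization.
python generate.py \
  --base_model=TheBloke/openchat_3.5-16k-AWQ \
  --top_k_docs=4 \
  --max_input_tokens=2048 \
  --max_total_input_tokens=4096
```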
Will close for now, feel free to ask more.
The error still occurs after setting `top_k_docs=1`, `max_input_tokens=10024`, and `max_total_input_tokens=10024`, despite being in query mode. For reference, the context size is 16384. Here is a full log.
Keep decreasing `max_input_tokens` until it works, although `top_k_docs=1` should also have worked.
You can set `--verbose=True` to see what is being passed as the real prompt and work out what is going on, as in the sketch below.
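A minimal sketch of that debugging step, again assuming the `generate.py` entrypoint from this thread; the `max_input_tokens` value is illustrative and can be stepped down further if OOM persists:

```bash
# Sketch: run with verbose logging so the actual prompt (and hence its size) is printed.
# Step max_input_tokens down (e.g. 8192 -> 4096 -> 2048 -> 1024) until the OOM disappears.
python generate.py \
  --base_model=TheBloke/openchat_3.5-16k-AWQ \
  --top_k_docs=1 \
  --max_input_tokens=1024 \
  --verbose=True
```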
I am getting a CUDA out-of-memory error when querying against all data in a collection. The collection is large and holds more data than can fit in the context. I'd expect h2ogpt to take just enough data to fill the context when querying against the entire collection; is there an option for that?
Querying a selection from a collection (the selected files are smaller than the model context of 16k):

![image](https://github.com/h2oai/h2ogpt/assets/10248473/1936ac95-5fe1-4019-ac71-be11b8beccf3)

Querying against the entire collection (the collection comes to more than 16k tokens):

![image](https://github.com/h2oai/h2ogpt/assets/10248473/62c139ae-241b-4b99-a151-7686aa30a7ff)

Full run command:
```bash
docker run \
  --gpus all \
  --runtime=nvidia \
  --shm-size=2g \
  -p 7860:7860 \
  -v /etc/passwd:/etc/passwd:ro \
  -v /etc/group:/etc/group:ro \
  -u $(id -u):$(id -g) \
  -v "${HOME}"/h2ogpt_mistral/.cache:/workspace/.cache \
  -v "${HOME}"/h2ogpt_mistral/save:/workspace/save \
  -v "${HOME}"/h2ogpt_mistral/user_path:/workspace/user_path \
  -v "${HOME}"/h2ogpt_mistral/db_dir_UserData:/workspace/db_dir_UserData \
  -v "${HOME}"/h2ogpt_mistral/users:/workspace/users \
  -v "${HOME}"/h2ogpt_mistral/db_nonusers:/workspace/db_nonusers \
  -v "${HOME}"/h2ogpt_mistral/auth:/workspace/auth \
  -v "${HOME}"/h2ogpt_mistral/assets:/workspace/assets \
  gcr.io/vorvan/h2oai/h2ogpt-runtime:$IMAGE_TAG /workspace/generate.py \
    --openai_server=False \
    --h2ogpt_api_keys="/workspace/auth/api_keys.json" \
    --use_gpu_id=False \
    --score_model=None \
    --prompt_type=open_chat \
    --base_model=TheBloke/openchat_3.5-16k-AWQ \
    --compile_model=True \
    --use_cache=True \
    --use_flash_attention_2=True \
    --attention_sinks=True \
    --sink_dict="{'num_sink_tokens': 4, 'window_length': $CONTEXT_LENGTH }" \
    --save_dir='/workspace/save/' \
    --user_path='/workspace/user_path/' \
    --langchain_mode="UserData" \
    --langchain_modes="['UserData', 'LLM']" \
    --visible_langchain_actions="['Query']" \
    --visible_langchain_agents="[]" \
    --use_llm_if_no_docs=True \
    --max_seq_len=$CONTEXT_LENGTH \
    --enable_ocr=True \
    --enable_tts=False \
    --enable_stt=False
```