h2oai / h2ogpt

Private chat with local GPT with document, images, video, etc. 100% private, Apache 2.0. Supports oLLaMa, Mixtral, llama.cpp, and more. Demo: https://gpt.h2o.ai/ https://gpt-docs.h2o.ai/

GPU memory allocation for collections #1099

Open ssa38 opened 10 months ago

ssa38 commented 10 months ago

I'm trying to run generate.py with a few collections I created with make_db. I can't add more than 6 collections with my current configuration (a Tesla T4 with 16GB of VRAM) because I run out of memory. Each collection seems to use up to 1.5GB of GPU memory, even if it only takes 100KB on disk. Is there a way to reduce the memory allocated per collection?

My CLI:

python generate.py --base_model=HuggingFaceH4/zephyr-7b-beta --hf_embedding_model=sentence-transformers/all-MiniLM-L6-v2 --score_model=None --load_4bit=True --visible_h2ogpt_header=False --top_k_docs=4 --langchain_modes="['A', 'B', 'C', 'D', 'E', 'F']" --max_seq_len=2048

I make shared collections the following way:

python src/make_db.py --user_path="collections/A" --collection_name=A --langchain_type=shared --hf_embedding_model=hkunlp/instructor-large
pseudotensor commented 10 months ago

By default --pre_load_embedding_model=True, so all collections should share a single embedding model instance, as long as they were built with the same embedding model. Let me check on it.

pseudotensor commented 10 months ago

I created 7 collections in total, adding a PDF or an image to each. I only saw GPU memory increase when adding an image, since that uses the image model. After all 7 collections I'm only seeing up to 4.9GB, with no increase for each new collection.


+---------------------------------------------------------------------------------------+
| Processes:                                                                            |
|  GPU   GI   CI        PID   Type   Process name                            GPU Memory |
|        ID   ID                                                             Usage      |
|=======================================================================================|
|    0   N/A  N/A      1548      G   /usr/lib/xorg/Xorg                          156MiB |
|    0   N/A  N/A      2236      G   /usr/lib/xorg/Xorg                          914MiB |
|    0   N/A  N/A      2375      G   /usr/bin/gnome-shell                        127MiB |
|    0   N/A  N/A      7204      G   /usr/bin/nvidia-settings                      0MiB |
|    0   N/A  N/A      7567      G   gnome-control-center                          4MiB |
|    0   N/A  N/A      8532      G   ...1815146,10427758287291803364,262144      159MiB |
|    0   N/A  N/A    289037      G   ...ures=SpareRendererForSitePerProcess       35MiB |
|    0   N/A  N/A    557082      G   obs                                          37MiB |
|    0   N/A  N/A    622485      C   ...niconda3/envs/h2ollm/bin/python3.10     4910MiB |
|    1   N/A  N/A      1548      G   /usr/lib/xorg/Xorg                            4MiB |
|    1   N/A  N/A      2236      G   /usr/lib/xorg/Xorg                            4MiB |
+---------------------------------------------------------------------------------------+
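To compare runs like this, it helps to track just the h2oGPT process's memory rather than the whole device, since Xorg, gnome-shell, etc. also appear in the table. A small parser over `nvidia-smi` process-table text can do that; this is a sketch, and the helper name is made up.

```python
import re

def python_gpu_mib(smi_text):
    """Sum GPU memory (MiB) of python processes in nvidia-smi process-table text."""
    total = 0
    for line in smi_text.splitlines():
        if "python" in line:
            m = re.search(r"(\d+)MiB", line)
            if m:
                total += int(m.group(1))
    return total

# Two rows taken from the table above: one python process, one Xorg process.
sample = """\
|    0   N/A  N/A    622485      C   ...niconda3/envs/h2ollm/bin/python3.10     4910MiB |
|    1   N/A  N/A      1548      G   /usr/lib/xorg/Xorg                            4MiB |
"""
assert python_gpu_mib(sample) == 4910
```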
pseudotensor commented 10 months ago

Maybe it's when making db via make_db that there's some issue...

ssa38 commented 10 months ago

After I hit the out-of-memory error, I started adding collections one by one and checking the memory usage. Below are the screenshots after adding the 4th, 5th, and 6th collections. Each one added 1.3-1.7GB of memory usage.

[screenshots: nvidia-smi output after the 4th, 5th, and 6th collections]