h2oai / h2ogpt

Private chat with local GPT with document, images, video, etc. 100% private, Apache 2.0. Supports oLLaMa, Mixtral, llama.cpp, and more. Demo: https://gpt.h2o.ai/ https://gpt-docs.h2o.ai/
http://h2o.ai
Apache License 2.0

CUDA out of memory #865

Open forsasim opened 1 year ago

forsasim commented 1 year ago

Hi. I'm working on a private-GPT kind of setup, and I'm getting the error below while uploading file(s). How can I resolve this? I'm using AWS EC2 (g4dn, with a 42-core CPU and 4 NVIDIA GPUs); configuration attached as images.

Error Message: Tried to allocate 36.00 MiB (GPU 0; 14.84 GiB total capacity; 14.06 GiB already allocated; 28.19 MiB free; 14.07 GiB reserved in total by PyTorch) If reserved memory is >> allocated memory try setting max_split_size_mb to avoid fragmentation. See documentation for Memory Management and PYTORCH_CUDA_ALLOC_CONF
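The error message itself points at the allocator knob to try first. A minimal sketch of setting it before launching, assuming the process is started from the same shell (128 is an illustrative value, not a recommendation):

```shell
# Cap the size of splittable allocator blocks to reduce fragmentation,
# as suggested by the "max_split_size_mb" hint in the error message.
export PYTORCH_CUDA_ALLOC_CONF=max_split_size_mb:128
```

This only helps when reserved memory greatly exceeds allocated memory (fragmentation); it cannot make a model fit that is simply too large for the GPU.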

forsasim commented 1 year ago

(configuration screenshots attached)

emil-jose commented 1 year ago

I am also getting the same error, on a code base that was working fine a few days ago!

pseudotensor commented 1 year ago

@forsasim If you have 4 GPUs, you can run with --use_gpu_id=False to spread the LLM over multiple GPUs. If the intent is to run on 1 GPU, then a 7B model would fit if you have a 24GB GPU, but in that case I'd recommend using GGML instead.
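A back-of-envelope sketch of why a 7B model is tight on the ~16 GiB T4s in a g4dn instance but fits a 24GB card, and why quantized formats like GGML help. These numbers count weights only (ignoring KV cache and activations), so they are approximations, not the project's sizing method:

```python
def approx_weights_gib(n_params_billion: float, bytes_per_param: float) -> float:
    """Rough weights-only memory footprint in GiB (ignores KV cache/activations)."""
    return n_params_billion * 1e9 * bytes_per_param / 2**30

# fp16 7B: ~13 GiB -- nearly fills a 16 GiB T4 before the embedding model loads
print(round(approx_weights_gib(7, 2.0), 1))   # ~13.0
# 4-bit quantized 7B (e.g. GGML q4): ~3.3 GiB -- leaves plenty of headroom
print(round(approx_weights_gib(7, 0.5), 1))   # ~3.3
```

This matches the error report above: with ~14 GiB already allocated on a ~14.84 GiB device, even a 36 MiB allocation for document processing fails.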

As for new vs. old behavior, @emil-jose, nothing I'm aware of should use more GPU memory. It would help if you showed nvidia-smi output as well as the exact command you ran.

forsasim commented 1 year ago

To be specific, I am getting this error while uploading documents. Even with a very small doc of a few MB, I get this error. Without document upload, I am able to use the tool.

Also, I am using Windows Server, and I got the same error when I tried Ubuntu.

antonio-castellon commented 1 year ago

Same error here; with other solutions I have no problem. How can we reduce the amount of memory PyTorch uses by default inside the Docker container?

pseudotensor commented 1 year ago

There's no way to reduce torch's own use of memory. If your system cannot support the LLM + embedding model together, try a smaller embedding model such as miniall. This is discussed in the low-memory-mode docs: https://github.com/h2oai/h2ogpt/blob/main/docs/FAQ.md#low-memory-mode
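A hedged sketch of a lower-memory launch along the lines of the linked FAQ; the base model name is illustrative, and the exact flags should be checked against that FAQ page for your h2ogpt version:

```shell
# Illustrative low-memory launch: a small MiniLM embedding model and no
# separate reward/score model, so the LLM and embeddings can share one GPU.
python generate.py \
  --base_model=h2oai/h2ogpt-4096-llama2-7b-chat \
  --hf_embedding_model=sentence-transformers/all-MiniLM-L6-v2 \
  --score_model=None
```

Swapping the embedding model matters here because the out-of-memory errors above occur during document upload, which is when the embedding model runs.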