forsasim opened this issue 1 year ago
I am also getting the same error on a code base that was working fine a few days ago!
@forsasim If you have 4 GPUs you can run with `--use_gpu_id=False` to spread the LLM over multiple GPUs. If the intent is to run on a single GPU, then 7B would be enough if you have a 24GB GPU for that model. But then I'd recommend using GGML instead.
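A back-of-the-envelope estimate (a rough sketch, not an exact measurement; constants below are assumptions) shows why a 7B model in fp16 fits on a 24GB GPU but leaves little headroom on smaller cards:

```python
# Rough VRAM estimate for a 7B-parameter model in fp16/bf16.
params = 7e9            # ~7 billion parameters (assumed)
bytes_per_param = 2     # fp16/bf16 stores 2 bytes per weight
weights_gib = params * bytes_per_param / 2**30
print(f"weights alone: {weights_gib:.1f} GiB")   # weights alone: 13.0 GiB
# KV cache, activations, and the embedding model all add on top of this,
# which is why ~15-16 GiB cards can run out even before any documents load.
```

This is only the weights; actual usage during inference is higher, which matches the OOM reports later in this thread on ~15 GiB GPUs.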
As for new vs. old behavior, @emil-jose, nothing I'm aware of should use more GPU memory. It would help if you could share the output of `nvidia-smi` as well as the exact command you ran.
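For the report, the per-GPU memory numbers are the useful part. A small sketch of how to collect them; the `nvidia-smi` query flags are real, but the helper function itself is hypothetical, and the CSV sample at the bottom is made up for illustration:

```python
import subprocess

def gpu_memory_report(raw_csv=None):
    """Return a list of (used_mib, total_mib) per GPU.

    If raw_csv is None, query nvidia-smi; otherwise parse the given CSV
    text (handy for checking the parsing without a GPU present).
    """
    if raw_csv is None:
        raw_csv = subprocess.check_output(
            ["nvidia-smi",
             "--query-gpu=memory.used,memory.total",
             "--format=csv,noheader,nounits"],
            text=True,
        )
    report = []
    for line in raw_csv.strip().splitlines():
        used, total = (int(x) for x in line.split(","))
        report.append((used, total))
    return report

# Canned example: two ~15 GiB GPUs, the first nearly full.
sample = "14400, 15360\n200, 15360"
print(gpu_memory_report(sample))   # [(14400, 15360), (200, 15360)]
```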
To be specific, I am getting this error while uploading documents. Even with a very small document of a few MB I get this error; without a document upload I am able to use the tool.
Also, I am using Windows Server, and I got the same error when I tried with Ubuntu.
Same error here; with other solutions I have no problem. How can we reduce the amount of memory PyTorch uses by default inside the Docker container?
There's no way to directly reduce PyTorch's memory use. If your system cannot support the LLM plus the embedding model, try miniall for embedding. This is discussed in the low memory mode docs: https://github.com/h2oai/h2ogpt/blob/main/docs/FAQ.md#low-memory-mode
Hi, I am working on a private-GPT-style setup and getting the error below while uploading file(s). How do I resolve this? I am using AWS EC2 (g4dn with a 42-core CPU and 4 NVIDIA GPUs); the configuration is attached as an image.
Error message:

```
Tried to allocate 36.00 MiB (GPU 0; 14.84 GiB total capacity; 14.06 GiB already allocated; 28.19 MiB free; 14.07 GiB reserved in total by PyTorch) If reserved memory is >> allocated memory try setting max_split_size_mb to avoid fragmentation. See documentation for Memory Management and PYTORCH_CUDA_ALLOC_CONF
```
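The numbers in that message already tell you whether `max_split_size_mb` is worth trying. A sketch of reading them out (the message format is PyTorch's; the parsing helper is hypothetical, and `PYTORCH_CUDA_ALLOC_CONF` must be set before CUDA is initialized for it to take effect):

```python
import os
import re

msg = ("Tried to allocate 36.00 MiB (GPU 0; 14.84 GiB total capacity; "
       "14.06 GiB already allocated; 28.19 MiB free; "
       "14.07 GiB reserved in total by PyTorch)")

def parse(field):
    # Extract e.g. "14.06 GiB already allocated" -> 14.06
    return float(re.search(r"([\d.]+) \w+ " + field, msg).group(1))

allocated = parse("already allocated")
reserved = parse("reserved")

# Fragmentation is the likely culprit only when reserved >> allocated.
# Here reserved (14.07 GiB) is essentially equal to allocated (14.06 GiB),
# so the GPU is simply full: max_split_size_mb alone probably won't rescue
# this case, and a smaller model / embedding model is needed. Setting the
# allocator option is still harmless to try, before torch initializes CUDA:
os.environ["PYTORCH_CUDA_ALLOC_CONF"] = "max_split_size_mb:128"
print(f"reserved - allocated = {reserved - allocated:.2f} GiB")
```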