Closed NotBillyJoel357 closed 1 year ago
You need a better GPU with more VRAM
Any way to do it without a better GPU?
Tick the CPU option, it's gonna be painfully slow though.
With only 4GB of VRAM, @bloodsign is probably right... you'll hit OOM with almost anything. Regular offloading to CPU is usually painfully slow, with one exception: llama.cpp.
I'd recommend looking at this page to see if it fits your intended use https://github.com/oobabooga/text-generation-webui/wiki/llama.cpp-models
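If you do go the llama.cpp route, a GGML model lets you keep most layers in system RAM and push just a few onto the 4GB card. A rough sketch of a launch command — the model filename is a placeholder, and the exact flags depend on your build, so check `python server.py --help` first:

```shell
# Illustrative only: model filename is a placeholder for whatever
# quantized llama.cpp model you downloaded.
# --n-gpu-layers controls how many layers go to VRAM; start small on a 4GB card
# and raise it until you OOM, then back off.
python server.py --model llama-7b.ggmlv3.q4_0.bin --n-gpu-layers 8 --threads 8
```

With `--n-gpu-layers 0` it runs entirely on CPU, which is slow but at least won't OOM the GPU.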
I'm running into similar issues. I'm using an RTX 3060 with 6GB of VRAM and getting the same error. Is there no way around this?
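For anyone wondering why these cards fail, a back-of-the-envelope check makes it obvious — the weights alone nearly fill the card, before activations or the KV cache (which only make it worse). The numbers below are illustrative assumptions, not measurements:

```python
# Rough estimate of VRAM needed just to hold quantized weights.
def weights_gib(n_params_billion: float, bits_per_weight: int) -> float:
    """GiB required for the raw weight tensors alone."""
    return n_params_billion * 1e9 * bits_per_weight / 8 / 2**30

# A 7B model quantized to 4-bit:
print(round(weights_gib(7, 4), 2))   # -> 3.26 GiB, before activations/cache
```

On a 4GB card that leaves a few hundred MiB for everything else, which matches the "3.56 GiB already allocated; 0 bytes free" in the traceback below.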
I started getting this today after reinstalling. (Accidentally deleted my WSL install. Whoops.) It wasn't a problem before.
I'm running on Windows 11 inside WSL with an RTX 3080 (12GB). It doesn't happen for the first few generations, then suddenly I only get a single character/token out and it's out of memory.
After prompting, you need to kill the process to release the CUDA memory.
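Killing the process works, but you can sometimes reclaim the VRAM in-process instead. A sketch, assuming a transformers-style `model` object (the variable names here are illustrative, not the webui's actual internals):

```python
import gc
import torch

def release_cuda_memory(model=None):
    """Drop references to a loaded model and ask PyTorch to hand
    cached VRAM back to the driver. Safe on CPU-only installs."""
    if model is not None:
        del model                 # drop our reference to the weights
    gc.collect()                  # collect the now-unreferenced tensors
    if torch.cuda.is_available():
        torch.cuda.empty_cache()  # release cached blocks to the driver
    return True
```

Note this only helps if nothing else still holds a reference to the model; if the webui keeps one internally, restarting the process is still the reliable fix.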
This issue has been closed due to inactivity for 30 days. If you believe it is still relevant, please leave a comment below.
```
Traceback (most recent call last):
  File "C:\Users\andri\Desktop\TavernAI-main\oobabooga-windows\text-generation-webui\server.py", line 85, in load_model_wrapper
    shared.model, shared.tokenizer = load_model(shared.model_name)
  File "C:\Users\andri\Desktop\TavernAI-main\oobabooga-windows\text-generation-webui\modules\models.py", line 100, in load_model
    model = load_quantized(model_name)
  File "C:\Users\andri\Desktop\TavernAI-main\oobabooga-windows\text-generation-webui\modules\GPTQ_loader.py", line 168, in load_quantized
    model = accelerate.dispatch_model(model, device_map=device_map, offload_buffers=True)
  File "C:\Users\andri\Desktop\TavernAI-main\oobabooga-windows\installer_files\env\lib\site-packages\accelerate\big_modeling.py", line 370, in dispatch_model
    attach_align_device_hook_on_blocks(
  File "C:\Users\andri\Desktop\TavernAI-main\oobabooga-windows\installer_files\env\lib\site-packages\accelerate\hooks.py", line 478, in attach_align_device_hook_on_blocks
    add_hook_to_module(module, hook)
  File "C:\Users\andri\Desktop\TavernAI-main\oobabooga-windows\installer_files\env\lib\site-packages\accelerate\hooks.py", line 155, in add_hook_to_module
    module = hook.init_hook(module)
  File "C:\Users\andri\Desktop\TavernAI-main\oobabooga-windows\installer_files\env\lib\site-packages\accelerate\hooks.py", line 251, in init_hook
    set_module_tensor_to_device(module, name, self.execution_device)
  File "C:\Users\andri\Desktop\TavernAI-main\oobabooga-windows\installer_files\env\lib\site-packages\accelerate\utils\modeling.py", line 147, in set_module_tensor_to_device
    new_value = old_value.to(device)
torch.cuda.OutOfMemoryError: CUDA out of memory. Tried to allocate 32.00 MiB (GPU 0; 4.00 GiB total capacity; 3.56 GiB already allocated; 0 bytes free; 3.56 GiB reserved in total by PyTorch) If reserved memory is >> allocated memory try setting max_split_size_mb to avoid fragmentation. See documentation for Memory Management and PYTORCH_CUDA_ALLOC_CONF
```
Please help.
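The tail of that error is PyTorch's own hint: when reserved memory is much larger than allocated memory, fragmentation may be the problem, and you can try capping the allocator's split size via `PYTORCH_CUDA_ALLOC_CONF`. A sketch of how to set it before launching — the value 128 is just a common starting point, not a tuned number:

```shell
# Set before starting the webui; 128 MiB is an illustrative starting value.
export PYTORCH_CUDA_ALLOC_CONF=max_split_size_mb:128
python server.py
```

That said, with 3.56 GiB already allocated on a 4.00 GiB card, fragmentation isn't the real issue here — the model simply doesn't fit, so a smaller/more heavily quantized model or llama.cpp-style partial offload is more likely to help.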