oobabooga / text-generation-webui

A Gradio web UI for Large Language Models.
GNU Affero General Public License v3.0

CUDA out of memory in CPU mode #2056

LFL38 closed this issue 1 year ago

LFL38 commented 1 year ago

Describe the bug

I have installed oobabooga in CPU mode, but when I try to launch Pygmalion it says "CUDA out of memory".

Is there an existing issue for this?

Reproduction

Run oobabooga with Pygmalion in CPU mode.
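
For reproduction, "CPU mode" is assumed here to mean the --cpu launch flag (or picking CPU in the one-click installer); a minimal sketch of such a launch, with the model name taken from the log below:

```shell
rem Hypothetical reproduction command; --cpu tells the webui to generate on the CPU.
python server.py --cpu --model mayaeary_pygmalion-6b_dev-4bit-128g
```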

Screenshot

No response

Logs

```shell
INFO:Loading mayaeary_pygmalion-6b_dev-4bit-128g...
INFO:Found the following quantized model: models\mayaeary_pygmalion-6b_dev-4bit-128g\pygmalion-6b_dev-4bit-128g.safetensors
Traceback (most recent call last):
  File "C:\Users\Floppa\Desktop\oobabooga_windows\text-generation-webui\server.py", line 948, in <module>
    shared.model, shared.tokenizer = load_model(shared.model_name)
  File "C:\Users\Floppa\Desktop\oobabooga_windows\text-generation-webui\modules\models.py", line 159, in load_model
    model = load_quantized(model_name)
  File "C:\Users\Floppa\Desktop\oobabooga_windows\text-generation-webui\modules\GPTQ_loader.py", line 200, in load_quantized
    model = model.to(torch.device('cuda:0'))
  File "C:\Users\Floppa\Desktop\oobabooga_windows\installer_files\env\lib\site-packages\transformers\modeling_utils.py", line 1896, in to
    return super().to(*args, **kwargs)
  File "C:\Users\Floppa\Desktop\oobabooga_windows\installer_files\env\lib\site-packages\torch\nn\modules\module.py", line 1145, in to
    return self._apply(convert)
  File "C:\Users\Floppa\Desktop\oobabooga_windows\installer_files\env\lib\site-packages\torch\nn\modules\module.py", line 797, in _apply
    module._apply(fn)
  File "C:\Users\Floppa\Desktop\oobabooga_windows\installer_files\env\lib\site-packages\torch\nn\modules\module.py", line 820, in _apply
    param_applied = fn(param)
  File "C:\Users\Floppa\Desktop\oobabooga_windows\installer_files\env\lib\site-packages\torch\nn\modules\module.py", line 1143, in convert
    return t.to(device, dtype if t.is_floating_point() or t.is_complex() else None, non_blocking)
torch.cuda.OutOfMemoryError: CUDA out of memory. Tried to allocate 394.00 MiB (GPU 0; 4.00 GiB total capacity; 3.23 GiB already allocated; 0 bytes free; 3.34 GiB reserved in total by PyTorch) If reserved memory is >> allocated memory try setting max_split_size_mb to avoid fragmentation.  See documentation for Memory Management and PYTORCH_CUDA_ALLOC_CONF

Done!
Press any key to continue . . .
```
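
For reference, the max_split_size_mb hint at the end of that message is set through the PYTORCH_CUDA_ALLOC_CONF environment variable before launch. A sketch follows, assuming the usual server.py GPTQ flags and an arbitrary value of 128; note this only mitigates fragmentation and would not help here, since the traceback shows the GPTQ loader moving the whole model to cuda:0 regardless of CPU mode:

```shell
rem Allocator tuning only mitigates fragmentation; it cannot make a 6B 4-bit
rem model fit on a 4 GB card. 128 MiB is an arbitrary example value.
set PYTORCH_CUDA_ALLOC_CONF=max_split_size_mb:128
python server.py --model mayaeary_pygmalion-6b_dev-4bit-128g --wbits 4 --groupsize 128
```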

System Info

```shell
Specs:
Palit GTX 1050 Ti
Ryzen 5 3600
16 GB 3200 MHz RAM
1 TB NVMe
```

Ph0rk0z commented 1 year ago

I don't think you can use GPTQ in CPU mode. Download the GGML version and use llama.cpp or koboldcpp.
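
A minimal sketch of that GGML route, running entirely on the CPU with llama.cpp; the model file name below is hypothetical, so substitute whichever 4-bit GGML file you actually download:

```shell
# Runs a 4-bit GGML model on the CPU only; no CUDA involved.
# The file name is hypothetical; -n caps the generated tokens, -t sets CPU threads.
./main -m models/pygmalion-6b.ggmlv3.q4_0.bin -p "You: Hello!" -n 64 -t 6
```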

LFL38 commented 1 year ago

> I don't think you can use GPTQ in CPU mode. Download the GGML version and use llama.cpp or koboldcpp.

Wait, can I use GPT4xAlpaca on an RX 6600 with oobabooga? It has 8 GB of VRAM, and I'm not sure how much GPT4xAlpaca needs.

Ph0rk0z commented 1 year ago

Yeah, but you need to set up ROCm on Linux or WSL (not sure how well it works here). And I think you may still have to use pre-layer.
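
"Pre-layer" refers to text-generation-webui's --pre_layer flag, which keeps the first N transformer layers on the GPU and offloads the rest to the CPU. A sketch with an arbitrary split of 20 layers and a placeholder model name:

```shell
rem <gpt4-x-alpaca-4bit> is a placeholder; --pre_layer 20 is an arbitrary split
rem that trades generation speed for VRAM headroom on the 8 GB RX 6600.
python server.py --model <gpt4-x-alpaca-4bit> --wbits 4 --groupsize 128 --pre_layer 20
```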

github-actions[bot] commented 1 year ago

This issue has been closed due to inactivity for 30 days. If you believe it is still relevant, please leave a comment below.