oobabooga / text-generation-webui

A Gradio web UI for Large Language Models.
GNU Affero General Public License v3.0
38.64k stars · 5.1k forks

CPU Allocator Error #2093

Closed TheMeIonGod closed 1 year ago

TheMeIonGod commented 1 year ago

Describe the bug

Yesterday, this was working perfectly fine. However, I decided to update it using the "update_windows.bat" file, and now I can't get any model to run. The main model I am trying to run is TheBloke/WizardLM-7B-uncensored-GPTQ, which was also working perfectly fine with extensions yesterday. But now, when I attempt to load any model (even without any extensions), it fails with:

RuntimeError: [enforce fail at C:\cb\pytorch_1000000000000\work\c10\core\impl\alloc_cpu.cpp:72] data. DefaultCPUAllocator: not enough memory: you tried to allocate 22544384 bytes.

The full traceback is in the Logs section below.

Also, one thing to note is that every time I try to load the model, the amount it attempts to allocate increases significantly. I have reinstalled the text-generation-webui, restarted my computer five times, reinstalled my graphics drivers, and reinstalled Python.
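For reference, the failing allocation comes from the `qweight` buffer in GPTQ-for-LLaMa's `QuantLinear`, which has shape `(in_features // 32 * bits, out_features)` in int32. A quick sketch shows the 22544384-byte figure is consistent with a single 4-bit LLaMA-7B MLP layer (the 4096 and 11008 dimensions below are assumptions based on typical LLaMA-7B layer sizes, not values taken from the log):

```python
# Reconstructing the failed allocation from the traceback line:
# torch.zeros((infeatures // 32 * bits, outfeatures), dtype=torch.int)
bits = 4               # --wbits 4 for this GPTQ model
in_features = 4096     # assumed: LLaMA-7B hidden size
out_features = 11008   # assumed: LLaMA-7B MLP intermediate size

rows = in_features // 32 * bits     # rows of packed int32 weights
n_bytes = rows * out_features * 4   # int32 = 4 bytes per element
print(n_bytes)  # 22544384, matching the error message
```

So the individual allocation is modest; the failure is about how much memory Windows will let the process commit, not the size of any one tensor.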

Is there an existing issue for this?

Reproduction

  1. Update using update_windows.bat
  2. Start the UI
  3. Load a model (TheBloke/WizardLM-7B-uncensored-GPTQ)
  4. Model fails to load.

Screenshot

No response

Logs

Traceback (most recent call last):
File "E:\AI\oobabooga_windows\text-generation-webui\server.py", line 67, in load_model_wrapper
shared.model, shared.tokenizer = load_model(shared.model_name)
File "E:\AI\oobabooga_windows\text-generation-webui\modules\models.py", line 159, in load_model
model = load_quantized(model_name)
File "E:\AI\oobabooga_windows\text-generation-webui\modules\GPTQ_loader.py", line 178, in load_quantized
model = load_quant(str(path_to_model), str(pt_path), shared.args.wbits, shared.args.groupsize, kernel_switch_threshold=threshold)
File "E:\AI\oobabooga_windows\text-generation-webui\modules\GPTQ_loader.py", line 77, in _load_quant
make_quant(**make_quant_kwargs)
File "E:\AI\oobabooga_windows\text-generation-webui\repositories\GPTQ-for-LLaMa\quant.py", line 446, in make_quant
make_quant(child, names, bits, groupsize, faster, name + '.' + name1 if name != '' else name1, kernel_switch_threshold=kernel_switch_threshold)
File "E:\AI\oobabooga_windows\text-generation-webui\repositories\GPTQ-for-LLaMa\quant.py", line 446, in make_quant
make_quant(child, names, bits, groupsize, faster, name + '.' + name1 if name != '' else name1, kernel_switch_threshold=kernel_switch_threshold)
File "E:\AI\oobabooga_windows\text-generation-webui\repositories\GPTQ-for-LLaMa\quant.py", line 446, in make_quant
make_quant(child, names, bits, groupsize, faster, name + '.' + name1 if name != '' else name1, kernel_switch_threshold=kernel_switch_threshold)
[Previous line repeated 1 more time]
File "E:\AI\oobabooga_windows\text-generation-webui\repositories\GPTQ-for-LLaMa\quant.py", line 443, in make_quant
setattr(module, attr, QuantLinear(bits, groupsize, tmp.in_features, tmp.out_features, faster=faster, kernel_switch_threshold=kernel_switch_threshold))
File "E:\AI\oobabooga_windows\text-generation-webui\repositories\GPTQ-for-LLaMa\quant.py", line 154, in __init__
self.register_buffer('qweight', torch.zeros((infeatures // 32 * bits, outfeatures), dtype=torch.int))
RuntimeError: [enforce fail at C:\cb\pytorch_1000000000000\work\c10\core\impl\alloc_cpu.cpp:72] data. DefaultCPUAllocator: not enough memory: you tried to allocate 22544384 bytes.

System Info

GPU: GTX 1080
CPU: Ryzen 7 1800X
RAM: 16 GB DDR4 @ 3200 MHz
TheMeIonGod commented 1 year ago

"Specify max CPU ram and see if it works."

I just gave that a try, and it still pops up with the same error. Sometimes Python also crashes when attempting to load the model.
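For anyone else trying that suggestion: in text-generation-webui, the CPU RAM cap is set with the `--cpu-memory` flag. A sketch of what such an invocation might look like for this model (the cap value and model folder name are illustrative, not taken from the thread):

```shell
# Illustrative launch; adjust the cap and model name to your setup.
python server.py --model TheBloke_WizardLM-7B-uncensored-GPTQ \
    --wbits 4 --groupsize 128 --cpu-memory 8GiB
```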

TheMeIonGod commented 1 year ago

I figured out the problem and got it working. Windows was only allowing 6 GB of virtual memory, which is why it would 'run out' of memory.

Here's how I fixed it:

  1. I typed 'SystemPropertiesAdvanced.exe' into the taskbar search and opened it.
  2. I clicked on 'Advanced.'
  3. Under 'Performance,' I clicked on 'Settings.'
  4. In the 'Performance Options,' I clicked 'Advanced' again.
  5. Under 'Virtual Memory,' I clicked on 'Change.'
  6. Then, I unchecked 'Automatically manage paging file size for all drives.'
  7. I selected 'System managed size' for the drive.
  8. After that, I clicked 'OK' and then 'OK' again.
  9. After restarting my computer, it was fixed.
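This explanation fits the numbers in the log: the allocation that failed is tiny compared to 16 GB of RAM, which is consistent with Windows refusing to grow the commit charge (page file) rather than the machine being truly out of memory. A small sanity check on the message text (the regex and formatting here are my own, not part of PyTorch):

```python
import re

# The allocator message from the traceback above.
msg = ("DefaultCPUAllocator: not enough memory: "
       "you tried to allocate 22544384 bytes.")

# Extract the byte count and express it in MiB.
n_bytes = int(re.search(r"allocate (\d+) bytes", msg).group(1))
print(f"failed allocation: {n_bytes / 2**20:.1f} MiB")  # 21.5 MiB
```

21.5 MiB is nothing for this hardware, so the limit being hit was the 6 GB commit ceiling, not physical RAM.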