Closed TheMeIonGod closed 1 year ago
Specify max CPU RAM and see if it works.
I just gave that a try, and it still pops up with the same error. Sometimes Python will also crash while attempting to load the model.
I figured out the problem and got it working. Windows was only allowing 6 GB of virtual memory, which is why it would 'run out' of memory.
Here's how I fixed it:
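(The exact steps didn't make it into this thread, but since the cause was the 6 GB virtual-memory cap, the standard remedy is to enlarge the Windows pagefile: System Properties → Advanced → Performance Settings → Advanced → Virtual memory → Change.) To check what commit limit Windows is actually giving you before loading a model, here is a minimal diagnostic sketch using the documented Win32 `GlobalMemoryStatusEx` call via `ctypes`. The helper name `commit_limit_gib` is mine, not part of the webui, and it returns `None` on non-Windows systems:

```python
import ctypes
import sys

class MEMORYSTATUSEX(ctypes.Structure):
    # Mirrors the Win32 MEMORYSTATUSEX structure (see GlobalMemoryStatusEx docs).
    _fields_ = [
        ("dwLength", ctypes.c_ulong),
        ("dwMemoryLoad", ctypes.c_ulong),
        ("ullTotalPhys", ctypes.c_ulonglong),
        ("ullAvailPhys", ctypes.c_ulonglong),
        ("ullTotalPageFile", ctypes.c_ulonglong),  # commit limit: RAM + pagefile
        ("ullAvailPageFile", ctypes.c_ulonglong),  # commit charge still available
        ("ullTotalVirtual", ctypes.c_ulonglong),
        ("ullAvailVirtual", ctypes.c_ulonglong),
        ("ullAvailExtendedVirtual", ctypes.c_ulonglong),
    ]

def commit_limit_gib():
    """Return (total, available) commit limit in GiB, or None off Windows."""
    if sys.platform != "win32":
        return None
    stat = MEMORYSTATUSEX()
    stat.dwLength = ctypes.sizeof(MEMORYSTATUSEX)
    ctypes.windll.kernel32.GlobalMemoryStatusEx(ctypes.byref(stat))
    gib = float(1024 ** 3)
    return stat.ullTotalPageFile / gib, stat.ullAvailPageFile / gib

if __name__ == "__main__":
    print(commit_limit_gib())
```

If the total comes back around 6 GiB on a machine with more RAM than that, the pagefile is the bottleneck, which matches what I was seeing.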
Describe the bug
Yesterday, this was working perfectly fine. However, I decided to update it using the "update_windows.bat" file, and now I can't get any model to run. The main model I am trying to run is TheBloke/WizardLM-7B-uncensored-GPTQ, which was also working perfectly fine with extensions yesterday. But now, when I attempt to load any model (even without any extensions), it displays this error:

```
Traceback (most recent call last):
  File "E:\AI\oobabooga_windows\text-generation-webui\server.py", line 67, in load_model_wrapper
    shared.model, shared.tokenizer = load_model(shared.model_name)
  File "E:\AI\oobabooga_windows\text-generation-webui\modules\models.py", line 159, in load_model
    model = load_quantized(model_name)
  File "E:\AI\oobabooga_windows\text-generation-webui\modules\GPTQ_loader.py", line 178, in load_quantized
    model = load_quant(str(path_to_model), str(pt_path), shared.args.wbits, shared.args.groupsize, kernel_switch_threshold=threshold)
  File "E:\AI\oobabooga_windows\text-generation-webui\modules\GPTQ_loader.py", line 77, in _load_quant
    make_quant(**make_quant_kwargs)
  File "E:\AI\oobabooga_windows\text-generation-webui\repositories\GPTQ-for-LLaMa\quant.py", line 446, in make_quant
    make_quant(child, names, bits, groupsize, faster, name + '.' + name1 if name != '' else name1, kernel_switch_threshold=kernel_switch_threshold)
  File "E:\AI\oobabooga_windows\text-generation-webui\repositories\GPTQ-for-LLaMa\quant.py", line 446, in make_quant
    make_quant(child, names, bits, groupsize, faster, name + '.' + name1 if name != '' else name1, kernel_switch_threshold=kernel_switch_threshold)
  File "E:\AI\oobabooga_windows\text-generation-webui\repositories\GPTQ-for-LLaMa\quant.py", line 446, in make_quant
    make_quant(child, names, bits, groupsize, faster, name + '.' + name1 if name != '' else name1, kernel_switch_threshold=kernel_switch_threshold)
  [Previous line repeated 1 more time]
  File "E:\AI\oobabooga_windows\text-generation-webui\repositories\GPTQ-for-LLaMa\quant.py", line 443, in make_quant
    setattr(module, attr, QuantLinear(bits, groupsize, tmp.in_features, tmp.out_features, faster=faster, kernel_switch_threshold=kernel_switch_threshold))
  File "E:\AI\oobabooga_windows\text-generation-webui\repositories\GPTQ-for-LLaMa\quant.py", line 154, in __init__
    'qweight', torch.zeros((infeatures // 32 * bits, outfeatures), dtype=torch.int)
RuntimeError: [enforce fail at C:\cb\pytorch_1000000000000\work\c10\core\impl\alloc_cpu.cpp:72] data. DefaultCPUAllocator: not enough memory: you tried to allocate 22544384 bytes.
```
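A note on the number in that last frame: the failed allocation is actually tiny. The `qweight` buffer in `quant.py` is an int32 tensor of shape `(infeatures // 32 * bits, outfeatures)`, and assuming the usual LLaMA-7B MLP dimensions (`in_features = 4096`, `out_features = 11008` — my assumption, the log doesn't name the layer) at 4 bits, its size works out to exactly the 22544384 bytes in the error, about 21.5 MiB. Failing on a request that small is what points to an exhausted commit limit (pagefile) rather than a genuinely huge tensor:

```python
def qweight_bytes(infeatures: int, outfeatures: int, bits: int) -> int:
    """Size in bytes of GPTQ's packed qweight buffer:
    torch.zeros((infeatures // 32 * bits, outfeatures), dtype=torch.int)."""
    rows = infeatures // 32 * bits   # low-bit weights packed into int32 words
    return rows * outfeatures * 4    # torch.int is 4 bytes per element

# Assumed LLaMA-7B MLP shape (4096 -> 11008) at 4 bits:
size = qweight_bytes(4096, 11008, 4)
print(size)                    # 22544384 -- matches the bytes in the traceback
print(size / 2**20)            # 21.5 MiB
```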
Also, one thing to note is that every time I try to load the model, the amount it attempts to allocate increases significantly. I have reinstalled the text-generation-webui, restarted my computer five times, reinstalled my graphics drivers, and reinstalled Python.
Is there an existing issue for this?
Reproduction
1. Update using update_windows.bat
2. Start the UI
3. Load a model (TheBloke/WizardLM-7B-uncensored-GPTQ)
4. Model fails to load.
Screenshot
No response
Logs
System Info