Describe the bug
First off, some updates cause Gradio to fail with a "serialized input" error that others have already reported, so I won't go into detail on that. Updating Gradio to 3.26 or above resolves that error, but models still fail to load. As others have also reported, GPTQ models load without an explicit error but leave the terminal at a "press any key to continue" prompt that closes the terminal. Some models instead give OOM errors where they previously worked. Trying different forks of GPTQ-for-LLaMa does not help. Using older commits and forks with Gradio 3.25 only fixes the initial launch; models still fail to load correctly. This suggests to me the problem lies in the environment set up by your one-click installers rather than in the text-generation-webui repository itself, but there's nothing to tell me which package(s) are causing the issues. Neither an older version nor the current version of the installers fixes this.
If I had to guess, something in the installer is pulling the latest version of a package instead of a pinned one, since the oobabooga GPTQ fork hasn't been updated in three weeks and its only unpinned requirement, sentencepiece, is not the culprit. But that's about the extent of my troubleshooting skills without further guidance.
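One way to find the unpinned package would be to run `pip freeze` in a working environment and in a broken one, then diff the two snapshots. A minimal stdlib sketch of that comparison (the package versions shown are just example data, not the actual culprit):

```python
def parse_freeze(lines):
    """Parse `pip freeze` output ("name==version" lines) into a dict."""
    pkgs = {}
    for line in lines:
        line = line.strip()
        if "==" in line:
            name, version = line.split("==", 1)
            pkgs[name.lower()] = version
    return pkgs


def diff_envs(before_lines, after_lines):
    """Return {package: (old_version, new_version)} for every pin
    that differs between the two freeze snapshots."""
    before = parse_freeze(before_lines)
    after = parse_freeze(after_lines)
    return {
        name: (before[name], after[name])
        for name in before.keys() & after.keys()
        if before[name] != after[name]
    }


if __name__ == "__main__":
    # Stand-ins for the contents of the two `pip freeze` files:
    working = ["gradio==3.25.0", "sentencepiece==0.1.97", "torch==2.0.0"]
    broken = ["gradio==3.28.3", "sentencepiece==0.1.97", "torch==2.0.0"]
    print(diff_envs(working, broken))  # only the changed pin is reported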
Is there an existing issue for this?
[X] I have searched the existing issues
Reproduction
The issue first arose when I had an existing install and reinstalled GPTQ-for-LLaMa to troubleshoot a new model failing to load (other models loaded correctly prior to this).
This issue will also occur on a fresh install of text-generation-webui.
Screenshot
No response
Logs
Scenario 1 (loading Pygmalion):
Loading model ...
Done!
Press any key to continue . . .
Scenario 2 (loading WizardLM):
Loading model ...
Traceback (most recent call last):
File "C:\Users\user\Desktop\oobabooga_windows\text-generation-webui\server.py", line 872, in <module>
shared.model, shared.tokenizer = load_model(shared.model_name)
File "C:\Users\user\Desktop\oobabooga_windows\text-generation-webui\modules\models.py", line 159, in load_model
model = load_quantized(model_name)
File "C:\Users\user\Desktop\oobabooga_windows\text-generation-webui\modules\GPTQ_loader.py", line 176, in load_quantized
model = load_quant(str(path_to_model), str(pt_path), shared.args.wbits, shared.args.groupsize, shared.args.pre_layer)
File "C:\Users\user\Desktop\oobabooga_windows\text-generation-webui\repositories\GPTQ-for-LLaMa\llama_inference_offload.py", line 228, in load_quant
model.load_state_dict(torch.load(checkpoint))
File "C:\Users\user\Desktop\oobabooga_windows\installer_files\env\lib\site-packages\torch\serialization.py", line 809, in load
return _load(opened_zipfile, map_location, pickle_module, **pickle_load_args)
File "C:\Users\user\Desktop\oobabooga_windows\installer_files\env\lib\site-packages\torch\serialization.py", line 1172, in _load
result = unpickler.load()
File "C:\Users\user\Desktop\oobabooga_windows\installer_files\env\lib\site-packages\torch\serialization.py", line 1142, in persistent_load
typed_storage = load_tensor(dtype, nbytes, key, _maybe_decode_ascii(location))
File "C:\Users\user\Desktop\oobabooga_windows\installer_files\env\lib\site-packages\torch\serialization.py", line 1112, in load_tensor
storage = zip_file.get_storage_from_record(name, numel, torch.UntypedStorage)._typed_storage()._untyped_storage
RuntimeError: [enforce fail at C:\cb\pytorch_1000000000000\work\c10\core\impl\alloc_cpu.cpp:72] data. DefaultCPUAllocator: not enough memory: you tried to allocate 8388608 bytes.
Done!
Update: I found out that --disk is not working as expected, and my page file size had been lowered after restarting my PC. I'll look into --disk a bit more, but after increasing the page file again, everything works as intended now.
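The page-file fix is consistent with the traceback above: the DefaultCPUAllocator failure appears to come from torch.load unpickling the checkpoint into CPU memory (RAM plus page file) before anything reaches the GPU, so the checkpoint's file size is roughly a lower bound on the commit charge the load needs. A rough stdlib sanity check (checkpoint_fits is a hypothetical helper, not part of the webui):

```python
import os


def checkpoint_fits(checkpoint_path, available_bytes):
    """Rough pre-flight check before loading a checkpoint: the whole
    state dict is materialized in CPU memory (RAM + page file) first,
    so the file size is a lower bound on the memory needed.
    Returns (fits, bytes_needed)."""
    needed = os.path.getsize(checkpoint_path)
    return needed <= available_bytes, needed


if __name__ == "__main__":
    import tempfile

    # Stand-in for a quantized checkpoint: an 8 MiB file, the same
    # size as the allocation that failed in the traceback above.
    with tempfile.NamedTemporaryFile(delete=False) as f:
        f.write(b"\0" * 8388608)
        path = f.name
    print(checkpoint_fits(path, 4 * 1024**2))   # too little commit charge
    print(checkpoint_fits(path, 16 * 1024**2))  # enough headroom
    os.remove(path)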