Describe the bug
First off, some updates cause Gradio to fail with a "serialized input" error that others have already reported, so I won't go into detail on that. Updating Gradio to 3.26 or above resolves that error, but models still fail to load. As others have also reported, GPTQ models load without an explicit error but leave the terminal at a "press any key to continue" prompt that closes the terminal. Some models instead give OOM errors where they previously worked. Trying different forks of GPTQ-for-LLaMa does not help. Using older commits and forks with Gradio 3.25 only fixes the initial launch; models still fail to load correctly. This suggests to me the problem lies in the environment set up by your one-click installers rather than in the text-generation-webui repository itself, but there's nothing to tell me which package(s) are causing the issues. Neither an older version nor the current version of the installers fixes this.
If I had to guess, something in the installer is pulling the latest version of a package instead of a pinned one, since the oobabooga GPTQ fork hasn't been updated in three weeks and its only unpinned requirement, sentencepiece, is not the culprit. But that's about the extent of my troubleshooting skills without further guidance.
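One way to find the unpinned package would be to run `pip freeze` in a working environment and in a broken one, then diff the two snapshots. A minimal stdlib sketch of that comparison (the package versions shown are just example data, not the actual culprit):

```python
def parse_freeze(lines):
    """Parse `pip freeze` output ("name==version" lines) into a dict."""
    pkgs = {}
    for line in lines:
        line = line.strip()
        if "==" in line:
            name, version = line.split("==", 1)
            pkgs[name.lower()] = version
    return pkgs


def diff_envs(before_lines, after_lines):
    """Return {package: (old_version, new_version)} for every pin
    that differs between the two freeze snapshots."""
    before = parse_freeze(before_lines)
    after = parse_freeze(after_lines)
    return {
        name: (before[name], after[name])
        for name in before.keys() & after.keys()
        if before[name] != after[name]
    }


if __name__ == "__main__":
    # Stand-ins for the contents of the two `pip freeze` files:
    working = ["gradio==3.25.0", "sentencepiece==0.1.97", "torch==2.0.0"]
    broken = ["gradio==3.28.3", "sentencepiece==0.1.97", "torch==2.0.0"]
    print(diff_envs(working, broken))  # only the changed pin is reported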
Is there an existing issue for this?
[X] I have searched the existing issues
Reproduction
The issue first arose when I had an existing install and reinstalled GPTQ-for-LLaMa to troubleshoot a new model failing to load (other models loaded correctly prior to this).
This issue will also occur on a fresh install of text-generation-webui.
Screenshot
No response
Logs
Scenario 1 (loading Pygmalion):
Loading model ...
Done!
Press any key to continue . . .
Scenario 2 (loading WizardLM):
Loading model ...
Traceback (most recent call last):
File "C:\Users\user\Desktop\oobabooga_windows\text-generation-webui\server.py", line 872, in <module>
shared.model, shared.tokenizer = load_model(shared.model_name)
File "C:\Users\user\Desktop\oobabooga_windows\text-generation-webui\modules\models.py", line 159, in load_model
model = load_quantized(model_name)
File "C:\Users\user\Desktop\oobabooga_windows\text-generation-webui\modules\GPTQ_loader.py", line 176, in load_quantized
model = load_quant(str(path_to_model), str(pt_path), shared.args.wbits, shared.args.groupsize, shared.args.pre_layer)
File "C:\Users\user\Desktop\oobabooga_windows\text-generation-webui\repositories\GPTQ-for-LLaMa\llama_inference_offload.py", line 228, in load_quant
model.load_state_dict(torch.load(checkpoint))
File "C:\Users\user\Desktop\oobabooga_windows\installer_files\env\lib\site-packages\torch\serialization.py", line 809, in load
return _load(opened_zipfile, map_location, pickle_module, **pickle_load_args)
File "C:\Users\user\Desktop\oobabooga_windows\installer_files\env\lib\site-packages\torch\serialization.py", line 1172, in _load
result = unpickler.load()
File "C:\Users\user\Desktop\oobabooga_windows\installer_files\env\lib\site-packages\torch\serialization.py", line 1142, in persistent_load
typed_storage = load_tensor(dtype, nbytes, key, _maybe_decode_ascii(location))
File "C:\Users\user\Desktop\oobabooga_windows\installer_files\env\lib\site-packages\torch\serialization.py", line 1112, in load_tensor
storage = zip_file.get_storage_from_record(name, numel, torch.UntypedStorage)._typed_storage()._untyped_storage
RuntimeError: [enforce fail at C:\cb\pytorch_1000000000000\work\c10\core\impl\alloc_cpu.cpp:72] data. DefaultCPUAllocator: not enough memory: you tried to allocate 8388608 bytes.
Done!
Update: I found out that --disk is not working as expected, and my page file size had been lowered after restarting my PC. I'll look into --disk a bit more, but after increasing the page file again, everything works as intended now.
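The page-file fix is consistent with the traceback above: the DefaultCPUAllocator failure appears to come from torch.load unpickling the checkpoint into CPU memory (RAM plus page file) before anything reaches the GPU, so the checkpoint's file size is roughly a lower bound on the commit charge the load needs. A rough stdlib sanity check (checkpoint_fits is a hypothetical helper, not part of the webui):

```python
import os


def checkpoint_fits(checkpoint_path, available_bytes):
    """Rough pre-flight check before loading a checkpoint: the whole
    state dict is materialized in CPU memory (RAM + page file) first,
    so the file size is a lower bound on the memory needed.
    Returns (fits, bytes_needed)."""
    needed = os.path.getsize(checkpoint_path)
    return needed <= available_bytes, needed


if __name__ == "__main__":
    import tempfile

    # Stand-in for a quantized checkpoint: an 8 MiB file, the same
    # size as the allocation that failed in the traceback above.
    with tempfile.NamedTemporaryFile(delete=False) as f:
        f.write(b"\0" * 8388608)
        path = f.name
    print(checkpoint_fits(path, 4 * 1024**2))   # too little commit charge
    print(checkpoint_fits(path, 16 * 1024**2))  # enough headroom
    os.remove(path)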