oobabooga / text-generation-webui

A Gradio web UI for Large Language Models.
GNU Affero General Public License v3.0

"Memory error", can't load a model #1572

Closed: cosmiclantern closed this 1 year ago

cosmiclantern commented 1 year ago

Describe the bug

When I try to load the GPT-J-6B model, under the name pytorch_model.bin, nothing happens, but I get the traceback described below. I'm running on CPU as I don't have a discrete GPU. 16GB system RAM, Intel Core i5-3210M CPU.

Also, I don't know where to put the text and JSON files, so I just threw them all in the same folder as the model. Was that wrong? All those files show up in the models list, which doesn't seem right.

Is there an existing issue for this?

Reproduction

python3 server.py

Open http://127.0.0.1:7860 in a browser.

Select pytorch_model.bin from the model dropdown under the Model tab.

Screenshot

No response

Logs

Traceback (most recent call last):
  File "/home/username/text-generation-webui/server.py", line 102, in load_model_wrapper
    shared.model, shared.tokenizer = load_model(shared.model_name)
  File "/home/username/text-generation-webui/modules/models.py", line 209, in load_model
    model = LoaderClass.from_pretrained(checkpoint, **params)
  File "/home/username/.local/lib/python3.10/site-packages/transformers/models/auto/auto_factory.py", line 441, in from_pretrained
    config, kwargs = AutoConfig.from_pretrained(
  File "/home/username/.local/lib/python3.10/site-packages/transformers/models/auto/configuration_auto.py", line 916, in from_pretrained
    config_dict, unused_kwargs = PretrainedConfig.get_config_dict(pretrained_model_name_or_path, **kwargs)
  File "/home/username/.local/lib/python3.10/site-packages/transformers/configuration_utils.py", line 573, in get_config_dict
    config_dict, kwargs = cls._get_config_dict(pretrained_model_name_or_path, **kwargs)
  File "/home/username/.local/lib/python3.10/site-packages/transformers/configuration_utils.py", line 658, in _get_config_dict
    config_dict = cls._dict_from_json_file(resolved_config_file)
  File "/home/username/.local/lib/python3.10/site-packages/transformers/configuration_utils.py", line 745, in _dict_from_json_file
    text = reader.read()
MemoryError

System Info

Lenovo ThinkPad X230, 16GB system RAM, Intel Core i5-3210M CPU.
Ph0rk0z commented 1 year ago

They all go in the same folder, under "models\pygmalion-6b". It's supposed to look like this: https://huggingface.co/PygmalionAI/pygmalion-6b/tree/main
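In other words, roughly like this (a sketch only; the exact file list varies by repo, but the point is that the weights and every config/tokenizer file sit together in one directory per model):

    models/
      pygmalion-6b/
        config.json
        tokenizer_config.json
        special_tokens_map.json
        vocab.json
        merges.txt
        pytorch_model.bin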

I have a T440p with an i7, and even running the model from here: https://huggingface.co/mayaeary/pygmalion-6b-4bit-128g/tree/main I do not have a good time.

cosmiclantern commented 1 year ago

Thanks. So I need to make a directory "models\GPT-J-6B" and toss all the files in there, including the .bin?

What about the memory error?

cosmiclantern commented 1 year ago

Okay, that did make it try to load the model at startup, but it failed. From the bash terminal:

$ python3 server.py --cpu

Gradio HTTP request redirected to localhost :)
bin /home/username/.local/lib/python3.10/site-packages/bitsandbytes/libbitsandbytes_cpu.so
/home/username/.local/lib/python3.10/site-packages/bitsandbytes/cextension.py:33: UserWarning: The installed version of bitsandbytes was compiled without GPU support. 8-bit optimizers, 8-bit multiplication, and GPU quantization are unavailable.
  warn("The installed version of bitsandbytes was compiled without GPU support. ")
Loading GPT-J-6B...
Killed
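For context: a bare "Killed" from the shell means the Linux kernel's out-of-memory (OOM) killer terminated the process, not that the webui crashed on its own. It usually leaves a trace in the kernel log, viewable via dmesg, along these lines (the PID and sizes here are made up for illustration, and exact wording varies by kernel version):

    Out of memory: Killed process 12345 (python3) total-vm:16777216kB, anon-rss:15000000kB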

Ph0rk0z commented 1 year ago

You don't have the memory to load a full FP16 model on CPU. You will have to use a 4-bit model and probably wait 1-2 minutes for replies.
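The arithmetic bears this out. A rough sketch in Python (weights only; real usage is higher, and loading a .bin checkpoint can transiently need close to twice the weight size while the state dict is copied into the model):

    # Approximate RAM needed just to hold GPT-J-6B's ~6 billion weights.
    params = 6e9
    for fmt, bytes_per_param in [("fp32", 4), ("fp16", 2), ("8-bit", 1), ("4-bit", 0.5)]:
        print(f"{fmt}: ~{params * bytes_per_param / 1024**3:.1f} GB")
    # fp32: ~22.4 GB, fp16: ~11.2 GB, 8-bit: ~5.6 GB, 4-bit: ~2.8 GB

So the FP16 weights alone nearly fill a 16GB machine before the OS and Python get anything, while a 4-bit copy fits with plenty of headroom.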

cosmiclantern commented 1 year ago

Oh I see, thanks

cosmiclantern commented 1 year ago

Can it be run in Google Colab pro instead then?

I've tried the notebook, but I get a warning from Google saying it contains "disallowed" code, which could get my Colab restricted in the future. I've tried editing the bash commands to remove pygmalion and use gpt-j-6b instead, but I suck at this and can't get it to download that model.
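For reference, the download step can be done without bash at all, using the huggingface_hub Python library. A sketch, assuming EleutherAI/gpt-j-6b as the model's repo id on Hugging Face; the local_dir path is only an example:

    from huggingface_hub import snapshot_download

    # Pull every file in the model repo into the webui's models folder.
    snapshot_download(
        repo_id="EleutherAI/gpt-j-6b",
        local_dir="text-generation-webui/models/GPT-J-6B",
    )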

Is everyone here running it on their own personal Lamborghini?

askmyteapot commented 1 year ago

You'll find that Google is cracking down on Pyg and Stable Diffusion notebooks. With the kind of aged laptop hardware you have, you would only be able to use very small models, and they are rather... limited... in their responses.

A lot of us are using personally built computers with 2nd-hand hardware. Recently I purchased an Nvidia Tesla P40, which has 24GB of VRAM, for $370 Australian on eBay from China. It's a circa-2017 datacenter GPU with no video out, but it has enough VRAM to load 30B models and run them at acceptable speeds (2-4 t/s depending on context size).

If you were looking to throw money at Google Colab Pro, then you may as well pay for GPT-3.5 Turbo and run it through something like TavernAI (and its forks) or Agnaistic.

I hope that helps.

Ph0rk0z commented 1 year ago

Also, there is RunPod; I think they are better than Colab Pro. Be careful that the P40 will run in your board: it wouldn't work in my B450 system, but the similar P6000 works fine. It also uses a different power connector, 8-pin CPU power rather than PCIe, so you will need to make or buy a cable. Plus it has no fan. It's definitely the best "deal" there is on any card.

github-actions[bot] commented 1 year ago

This issue has been closed due to inactivity for 30 days. If you believe it is still relevant, please leave a comment below.