They all go in the same folder under "models\pygmalion-6b" It's supposed to look like this: https://huggingface.co/PygmalionAI/pygmalion-6b/tree/main
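If downloading the files one by one is the annoying part, something like this should pull the whole repo into the right place (just a sketch, assuming a reasonably recent huggingface_hub is installed):

```python
# Sketch: fetch every file from the model repo into the webui's models folder
# (pip install huggingface_hub; local_dir needs a recent version of the package)
from huggingface_hub import snapshot_download

snapshot_download(
    repo_id="PygmalionAI/pygmalion-6b",
    local_dir="models/pygmalion-6b",  # the webui expects models/<model-name>/
)
```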
I have a T440p with an i7, and even running the model from here: https://huggingface.co/mayaeary/pygmalion-6b-4bit-128g/tree/main I do not have a good time.
Thanks. So I need to make a directory "models\GPT-J-6B" and toss all the files in there, including the .bin?
What about the memory error?
Okay, that did seem to make it try to load the model at startup, but it failed. From the bash terminal:
$ python3 server.py --cpu
Gradio HTTP request redirected to localhost :)
bin /home/username/.local/lib/python3.10/site-packages/bitsandbytes/libbitsandbytes_cpu.so
/home/username/.local/lib/python3.10/site-packages/bitsandbytes/cextension.py:33: UserWarning: The installed version of bitsandbytes was compiled without GPU support. 8-bit optimizers, 8-bit multiplication, and GPU quantization are unavailable.
  warn("The installed version of bitsandbytes was compiled without GPU support. "
Loading GPT-J-6B...
Killed
You don't have enough memory to load a full FP16 model on CPU. You will have to use a 4-bit model and probably wait 1-2 minutes for replies.
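Rough numbers, just to show where the limit is (weights only, ignoring activations and OS overhead):

```python
# Back-of-the-envelope weight memory for a 6B-parameter model
params = 6e9
print(f"FP16:  ~{params * 2 / 1e9:.0f} GB")    # ~12 GB before any overhead
print(f"FP32:  ~{params * 4 / 1e9:.0f} GB")    # ~24 GB if weights get upcast on CPU
print(f"4-bit: ~{params * 0.5 / 1e9:.0f} GB")  # ~3 GB quantized
```

With 16GB of RAM minus whatever the OS is using, an FP16 load is already marginal and an FP32 upcast is impossible, which is why the process gets Killed by the OOM killer.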
Oh I see, thanks
Can it be run in Google Colab pro instead then?
I've tried the notebook, but I get a warning from Google saying it contains "disallowed" code, which could get my Colab restricted in the future. I've tried editing the bash commands to remove Pygmalion and use GPT-J-6B instead, but I suck at this and can't get it to download that model.
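For reference, what I think I'm supposed to end up with is the GPT-J repo pulled into the models folder, something like this (no idea if I've got the names right):

```python
# What I'm attempting: the same kind of download, pointed at GPT-J
# instead of Pygmalion (assumes huggingface_hub is in the Colab runtime)
from huggingface_hub import snapshot_download

snapshot_download(
    repo_id="EleutherAI/gpt-j-6b",
    local_dir="models/gpt-j-6b",
)
```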
Is everyone here running it on their own personal Lamborghini?
You'll find that Google is cracking down on Pygmalion and Stable Diffusion notebooks. With the kind of aged laptop hardware you have, you would only be able to use very small models, and they are rather... limited... in their responses.
A lot of us are using personally built computers with second-hand hardware. I recently purchased an Nvidia Tesla P40, which has 24GB of VRAM, for $370 Australian dollars on eBay from China. It's a circa-2017 datacenter GPU with no video out, but it has enough VRAM to load 30B models and run them at acceptable speeds (2-4 t/s depending on context size).
If you were looking to throw money at Google Colab Pro, then you may as well pay for GPT-3.5 Turbo and run it through something like TavernAI (or one of its forks) or Agnaistic.
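All of those frontends are essentially wrappers around an API call like the one below; just a sketch using the openai Python package as it is at the moment (pre-1.0 style), and it assumes you've set up an API key:

```python
# Roughly what TavernAI/Agnaistic send per chat turn
import openai

openai.api_key = "sk-..."  # your key here
response = openai.ChatCompletion.create(
    model="gpt-3.5-turbo",
    messages=[{"role": "user", "content": "Hello!"}],
)
print(response.choices[0].message.content)
```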
I hope that helps.
There is also RunPod; I think it's better than Colab Pro. Be careful, though: check that the P40 will run in your board. It wouldn't work in my B450 system, but the similar P6000 works fine. It also uses a different power connector, 8-pin CPU (EPS) power rather than PCIe, so you will need to make or buy a cable. Plus it has no fan. It's definitely the best "deal" there is on any card.
This issue has been closed due to inactivity for 30 days. If you believe it is still relevant, please leave a comment below.
Describe the bug
When I try to load the GPT-J-6B model, saved under the name pytorch_model.bin, nothing happens, but I get the traceback described below. I'm running on CPU as I don't have a discrete GPU: 16GB system RAM, Intel Core i5-3210M CPU.
Also, I don't know where to put the text and JSON files, so I just threw them all in the same folder as the model. Was that wrong? All those files show up in the models list, which doesn't seem right.
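For reference, my (possibly wrong) understanding is that the webui just points transformers at a model folder, so everything would need to sit together, something like:

```python
# My guess at what loading does: transformers needs config.json and the
# tokenizer files in the same folder as pytorch_model.bin
from transformers import AutoModelForCausalLM, AutoTokenizer

path = "models/GPT-J-6B"  # one folder per model
tokenizer = AutoTokenizer.from_pretrained(path)
model = AutoModelForCausalLM.from_pretrained(path)
```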
Is there an existing issue for this?
Reproduction
python3 server.py
Open http://127.0.0.1:7860 in a browser.
Select pytorch_model.bin from the model dropdown under the Model tab.
Screenshot
No response
Logs
System Info