oobabooga / text-generation-webui

A Gradio web UI for Large Language Models.
GNU Affero General Public License v3.0
39.48k stars · 5.19k forks

GPT-NeoX and Pythia support + GPTQ-for-GPT-NeoX branch #341

Closed — Digitous closed this issue 1 year ago

Digitous commented 1 year ago

I am working on integrating GPT-NeoX and Pythia support into GPTQ-for-LLaMa, aiming to add 4-bit GPTQ quantization and inference capabilities. This would enable a NeoX20B to run on a single RTX3090, or Pythia12B on even lower-end hardware, using only VRAM.
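The VRAM claim can be sanity-checked with a back-of-the-envelope estimate: 4-bit weights take half a byte per parameter, versus two bytes at fp16. The sketch below is a rough calculation only; `overhead_gb` is an assumed allowance for activations, the KV cache, and the CUDA context, not a measured figure.

```python
# Rough weights-only VRAM estimate; overhead_gb is an assumed fudge
# factor for activations, KV cache, and CUDA context.
def approx_vram_gb(n_params_billions: float, bits: int, overhead_gb: float = 2.0) -> float:
    weight_bytes = n_params_billions * 1e9 * bits / 8
    return weight_bytes / 2**30 + overhead_gb

print(approx_vram_gb(20, 4))   # NeoX-20B at 4-bit: well under an RTX 3090's 24 GB
print(approx_vram_gb(20, 16))  # the same model at fp16 does not fit
print(approx_vram_gb(12, 4))   # Pythia-12B at 4-bit fits on smaller cards
```

By this estimate a 4-bit NeoX-20B needs roughly 11–12 GB, which is why a single 24 GB RTX 3090 becomes viable.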

I have uploaded two files, neox.py and neox2.py, which represent two different approaches I attempted. However, my limited understanding of NeoX's layers and intermediate experience with Python have hindered my progress.

I have spent hours on this, but I am stuck. If anyone has expertise in the NeoX architecture and layer structure, please offer assistance.

https://github.com/Digitous/GPTQ-for-GPT-NeoX

oobabooga commented 1 year ago

It would be nice if this worked. I am personally interested in quantizing GALACTICA-30b as well, and I managed to generate a galactica-30b-4bit.pt file, but the outputs of this quantized model were garbage.

https://github.com/qwopqwop200/GPTQ-for-LLaMa/issues/46
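For context on why a quantized checkpoint can produce garbage: plain round-to-nearest quantization is lossy, and GPTQ differs from it precisely by compensating each column's rounding error using second-order (Hessian) information. The snippet below is a minimal round-to-nearest baseline for illustration only; it is not the GPTQ algorithm, and the function name and shapes are made up for the example.

```python
import numpy as np

def quantize_rtn(w, bits=4):
    """Naive per-row asymmetric round-to-nearest quantization.

    This is the baseline GPTQ improves on: GPTQ additionally folds each
    rounding error back into the not-yet-quantized columns using
    second-order information, which is what keeps 4-bit outputs usable.
    Returns the dequantized weights so the error can be inspected.
    """
    qmax = 2**bits - 1
    wmin = w.min(axis=1, keepdims=True)
    wmax = w.max(axis=1, keepdims=True)
    scale = np.maximum(wmax - wmin, 1e-8) / qmax
    q = np.clip(np.round((w - wmin) / scale), 0, qmax)
    return q * scale + wmin

rng = np.random.default_rng(0)
w = rng.standard_normal((128, 128))
w4 = quantize_rtn(w, bits=4)
print(np.abs(w - w4).mean())  # mean per-element rounding error at 4-bit
```

Even when this per-element error looks small, it compounds across dozens of transformer layers, so a broken packing or kernel mismatch on top of it easily turns generations into noise.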

Digitous commented 1 year ago

> It would be nice if this worked. I am personally interested in quantizing GALACTICA-30b as well, and I managed to generate a galactica-30b-4bit.pt file, but the outputs of this quantized model were garbage.
>
> qwopqwop200/GPTQ-for-LLaMa#46

Closing my repo; after some back-and-forth in DMs about what I had tried, a fellow member of the KAI Discord figured out a working implementation via https://github.com/0cc4m/GPTQ-for-LLaMa/tree/gptneox

I'm about to try it out as soon as I finish downloading TogetherComputer's new NeoX20b instruct-based chat model.

Also, if it works as hoped, the code is open for integration if you're interested; I'm 99% sure Occam would be all for it. If any other model integrations pop up, I'll share them.

Wingie commented 1 year ago

Did you manage to get this working? I'm getting an error while installing the CUDA kernel on that branch.

Ph0rk0z commented 1 year ago

I get a NaN error from it when generating. The kernel is fine unless you have an older GPU, like pre-Pascal.

OASST works.

github-actions[bot] commented 1 year ago

This issue has been closed due to inactivity for 30 days. If you believe it is still relevant, please leave a comment below.