oobabooga / one-click-installers

Simplified installers for oobabooga/text-generation-webui.
GNU Affero General Public License v3.0

Remove GPTQ-for-LLaMA support #121

Closed. oobabooga closed this 1 year ago

oobabooga commented 1 year ago

To be merged after https://github.com/oobabooga/text-generation-webui/pull/3505, which itself will only be merged after AutoGPTQ gets a new version number.

Removing GPTQ-for-LLaMA will greatly simplify this one-click-installer, which will no longer have to go through the whole quant_cuda compilation step on every update.

Requesting feedback from @jllllll, do you think that these removals are safe and make sense?

jllllll commented 1 year ago

Looks good to me.

jllllll commented 1 year ago

Just keep in mind that cards older than Pascal will lose access to GPTQ with this change, as qwopqwop200's cuda branch is the only GPTQ implementation I know of that supports those older cards. In theory, AutoGPTQ could fix this by only using the faster kernel functions on cards that support them. I don't know enough about C++ to know how viable that is.
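
Something along these lines is what I have in mind, as a rough sketch from the Python side rather than actual AutoGPTQ dispatch code:

```python
# Illustrative only -- not AutoGPTQ's actual logic.
# Pascal (compute capability 6.0) is the cutoff discussed above.
import torch

def supports_fast_kernels(device: int = 0) -> bool:
    # torch reports compute capability as a (major, minor) tuple
    major, minor = torch.cuda.get_device_capability(device)
    return (major, minor) >= (6, 0)

use_fast_kernels = torch.cuda.is_available() and supports_fast_kernels()
# if False, a slower but more compatible kernel path would be needed
```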

This isn't too big of an issue given that GGML with cuBLAS is at least comparable in speed, likely faster than the cuda branch. It is what I use the most as it allows me to use 30B models. Just felt I should mention it.

oobabooga commented 1 year ago

I wasn't aware that AutoGPTQ didn't work on older cards. Since there is this loss in functionality, removing GPTQ-for-LLaMa is probably not a good idea.

jllllll commented 1 year ago

Yeah. It is the faster kernels that rely on compute capability 6.0 functionality to work. That is the same reason why your GPTQ fork also requires Pascal at minimum. qwopqwop200's cuda branch doesn't have those faster kernels, so it can be compiled for older cards.

oobabooga commented 1 year ago

I have closed the PR and will close this one too.

Still, I wonder if the one-click-installer can be simplified without any loss of functionality somehow. On my Windows partition, whenever I run the update.bat script, it tries to compile quant_cuda, fails, and then shows an error.
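
In principle the update script could skip the build entirely when the extension is already importable, instead of failing on every update. A rough sketch of that check (not the actual installer code):

```python
# Sketch only: guard the quant_cuda build step in the updater.
import importlib.util

def quant_cuda_available() -> bool:
    # find_spec returns None when the compiled extension isn't installed
    return importlib.util.find_spec("quant_cuda") is not None

if not quant_cuda_available():
    # only now attempt to build/install GPTQ-for-LLaMa's CUDA extension
    ...
```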

jllllll commented 1 year ago

We can always just install prebuilt wheels. There is little reason to build quant_cuda locally for NVIDIA installations anymore.

I'll write up a PR for that shortly with a description of what can be added to requirements.txt.
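
Roughly speaking, it would be a pinned direct-URL entry in requirements.txt; the name, version, and URL below are placeholders rather than the actual wheel:

```
# placeholder entry -- the real wheel URL and version would come from the PR
quant-cuda @ https://example.com/wheels/quant_cuda-0.0.0+cu117-cp310-cp310-win_amd64.whl ; platform_system == "Windows"
```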

oobabooga commented 1 year ago

I think that would be perfect, as long as it covers all the compute capabilities and vendors that the current installer covers. Since GPTQ-for-LLaMa doesn't get updated, it would be a compile-once-and-forget kind of thing.

Thanks for your feedback and for looking into this, I really appreciate it.

jllllll commented 1 year ago

Another thing I may look into is converting GPTQ-for-LLaMa into an actual Python package to eliminate the need for cloning it into repositories. Honestly, I'm not sure why it isn't like that to begin with.
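
A minimal setup.py along these lines would probably be enough to make it pip-installable; the file and module names here are guesses at the layout, not the actual sources:

```python
# Sketch of packaging GPTQ-for-LLaMa's CUDA extension as an installable package.
from setuptools import setup, find_packages
from torch.utils.cpp_extension import BuildExtension, CUDAExtension

setup(
    name="gptq-for-llama",          # hypothetical distribution name
    version="0.0.1",
    packages=find_packages(),
    ext_modules=[
        CUDAExtension(
            name="quant_cuda",
            # source file names are illustrative
            sources=["quant_cuda.cpp", "quant_cuda_kernel.cu"],
        )
    ],
    cmdclass={"build_ext": BuildExtension},
)
```

With something like that, pip could build a wheel once and the installer would never need to clone or compile anything per-update.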

oobabooga commented 1 year ago

That would be even better. GPTQ-for-LLaMa was never meant to be a library. qwopqwop200 seemed to be experimenting with it with the goal of getting the lowest possible perplexity for personal use.