oobabooga / text-generation-webui

A Gradio web UI for Large Language Models.
GNU Affero General Public License v3.0

Continued Errors: No module named 'llama_inference_offload' and llama.cpp #1952

Closed z3Nsk1Fh5a closed 1 year ago

z3Nsk1Fh5a commented 1 year ago

Describe the bug

Currently I'm limited to basically using this repo for its useful scripts to just download models.

It doesn't seem to work very well out of the box with ggml models. I've tried a number of 7B quantized models with no luck.

Instead of jumping around between weights and torrents like the similar issues suggest, are there any casual, more easily accessible GUI tools with fewer compatibility issues at the moment?

Is there an existing issue for this?

Reproduction

Using any quantized model doesn't work, particularly on AMD GPUs and M1/M2 Macs.

Screenshot

No response

Logs

Traceback (most recent call last):
File "/Users/USERo/Documents/GitHub/text-generation-webui/modules/GPTQ_loader.py", line 17, in <module>
import llama_inference_offload
ModuleNotFoundError: No module named 'llama_inference_offload'

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
File "/Users/USERo/Documents/GitHub/text-generation-webui/server.py", line 67, in load_model_wrapper
shared.model, shared.tokenizer = load_model(shared.model_name)
File "/Users/USER/Documents/GitHub/text-generation-webui/modules/models.py", line 157, in load_model
from modules.GPTQ_loader import load_quantized
File "/Users/USERo/Documents/GitHub/text-generation-webui/modules/GPTQ_loader.py", line 21, in <module>
sys.exit(-1)
SystemExit: -1

System Info

MacBook Pro (15-inch, 2021)
Chip Apple M1 Max
Memory 16GB
Mac OS 13.2.1
arctic-marmoset commented 1 year ago

There are a few things going on here, I think.

You mentioned ggml but I think your log shows a gptq model. See this comment. ggml models have file names that end with .ggml.q4_2.bin, for example. Note that gptq models can still run on AMD GPUs, contrary to what the comment I linked to says.

If you're loading a gptq model and running into the llama_inference_offload error, you might be missing the GPTQ-for-LLaMA repo. To fix the error, you'll have to clone that repo. See this comment. Don't forget to install that repo's requirements.
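Roughly, those steps look something like the sketch below. This is an assumption about the usual setup (the default repositories/ layout and oobabooga's fork of GPTQ-for-LLaMA); follow whatever the linked comment says if it differs.

```sh
# Run from inside the text-generation-webui directory (assumed layout).
mkdir -p repositories
cd repositories
git clone https://github.com/oobabooga/GPTQ-for-LLaMa
cd GPTQ-for-LLaMa
# Install that repo's own dependencies in the same environment as the webui.
pip install -r requirements.txt
```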

For reference, I'm able to run "gpt-x-alpaca-13b-native-4bit-128g" on an AMD GPU. Granted, I'm on Linux.

Also, did you happen to use the one-click installer by any chance?

jllllll commented 1 year ago


While it is possible to set up GPTQ for AMD GPUs, it requires ROCm along with a ROCm build of PyTorch, both of which are Linux-only. On top of that, it doesn't work for everyone and only works on some AMD GPUs.
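For illustration only, installing a ROCm build of PyTorch on Linux typically looks something like the command below; the exact index URL and ROCm version are assumptions and depend on your GPU and installed ROCm stack, so check the PyTorch site for the right one.

```sh
# Hypothetical example: replace rocm5.4.2 with the index matching your ROCm install.
pip install torch torchvision torchaudio --index-url https://download.pytorch.org/whl/rocm5.4.2
```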

@Ghee36 I think the easiest option for you would be to use koboldcpp. It is made specifically for GGML models, which are CPU-only.
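If you go that route, a minimal sketch of getting koboldcpp running is below; the model path is a placeholder, and the exact build and launch options may differ, so check that project's README.

```sh
git clone https://github.com/LostRuins/koboldcpp
cd koboldcpp
# Build the bundled llama.cpp backend (works on Linux and macOS with make installed).
make
# Launch with the path to your GGML model file (placeholder path).
python koboldcpp.py /path/to/your-model.ggml.q4_2.bin
```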

z3Nsk1Fh5a commented 1 year ago


You are right.

Also, I didn't use the one-click installer, so that may be another contributing factor. Indeed, I'll be checking this out ASAP. Thank you kindly for sharing this suggestion.

github-actions[bot] commented 1 year ago

This issue has been closed due to inactivity for 30 days. If you believe it is still relevant, please leave a comment below.