oobabooga / text-generation-webui

A Gradio web UI for Large Language Models.
GNU Affero General Public License v3.0

Unable to run quantized vicuna: ModuleNotFoundError: No module named 'llama_inference_offload' #794

Closed Tameflame closed 1 year ago

Tameflame commented 1 year ago

Describe the bug

My command: python server.py --model-dir ../models --wbits 4 --groupsize 128 --auto-devices

My model: https://huggingface.co/anon8231489123/vicuna-13b-GPTQ-4bit-128g

On a gpu instance running linux

Loading vicuna-13b-GPTQ-4bit-128g...
Traceback (most recent call last):
  File "/workspace/ai/text-generation-webui/server.py", line 277, in <module>
    shared.model, shared.tokenizer = load_model(shared.model_name)
  File "/workspace/ai/text-generation-webui/modules/models.py", line 100, in load_model
    from modules.GPTQ_loader import load_quantized
  File "/workspace/ai/text-generation-webui/modules/GPTQ_loader.py", line 13, in <module>
    import llama_inference_offload
ModuleNotFoundError: No module named 'llama_inference_offload'
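
The traceback means Python could not find a module named llama_inference_offload anywhere on its import path. Here is a minimal diagnostic sketch. It assumes, as the fix later in this thread suggests, that the module is expected in a GPTQ-for-LLaMa checkout under repositories/. The helper name find_module_in is hypothetical and not part of the webui.

```python
import importlib.util
import sys
from pathlib import Path


def find_module_in(module_name, extra_dir):
    """Return the file that would be imported for module_name, or None.

    extra_dir is temporarily placed at the front of sys.path, mimicking a
    loader that imports from a local checkout (an assumption, not confirmed
    webui behavior).
    """
    entry = str(extra_dir)
    sys.path.insert(0, entry)
    try:
        importlib.invalidate_caches()
        spec = importlib.util.find_spec(module_name)
    finally:
        sys.path.remove(entry)  # removes the entry inserted above
    return None if spec is None else spec.origin


if __name__ == "__main__":
    # Assumed location; run this from the text-generation-webui directory.
    checkout = Path("repositories") / "GPTQ-for-LLaMa"
    origin = find_module_in("llama_inference_offload", checkout)
    if origin is None:
        print(f"llama_inference_offload not found (checked {checkout} and sys.path)")
    else:
        print(f"llama_inference_offload would be loaded from {origin}")
```

If the script reports that the module is not found, the checkout is probably missing or in a different directory than assumed here.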

Is there an existing issue for this?

Reproduction

My commands:

curl -sL "https://repo.anaconda.com/miniconda/Miniconda3-latest-Linux-x86_64.sh" > "Miniconda3.sh"
bash Miniconda3.sh
conda create -n textgen python=3.10.9
conda activate textgen
pip3 install torch torchvision torchaudio

mkdir models && cd models
git lfs install
git clone https://huggingface.co/anon8231489123/vicuna-13b-GPTQ-4bit-128g
cd ../
git clone https://github.com/oobabooga/text-generation-webui/
cd text-generation-webui
pip install -r requirements.txt
python server.py --model-dir ../models --wbits 4 --groupsize 128 --auto-devices

Logs

(textgen) root@C.6109931:/workspace/ai/text-generation-webui$ python server.py --model-dir ../models --wbits 4 --groupsize 128 --auto-devices

===================================BUG REPORT===================================
Welcome to bitsandbytes. For bug reports, please submit your error trace to: https://github.com/TimDettmers/bitsandbytes/issues
================================================================================
/root/miniconda3/envs/textgen/lib/python3.10/site-packages/bitsandbytes/cuda_setup/main.py:136: UserWarning: /root/miniconda3/envs/textgen did not contain libcudart.so as expected! Searching further paths...
  warn(msg)
/root/miniconda3/envs/textgen/lib/python3.10/site-packages/bitsandbytes/cuda_setup/main.py:136: UserWarning: WARNING: The following directories listed in your path were found to be non-existent: {PosixPath('/tmp/vscode-ipc-29188907-1d2f-49a9-8799-73f292ff4153.sock')}
  warn(msg)
/root/miniconda3/envs/textgen/lib/python3.10/site-packages/bitsandbytes/cuda_setup/main.py:136: UserWarning: WARNING: The following directories listed in your path were found to be non-existent: {PosixPath('cl=\\E[H\\E[J'), PosixPath('ei=\\E[4l'), PosixPath('ae=\\E(B'), PosixPath('ct=\\E[3g'), PosixPath('kr=\\EOC'), PosixPath('ac=\\140\\140aaffggjjkkllmmnnooppqqrrssttuuvvwwxxyyzz{{||}}~~..--++,,hhII00'), PosixPath('bt=\\E[Z'), PosixPath('k5=\\E[15~'), PosixPath('km'), PosixPath('cm=\\E[%i%d;%dH'), PosixPath('DL=\\E[%dM'), PosixPath('vb=\\Eg'), PosixPath('AX'), PosixPath('ti=\\E[?1049h'), PosixPath('Km=\\E[M'), PosixPath('co#261'), PosixPath('sc=\\E7'), PosixPath('op=\\E[39;49m'), PosixPath('SC|screen|VT 100/ANSI X3.64 virtual terminal'), PosixPath('k3=\\EOR'), PosixPath('LE=\\E[%dD'), PosixPath('k2=\\EOQ'), PosixPath('nd=\\E[C'), PosixPath('cr=^M'), PosixPath('as=\\E(0'), PosixPath('@1=\\E[1~'), PosixPath('kN=\\E[6~'), PosixPath('me=\\E[m'), PosixPath('DC=\\E[%dP'), PosixPath('ks=\\E[?1h\\E='), PosixPath('im=\\E[4h'), PosixPath('mi'), PosixPath('F1=\\E[23~'), PosixPath('k1=\\EOP'), PosixPath('cd=\\E[J'), PosixPath('pa#64'), PosixPath('k;=\\E[21~'), PosixPath('k7=\\E[18~'), PosixPath('AF=\\E[3%dm'), PosixPath('st=\\EH'), PosixPath('kH=\\E[4~'), PosixPath('xn'), PosixPath('md=\\E[1m'), PosixPath('LP'), PosixPath('vs=\\E[34l'), PosixPath('\\\n\t'), PosixPath('kl=\\EOD'), PosixPath('do=^J'), PosixPath('li#22'), PosixPath('cs=\\E[%i%d;%dr'), PosixPath('mr=\\E[7m'), PosixPath('ve=\\E[34h\\E[?25h'), PosixPath('kd=\\EOB'), PosixPath('se=\\E[23m'), PosixPath('@7=\\E[4~'), PosixPath('bs'), PosixPath('nw=\\EE'), PosixPath('G0'), PosixPath('ho=\\E[H'), PosixPath('k4=\\EOS'), PosixPath('ce=\\E[K'), PosixPath('dc=\\E[P'), PosixPath('ms'), PosixPath('k9=\\E[20~'), PosixPath('so=\\E[3m'), PosixPath('AB=\\E[4%dm'), PosixPath('ke=\\E[?1l\\E>'), PosixPath('mb=\\E[5m'), PosixPath('mh=\\E[2m'), PosixPath('ta=^I'), PosixPath('kb=\x7f'), PosixPath('vi=\\E[?25l'), PosixPath('kh=\\E[1~'), 
PosixPath('AL=\\E[%dL'), PosixPath('sr=\\EM'), PosixPath('kP=\\E[5~'), PosixPath('am'), PosixPath('al=\\E[L'), PosixPath('rc=\\E8'), PosixPath('up=\\EM'), PosixPath('it#8'), PosixPath('bl=^G'), PosixPath('Co#8'), PosixPath('kI=\\E[2~'), PosixPath('k8=\\E[19~'), PosixPath('is=\\E)0'), PosixPath('le=^H'), PosixPath('rs=\\Ec'), PosixPath('UP=\\E[%dA'), PosixPath('ue=\\E[24m'), PosixPath('dl=\\E[M'), PosixPath('IC=\\E[%d@'), PosixPath('us=\\E[4m'), PosixPath('pt'), PosixPath('kB=\\E[Z'), PosixPath('kD=\\E[3~'), PosixPath('DO=\\E[%dB'), PosixPath('te=\\E[?1049l'), PosixPath('F2=\\E[24~'), PosixPath('RI=\\E[%dC'), PosixPath('k6=\\E[17~'), PosixPath('xv'), PosixPath('k0=\\E[10~'), PosixPath('ku=\\EOA')}
  warn(msg)
CUDA_SETUP: WARNING! libcudart.so not found in any environmental path. Searching /usr/local/cuda/lib64...
/root/miniconda3/envs/textgen/lib/python3.10/site-packages/bitsandbytes/cuda_setup/main.py:136: UserWarning: WARNING: The following directories listed in your path were found to be non-existent: {PosixPath('/usr/local/cuda/lib64')}
  warn(msg)
/root/miniconda3/envs/textgen/lib/python3.10/site-packages/bitsandbytes/cuda_setup/main.py:136: UserWarning: WARNING: No libcudart.so found! Install CUDA or the cudatoolkit package (anaconda)!
  warn(msg)
CUDA SETUP: Highest compute capability among GPUs detected: 8.9
CUDA SETUP: Detected CUDA version 117
CUDA SETUP: Loading binary /root/miniconda3/envs/textgen/lib/python3.10/site-packages/bitsandbytes/libbitsandbytes_cpu.so...
/root/miniconda3/envs/textgen/lib/python3.10/site-packages/bitsandbytes/cextension.py:31: UserWarning: The installed version of bitsandbytes was compiled without GPU support. 8-bit optimizers and GPU quantization are unavailable.
  warn("The installed version of bitsandbytes was compiled without GPU support. "
Loading vicuna-13b-GPTQ-4bit-128g...
Traceback (most recent call last):
  File "/workspace/ai/text-generation-webui/server.py", line 277, in <module>
    shared.model, shared.tokenizer = load_model(shared.model_name)
  File "/workspace/ai/text-generation-webui/modules/models.py", line 100, in load_model
    from modules.GPTQ_loader import load_quantized
  File "/workspace/ai/text-generation-webui/modules/GPTQ_loader.py", line 13, in <module>
    import llama_inference_offload
ModuleNotFoundError: No module named 'llama_inference_offload'

System Info

sudo lshw
f15b2f104123 description: Computer width: 64 bits capabilities: smp vsyscall32 *-core description: Motherboard physical id: 0 *-memory description: System memory physical id: 0 size: 62GiB *-cpu product: AMD Ryzen 9 7950X 16-Core Processor vendor: Advanced Micro Devices [AMD] physical id: 1 bus info: cpu@0 size: 2986MHz capacity: 4500MHz width: 64 bits capabilities: fpu fpu_exception wp vme de pse tsc msr pae mce cx8 apic sep mtrr pge mca cmov pat pse36 clflush mmx fxsr sse sse2 ht syscall nx mmxext fxsr_opt pdpe1gb rdtscp x86-64 constant_tsc rep_good nopl nonstop_tsc cpuid extd_apicid aperfmperf pni pclmulqdq monitor ssse3 fma cx16 sse4_1 sse4_2 x2apic movbe popcnt aes xsave avx f16c rdrand lahf_lm cmp_legacy svm extapic cr8_legacy abm sse4a misalignsse 3dnowprefetch osvw ibs skinit wdt tce topoext perfctr_core perfctr_nb bpext perfctr_llc mwaitx cpb cat_l3 cdp_l3 hw_pstate ssbd mba ibrs ibpb stibp vmmcall fsgsbase bmi1 avx2 smep bmi2 erms invpcid cqm rdt_a avx512f avx512dq rdseed adx smap avx512ifma clflushopt clwb avx512cd sha_ni avx512bw avx512vl xsaveopt xsavec xgetbv1 xsaves cqm_llc cqm_occup_llc cqm_mbm_total cqm_mbm_local avx512_bf16 clzero irperf xsaveerptr wbnoinvd arat npt lbrv svm_lock nrip_save tsc_scale vmcb_clean flushbyasid decodeassists pausefilter pfthreshold avic v_vmsave_vmload vgif avx512vbmi umip pku ospke avx512_vbmi2 gfni vaes vpclmulqdq avx512_vnni avx512_bitalg avx512_vpopcntdq rdpid overflow_recov succor smca flush_l1d cpufreq *-pci:0 description: Host bridge product: Advanced Micro Devices, Inc. [AMD] vendor: Advanced Micro Devices, Inc. [AMD] physical id: 100 bus info: pci@0000:00:00.0 version: 00 width: 32 bits clock: 33MHz *-generic UNCLAIMED description: IOMMU product: Advanced Micro Devices, Inc. [AMD] vendor: Advanced Micro Devices, Inc. 
[AMD] physical id: 0.2 bus info: pci@0000:00:00.2 version: 00 width: 32 bits clock: 33MHz capabilities: bus_master cap_list configuration: latency=0 *-pci:0 description: PCI bridge product: Advanced Micro Devices, Inc. [AMD] vendor: Advanced Micro Devices, Inc. [AMD] physical id: 1.1 bus info: pci@0000:00:01.1 version: 00 width: 32 bits clock: 33MHz capabilities: pci normal_decode bus_master cap_list configuration: driver=pcieport resources: irq:27 ioport:f000(size=4096) memory:fb000000-fc0fffff ioport:f000000000(size=34393292800) *-display description: VGA compatible controller product: NVIDIA Corporation vendor: NVIDIA Corporation physical id: 0 bus info: pci@0000:01:00.0 version: a1 width: 64 bits clock: 33MHz capabilities: vga_controller bus_master cap_list rom configuration: driver=nvidia latency=0 resources: iomemory:f00-eff iomemory:f80-f7f irq:112 memory:fb000000-fbffffff memory:f000000000-f7ffffffff memory:f800000000-f801ffffff ioport:f000(size=128) memory:fc000000-fc07ffff *-multimedia description: Audio device product: NVIDIA Corporation vendor: NVIDIA Corporation physical id: 0.1 bus info: pci@0000:01:00.1 version: a1 width: 32 bits clock: 33MHz capabilities: bus_master cap_list configuration: driver=snd_hda_intel latency=0 resources: irq:109 memory:fc080000-fc083fff *-pci:1 description: PCI bridge product: Advanced Micro Devices, Inc. [AMD] vendor: Advanced Micro Devices, Inc. 
[AMD] physical id: 1.2 bus info: pci@0000:00:01.2 version: 00 width: 32 bits clock: 33MHz capabilities: pci normal_decode bus_master cap_list configuration: driver=pcieport resources: irq:28 memory:fce00000-fcefffff *-storage description: Non-Volatile memory controller physical id: 0 bus info: pci@0000:02:00.0 version: 03 width: 64 bits clock: 33MHz capabilities: storage nvm_express bus_master cap_list configuration: driver=nvme latency=0 resources: irq:91 memory:fce00000-fce03fff *-nvme0 description: NVMe device product: TS1TMTE110S physical id: 0 logical name: nvme0 version: U0506B0 serial: H611810105 configuration: nqn=nqn.2014.08.org.nvmexpress:1d791d79H611810105 TS1TMTE110S state=live *-namespace description: NVMe namespace physical id: 1 logical name: nvme0n1 *-pci:2 description: PCI bridge product: Advanced Micro Devices, Inc. [AMD] vendor: Advanced Micro Devices, Inc. [AMD] physical id: 2.1 bus info: pci@0000:00:02.1 version: 00 width: 32 bits clock: 33MHz capabilities: pci normal_decode bus_master cap_list configuration: driver=pcieport resources: irq:29 memory:fc200000-fc8fffff ioport:f820300000(size=1048576) *-pci description: PCI bridge product: Advanced Micro Devices, Inc. [AMD] vendor: Advanced Micro Devices, Inc. [AMD] physical id: 0 bus info: pci@0000:03:00.0 version: 01 width: 32 bits clock: 33MHz capabilities: pci normal_decode bus_master cap_list configuration: driver=pcieport resources: irq:32 memory:fc200000-fc8fffff ioport:f820300000(size=1048576) *-pci:0 description: PCI bridge product: Advanced Micro Devices, Inc. [AMD] vendor: Advanced Micro Devices, Inc. [AMD] physical id: 0 bus info: pci@0000:04:00.0 version: 01 width: 64 bits clock: 33MHz capabilities: pci normal_decode bus_master cap_list configuration: driver=pcieport resources: iomemory:1f10-1f0f irq:33 *-pci:1 description: PCI bridge product: Advanced Micro Devices, Inc. [AMD] vendor: Advanced Micro Devices, Inc. 
[AMD] physical id: 8 bus info: pci@0000:04:08.0 version: 01 width: 64 bits clock: 33MHz capabilities: pci normal_decode bus_master cap_list configuration: driver=pcieport resources: iomemory:1f10-1f0f irq:34 memory:fc200000-fc6fffff ioport:f820300000(size=1048576) *-pci description: PCI bridge product: Advanced Micro Devices, Inc. [AMD] vendor: Advanced Micro Devices, Inc. [AMD] physical id: 0 bus info: pci@0000:06:00.0 version: 01 width: 32 bits clock: 33MHz capabilities: pci normal_decode bus_master cap_list configuration: driver=pcieport resources: irq:38 memory:fc200000-fc6fffff ioport:f820300000(size=1048576) *-pci:0 description: PCI bridge product: Advanced Micro Devices, Inc. [AMD] vendor: Advanced Micro Devices, Inc. [AMD] physical id: 0 bus info: pci@0000:07:00.0 version: 01 width: 32 bits clock: 33MHz capabilities: pci normal_decode bus_master cap_list configuration: driver=pcieport resources: irq:39 *-pci:1 description: PCI bridge product: Advanced Micro Devices, Inc. [AMD] vendor: Advanced Micro Devices, Inc. [AMD] physical id: 4 bus info: pci@0000:07:04.0 version: 01 width: 32 bits clock: 33MHz capabilities: pci normal_decode bus_master cap_list configuration: driver=pcieport resources: irq:40 memory:fc600000-fc6fffff ioport:f820300000(size=1048576) *-network UNCLAIMED description: Network controller product: MEDIATEK Corp. vendor: MEDIATEK Corp. physical id: 0 bus info: pci@0000:09:00.0 version: 00 width: 64 bits clock: 33MHz capabilities: cap_list configuration: latency=0 resources: iomemory:f80-f7f memory:f820300000-f8203fffff memory:fc600000-fc607fff *-pci:2 description: PCI bridge product: Advanced Micro Devices, Inc. [AMD] vendor: Advanced Micro Devices, Inc. 
[AMD] physical id: 5 bus info: pci@0000:07:05.0 version: 01 width: 32 bits clock: 33MHz capabilities: pci normal_decode bus_master cap_list configuration: driver=pcieport resources: irq:41 memory:fc200000-fc3fffff *-network description: Ethernet controller product: Intel Corporation vendor: Intel Corporation physical id: 0 bus info: pci@0000:0a:00.0 version: 03 width: 32 bits clock: 33MHz capabilities: bus_master cap_list configuration: driver=igc latency=0 resources: irq:36 memory:fc200000-fc2fffff memory:fc300000-fc303fff *-pci:3 description: PCI bridge product: Advanced Micro Devices, Inc. [AMD] vendor: Advanced Micro Devices, Inc. [AMD] physical id: 6 bus info: pci@0000:07:06.0 version: 01 width: 32 bits clock: 33MHz capabilities: pci normal_decode bus_master cap_list configuration: driver=pcieport resources: irq:43 *-pci:4 description: PCI bridge product: Advanced Micro Devices, Inc. [AMD] vendor: Advanced Micro Devices, Inc. [AMD] physical id: 7 bus info: pci@0000:07:07.0 version: 01 width: 32 bits clock: 33MHz capabilities: pci normal_decode bus_master cap_list configuration: driver=pcieport resources: irq:45 *-pci:5 description: PCI bridge product: Advanced Micro Devices, Inc. [AMD] vendor: Advanced Micro Devices, Inc. [AMD] physical id: 8 bus info: pci@0000:07:08.0 version: 01 width: 32 bits clock: 33MHz capabilities: pci normal_decode bus_master cap_list configuration: driver=pcieport resources: irq:46 *-pci:6 description: PCI bridge product: Advanced Micro Devices, Inc. [AMD] vendor: Advanced Micro Devices, Inc. [AMD] physical id: c bus info: pci@0000:07:0c.0 version: 01 width: 32 bits clock: 33MHz capabilities: pci normal_decode bus_master cap_list configuration: driver=pcieport resources: irq:47 memory:fc500000-fc5fffff *-usb description: USB controller product: Advanced Micro Devices, Inc. [AMD] vendor: Advanced Micro Devices, Inc. 
[AMD] physical id: 0 bus info: pci@0000:0e:00.0 version: 01 width: 64 bits clock: 33MHz capabilities: xhci bus_master cap_list configuration: driver=xhci_hcd latency=0 resources: irq:24 memory:fc500000-fc507fff *-pci:7 description: PCI bridge product: Advanced Micro Devices, Inc. [AMD] vendor: Advanced Micro Devices, Inc. [AMD] physical id: d bus info: pci@0000:07:0d.0 version: 01 width: 32 bits clock: 33MHz capabilities: pci normal_decode bus_master cap_list configuration: driver=pcieport resources: irq:48 memory:fc400000-fc4fffff *-storage description: SATA controller product: Advanced Micro Devices, Inc. [AMD] vendor: Advanced Micro Devices, Inc. [AMD] physical id: 0 bus info: pci@0000:0f:00.0 version: 01 width: 32 bits clock: 33MHz capabilities: storage ahci_1.0 bus_master cap_list rom configuration: driver=ahci latency=0 resources: irq:98 memory:fc480000-fc4803ff memory:fc400000-fc47ffff *-pci:2 description: PCI bridge product: Advanced Micro Devices, Inc. [AMD] vendor: Advanced Micro Devices, Inc. [AMD] physical id: c bus info: pci@0000:04:0c.0 version: 01 width: 64 bits clock: 33MHz capabilities: pci normal_decode bus_master cap_list configuration: driver=pcieport resources: iomemory:f00-eff irq:35 memory:fc800000-fc8fffff *-usb description: USB controller product: Advanced Micro Devices, Inc. [AMD] vendor: Advanced Micro Devices, Inc. [AMD] physical id: 0 bus info: pci@0000:10:00.0 version: 01 width: 64 bits clock: 33MHz capabilities: xhci bus_master cap_list configuration: driver=xhci_hcd latency=0 resources: irq:24 memory:fc800000-fc807fff *-pci:3 description: PCI bridge product: Advanced Micro Devices, Inc. [AMD] vendor: Advanced Micro Devices, Inc. [AMD] physical id: d bus info: pci@0000:04:0d.0 version: 01 width: 64 bits clock: 33MHz capabilities: pci normal_decode bus_master cap_list configuration: driver=pcieport resources: iomemory:f00-eff irq:37 memory:fc700000-fc7fffff *-storage description: SATA controller product: Advanced Micro Devices, Inc. 
[AMD] vendor: Advanced Micro Devices, Inc. [AMD] physical id: 0 bus info: pci@0000:11:00.0 version: 01 width: 32 bits clock: 33MHz capabilities: storage ahci_1.0 bus_master cap_list rom configuration: driver=ahci latency=0 resources: irq:99 memory:fc780000-fc7803ff memory:fc700000-fc77ffff *-pci:3 description: PCI bridge product: Advanced Micro Devices, Inc. [AMD] vendor: Advanced Micro Devices, Inc. [AMD] physical id: 8.1 bus info: pci@0000:00:08.1 version: 00 width: 32 bits clock: 33MHz capabilities: pci normal_decode bus_master cap_list configuration: driver=pcieport resources: irq:30 ioport:e000(size=4096) memory:fc900000-fccfffff ioport:f810000000(size=270532608) *-display UNCLAIMED description: VGA compatible controller product: Advanced Micro Devices, Inc. [AMD/ATI] vendor: Advanced Micro Devices, Inc. [AMD/ATI] physical id: 0 bus info: pci@0000:12:00.0 version: c1 width: 64 bits clock: 33MHz capabilities: vga_controller bus_master cap_list configuration: latency=0 resources: iomemory:f80-f7f iomemory:f80-f7f memory:f810000000-f81fffffff memory:f820000000-f8201fffff ioport:e000(size=256) memory:fcc00000-fcc7ffff *-multimedia description: Audio device product: Advanced Micro Devices, Inc. [AMD/ATI] vendor: Advanced Micro Devices, Inc. [AMD/ATI] physical id: 0.1 bus info: pci@0000:12:00.1 version: 00 width: 32 bits clock: 33MHz capabilities: bus_master cap_list configuration: driver=snd_hda_intel latency=0 resources: irq:111 memory:fcc80000-fcc83fff *-generic UNCLAIMED description: Encryption controller product: Advanced Micro Devices, Inc. [AMD] vendor: Advanced Micro Devices, Inc. [AMD] physical id: 0.2 bus info: pci@0000:12:00.2 version: 00 width: 32 bits clock: 33MHz capabilities: cap_list configuration: latency=0 resources: memory:fcb00000-fcbfffff memory:fcc84000-fcc85fff *-usb:0 description: USB controller product: Advanced Micro Devices, Inc. [AMD] vendor: Advanced Micro Devices, Inc. 
[AMD] physical id: 0.3 bus info: pci@0000:12:00.3 version: 00 width: 64 bits clock: 33MHz capabilities: xhci bus_master cap_list configuration: driver=xhci_hcd latency=0 resources: irq:65 memory:fca00000-fcafffff *-usb:1 description: USB controller product: Advanced Micro Devices, Inc. [AMD] vendor: Advanced Micro Devices, Inc. [AMD] physical id: 0.4 bus info: pci@0000:12:00.4 version: 00 width: 64 bits clock: 33MHz capabilities: xhci bus_master cap_list configuration: driver=xhci_hcd latency=0 resources: irq:74 memory:fc900000-fc9fffff *-pci:4 description: PCI bridge product: Advanced Micro Devices, Inc. [AMD] vendor: Advanced Micro Devices, Inc. [AMD] physical id: 8.3 bus info: pci@0000:00:08.3 version: 00 width: 32 bits clock: 33MHz capabilities: pci normal_decode bus_master cap_list configuration: driver=pcieport resources: irq:31 memory:fcd00000-fcdfffff *-usb description: USB controller product: Advanced Micro Devices, Inc. [AMD] vendor: Advanced Micro Devices, Inc. [AMD] physical id: 0 bus info: pci@0000:13:00.0 version: 00 width: 64 bits clock: 33MHz capabilities: xhci bus_master cap_list configuration: driver=xhci_hcd latency=0 resources: irq:24 memory:fcd00000-fcdfffff *-serial description: SMBus product: FCH SMBus Controller vendor: Advanced Micro Devices, Inc. [AMD] physical id: 14 bus info: pci@0000:00:14.0 version: 71 width: 32 bits clock: 66MHz configuration: driver=piix4_smbus latency=0 resources: irq:0 *-isa description: ISA bridge product: FCH LPC Bridge vendor: Advanced Micro Devices, Inc. [AMD] physical id: 14.3 bus info: pci@0000:00:14.3 version: 51 width: 32 bits clock: 66MHz capabilities: isa bus_master configuration: latency=0 *-pci:1 description: Host bridge product: Advanced Micro Devices, Inc. [AMD] vendor: Advanced Micro Devices, Inc. [AMD] physical id: 101 bus info: pci@0000:00:01.0 version: 00 width: 32 bits clock: 33MHz *-pci:2 description: Host bridge product: Advanced Micro Devices, Inc. [AMD] vendor: Advanced Micro Devices, Inc. 
[AMD] physical id: 102 bus info: pci@0000:00:02.0 version: 00 width: 32 bits clock: 33MHz *-pci:3 description: Host bridge product: Advanced Micro Devices, Inc. [AMD] vendor: Advanced Micro Devices, Inc. [AMD] physical id: 103 bus info: pci@0000:00:03.0 version: 00 width: 32 bits clock: 33MHz *-pci:4 description: Host bridge product: Advanced Micro Devices, Inc. [AMD] vendor: Advanced Micro Devices, Inc. [AMD] physical id: 104 bus info: pci@0000:00:04.0 version: 00 width: 32 bits clock: 33MHz *-pci:5 description: Host bridge product: Advanced Micro Devices, Inc. [AMD] vendor: Advanced Micro Devices, Inc. [AMD] physical id: 105 bus info: pci@0000:00:08.0 version: 00 width: 32 bits clock: 33MHz *-pci:6 description: Host bridge product: Advanced Micro Devices, Inc. [AMD] vendor: Advanced Micro Devices, Inc. [AMD] physical id: 106 bus info: pci@0000:00:18.0 version: 00 width: 32 bits clock: 33MHz *-pci:7 description: Host bridge product: Advanced Micro Devices, Inc. [AMD] vendor: Advanced Micro Devices, Inc. [AMD] physical id: 107 bus info: pci@0000:00:18.1 version: 00 width: 32 bits clock: 33MHz *-pci:8 description: Host bridge product: Advanced Micro Devices, Inc. [AMD] vendor: Advanced Micro Devices, Inc. [AMD] physical id: 108 bus info: pci@0000:00:18.2 version: 00 width: 32 bits clock: 33MHz *-pci:9 description: Host bridge product: Advanced Micro Devices, Inc. [AMD] vendor: Advanced Micro Devices, Inc. [AMD] physical id: 109 bus info: pci@0000:00:18.3 version: 00 width: 32 bits clock: 33MHz *-pci:10 description: Host bridge product: Advanced Micro Devices, Inc. [AMD] vendor: Advanced Micro Devices, Inc. [AMD] physical id: 10a bus info: pci@0000:00:18.4 version: 00 width: 32 bits clock: 33MHz *-pci:11 description: Host bridge product: Advanced Micro Devices, Inc. [AMD] vendor: Advanced Micro Devices, Inc. [AMD] physical id: 10b bus info: pci@0000:00:18.5 version: 00 width: 32 bits clock: 33MHz *-pci:12 description: Host bridge product: Advanced Micro Devices, Inc. 
[AMD] vendor: Advanced Micro Devices, Inc. [AMD] physical id: 10c bus info: pci@0000:00:18.6 version: 00 width: 32 bits clock: 33MHz *-pci:13 description: Host bridge product: Advanced Micro Devices, Inc. [AMD] vendor: Advanced Micro Devices, Inc. [AMD] physical id: 10d bus info: pci@0000:00:18.7 version: 00 width: 32 bits clock: 33MHz *-network description: Ethernet interface physical id: 1 logical name: eth0 serial: 02:42:ac:11:00:02 size: 10Gbit/s capabilities: ethernet physical configuration: autonegotiation=off broadcast=yes driver=veth driverversion=1.0 duplex=full ip=172.17.0.2 link=yes multicast=yes port=twisted pair speed=10Gbit/s
underlines commented 1 year ago

Same for me on Ubuntu under WSL2:

sudo apt-key del 7fa2af80 
wget https://developer.download.nvidia.com/compute/cuda/repos/wsl-ubuntu/x86_64/cuda-wsl-ubuntu.pin
sudo mv cuda-wsl-ubuntu.pin /etc/apt/preferences.d/cuda-repository-pin-600
wget https://developer.download.nvidia.com/compute/cuda/11.7.0/local_installers/cuda-repo-wsl-ubuntu-11-7-local_11.7.0-1_amd64.deb
sudo dpkg -i cuda-repo-wsl-ubuntu-11-7-local_11.7.0-1_amd64.deb
sudo cp /var/cuda-repo-wsl-ubuntu-11-7-local/cuda-*-keyring.gpg /usr/share/keyrings/
sudo apt-get update
sudo apt-get -y install cuda

then:

(textgen) underlines@DESKTOP-Q2DTD09:~/github$ git clone https://github.com/oobabooga/text-generation-webui
(textgen) underlines@DESKTOP-Q2DTD09:~/github$ cd text-generation-webui
(textgen) underlines@DESKTOP-Q2DTD09:~/github/text-generation-webui$ pip install -r requirements.txt
(textgen) underlines@DESKTOP-Q2DTD09:~$ cd anaconda3/envs/textgen/lib/python3.10/site-packages/bitsandbytes
(textgen) underlines@DESKTOP-Q2DTD09:~/anaconda3/envs/textgen/lib/python3.10/site-packages/bitsandbytes$ cp libbitsandbytes_cuda120.so libbitsandbytes_cpu.so
(textgen) underlines@DESKTOP-Q2DTD09:~/anaconda3/envs/textgen/lib/python3.10/site-packages/bitsandbytes$ cd -
(textgen) underlines@DESKTOP-Q2DTD09:~$ conda install cudatoolkit
(textgen) underlines@DESKTOP-Q2DTD09:~/github/text-generation-webui$ export LD_LIBRARY_PATH=$LD_LIBRARY_PATH:/lib/wsl/lib
(textgen) underlines@DESKTOP-Q2DTD09:~/github/text-generation-webui$ python server.py --model vicuna-13b-GPTQ-4bit-128g --wbits 4 --groupsize 128

===================================BUG REPORT===================================
Welcome to bitsandbytes. For bug reports, please submit your error trace to: https://github.com/TimDettmers/bitsandbytes/issues
================================================================================
CUDA SETUP: CUDA runtime path found: /home/underlines/anaconda3/envs/textgen/lib/libcudart.so
CUDA SETUP: Highest compute capability among GPUs detected: 8.6
CUDA SETUP: Detected CUDA version 113
CUDA SETUP: Loading binary /home/underlines/anaconda3/envs/textgen/lib/python3.10/site-packages/bitsandbytes/libbitsandbytes_cuda113.so...
Loading vicuna-13b-GPTQ-4bit-128g...
Traceback (most recent call last):
  File "/home/underlines/github/text-generation-webui/server.py", line 277, in <module>
    shared.model, shared.tokenizer = load_model(shared.model_name)
  File "/home/underlines/github/text-generation-webui/modules/models.py", line 100, in load_model
    from modules.GPTQ_loader import load_quantized
  File "/home/underlines/github/text-generation-webui/modules/GPTQ_loader.py", line 13, in <module>
    import llama_inference_offload
ModuleNotFoundError: No module named 'llama_inference_offload'
(textgen) underlines@DESKTOP-Q2DTD09:~/github/text-generation-webui$

debug:

ModuleNotFoundError: No module named 'llama_inference_offload'
(textgen) underlines@DESKTOP-Q2DTD09:~/github/text-generation-webui$ nvcc --version
nvcc: NVIDIA (R) Cuda compiler driver
Copyright (c) 2005-2022 NVIDIA Corporation
Built on Wed_Jun__8_16:49:14_PDT_2022
Cuda compilation tools, release 11.7, V11.7.99
Build cuda_11.7.r11.7/compiler.31442593_0
(textgen) underlines@DESKTOP-Q2DTD09:~/github/text-generation-webui$ nvidia-smi
Wed Apr  5 19:03:58 2023
+---------------------------------------------------------------------------------------+
| NVIDIA-SMI 530.30.02              Driver Version: 531.18       CUDA Version: 12.1     |
|-----------------------------------------+----------------------+----------------------+
| GPU  Name                  Persistence-M| Bus-Id        Disp.A | Volatile Uncorr. ECC |
| Fan  Temp  Perf            Pwr:Usage/Cap|         Memory-Usage | GPU-Util  Compute M. |
|                                         |                      |               MIG M. |
|=========================================+======================+======================|
|   0  NVIDIA GeForce RTX 3080         On | 00000000:01:00.0  On |                  N/A |
| 30%   36C    P5               27W / 320W|   2854MiB / 10240MiB |     17%      Default |
|                                         |                      |                  N/A |
+-----------------------------------------+----------------------+----------------------+

+---------------------------------------------------------------------------------------+
| Processes:                                                                            |
|  GPU   GI   CI        PID   Type   Process name                            GPU Memory |
|        ID   ID                                                             Usage      |
|=======================================================================================|
|  No running processes found                                                           |
+---------------------------------------------------------------------------------------+
underlines commented 1 year ago

Following up on my report above (Ubuntu under WSL2): I had reinstalled the webui and forgotten to build GPTQ-for-LLaMa:

sudo apt install build-essential
conda activate textgen
conda install -c conda-forge cudatoolkit-dev
mkdir repositories
cd repositories
git clone https://github.com/oobabooga/GPTQ-for-LLaMa.git -b cuda
cd GPTQ-for-LLaMa
python setup_cuda.py install

I have always used the download script to get models from Hugging Face.

After that, python server.py --model anon8231489123_vicuna-13b-GPTQ-4bit-128g --wbits 4 --groupsize 128 no longer throws an error. It works!

The gpt4x-alpaca model needs its folder and its -cuda.pt file renamed; I renamed both the folder and the file to gpt4x-alpaca. For the very good gpt4-x-alpaca 4-bit model you can then run: python server.py --model gpt4x-alpaca --wbits 4 --groupsize 128 --chat

The latest commit on text-generation-webui, from 2 hours ago, uses only the --chat argument and deprecates the --cai-chat argument. The UI now has a Mode setting with the options cai-chat, chat, and instruct. When you choose instruct, you can select a template. One for Alpaca exists. I created one for Vicuna, which uses a different format from Alpaca. Create a new file Vicuna.yaml in text-generation-webui/characters/instruction-following/ with this content:

name: "### Assistant:"
your_name: "### Human:"
context: "Below is an instruction that describes a task. Write a response that appropriately completes the request."
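
To illustrate how the three template fields could combine into a prompt, here is a minimal sketch. The layout (context first, one turn per line, prompt ending with the assistant prefix) is an assumption made for illustration, not the webui's actual implementation, and build_prompt is a hypothetical helper.

```python
# Template fields copied from the Vicuna.yaml above.
TEMPLATE = {
    "name": "### Assistant:",
    "your_name": "### Human:",
    "context": "Below is an instruction that describes a task. "
               "Write a response that appropriately completes the request.",
}


def build_prompt(template, history, user_message):
    """Assemble a plain-text prompt from a template and a chat history.

    history is a list of (user_text, bot_text) pairs. The line layout is an
    assumption for illustration only.
    """
    lines = [template["context"]]
    for user_text, bot_text in history:
        lines.append(f'{template["your_name"]} {user_text}')
        lines.append(f'{template["name"]} {bot_text}')
    lines.append(f'{template["your_name"]} {user_message}')
    lines.append(template["name"])  # the model continues from this prefix
    return "\n".join(lines)


if __name__ == "__main__":
    print(build_prompt(TEMPLATE, [("Hi", "Hello!")], "Write a haiku about GPUs."))
```

Because the prompt ends with the assistant prefix, the model is nudged to write the next assistant turn. Nothing in this layout by itself tells the model when to stop.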
deetungsten commented 1 year ago

@underlines:

Can you go into more detail on how you got gpt4x-alpaca to work? I followed your suggestion and renamed both the folder and the .pt file to gpt4x-alpaca, but when I run your command I get an error saying it cannot find the quantized model in the folder.

Your Vicuna template isn't working for me either. If I ask a question, it answers correctly but then rambles on with hallucinated Human and Assistant turns.

3dluvr commented 1 year ago

@deetungsten

This is my setup; I did not rename the model file.

folder name:

models/gpt4-x-alpaca-13b-native-4bit-128g/

files in that folder:

config.json gpt-x-alpaca-13b-native-4bit-128g-cuda.pt tokenizer.model

starting up with:

python server.py --gpu-memory 24 --model gpt4-x-alpaca-13b-native-4bit-128g --model_type LLaMA --wbits 4 --groupsize 128 --chat

Obviously you need to have the proper modules installed, because transformers is an issue for some models, and so is the GPTQ-for-LLaMa extension.
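
When the loader reports that it cannot find the quantized model, it can help to check which checkpoint-like files are actually in the model folder. Here is a small diagnostic sketch. The suffix list and the helper name list_checkpoints are assumptions for illustration and do not reproduce the webui's own lookup logic.

```python
from pathlib import Path

# Extensions assumed to hold quantized weights; adjust as needed.
CHECKPOINT_SUFFIXES = (".pt", ".safetensors")


def list_checkpoints(model_dir):
    """Return checkpoint-like files directly inside model_dir, sorted by name."""
    folder = Path(model_dir)
    if not folder.is_dir():
        return []
    return sorted(
        p for p in folder.iterdir()
        if p.is_file() and p.suffix in CHECKPOINT_SUFFIXES
    )


if __name__ == "__main__":
    # Folder name taken from the setup described above.
    model_dir = Path("models") / "gpt4-x-alpaca-13b-native-4bit-128g"
    found = list_checkpoints(model_dir)
    if not found:
        print(f"No .pt or .safetensors file found in {model_dir}")
    for path in found:
        print(f"{path.name}  ({path.stat().st_size / 2**30:.2f} GiB)")
```

Comparing the printed file names with the value passed to --model makes naming mismatches easier to spot.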

deetungsten commented 1 year ago

@3dluvr

Thanks! A mix of your advice and @underlines' worked for me! I was missing the --model_type LLaMA flag. After doing a git pull, renaming the *-cuda.pt file to just .pt, and adding the flag, it worked! I find Vicuna works better for coding-related asks, though. But I'm still getting the ### Human ... ### Assistant loop after it responds correctly.
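
One common workaround for this kind of run-on output is to cut the generated text at the first occurrence of a turn marker. The sketch below is a post-processing idea under that assumption, not a description of how the webui handles it. The function name truncate_at_stop is hypothetical.

```python
def truncate_at_stop(text, stop_strings):
    """Cut text at the earliest occurrence of any non-empty stop string."""
    cut = len(text)
    for stop in stop_strings:
        if not stop:
            continue  # an empty stop string would match at index 0
        index = text.find(stop)
        if index != -1:
            cut = min(cut, index)
    return text[:cut].rstrip()


if __name__ == "__main__":
    raw = (
        "Paris is the capital of France.\n"
        "### Human: And Spain?\n"
        "### Assistant: Madrid."
    )
    # Both spellings of the marker seen in this thread are listed.
    print(truncate_at_stop(raw, ["### Human:", "###Human"]))
```

Truncation only hides the extra turns. The model still spends time generating them unless generation itself is stopped at the marker.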

github-actions[bot] commented 1 year ago

This issue has been closed due to inactivity for 30 days. If you believe it is still relevant, please leave a comment below.