Closed. Titaniumtown closed this issue 1 year ago.
How did you load LLaMA-13B into a 16GB GPU without 8-bit?
using --auto-devices
13b/20b models are loaded in 8-bit mode by default (when no flags are specified) because they are too large to fit in consumer GPUs.
--auto-devices disables this default behavior without the need for any manual changes to the code.
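For reference, a minimal sketch of the kind of logic described above, assuming the usual transformers loading path (the function name and the models/ path are illustrative; the real modules/models.py has more branches):

from transformers import AutoModelForCausalLM

def load_model(model_name, auto_devices=False):
    # 13B/20B checkpoints default to 8-bit unless --auto-devices is passed
    is_large = any(size in model_name.lower() for size in ("13b", "20b"))
    load_in_8bit = is_large and not auto_devices
    return AutoModelForCausalLM.from_pretrained(
        f"models/{model_name}",
        device_map="auto" if (auto_devices or load_in_8bit) else None,
        load_in_8bit=load_in_8bit,
    )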
Fixed it, got 8-bit working, had to update bitsandbytes-rocm to use rocm 5.4.0 https://github.com/Titaniumtown/bitsandbytes-rocm/tree/patch-1 sent in a pull request. https://github.com/broncotc/bitsandbytes-rocm/pull/4
Edit: seems that the 6900xt itself has issues with int8, which this fork (https://github.com/0cc4m/bitsandbytes-rocm/tree/rocm) seems to try to address, but it has its own issues. Doing some investigation.
Edit 2: relates to this issue (https://github.com/TimDettmers/bitsandbytes/issues/165)
Edit 3: turns out it's something wrong with the generation settings? It only seems to fail when using the "NovelAI Sphinx Moth" preset among others.
Nice @Titaniumtown, thanks for the update.
@oobabooga do you understand anything about what could be causing the generation issues? It seems to only be the case with specific combinations of generation settings.
What error appears when you use sphinx moth? This is a preset with high temperature and small top_k and top_p for creative but coherent outputs.
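For anyone following along, a preset like that roughly maps onto sampling kwargs like these (illustrative values only, not the exact Sphinx Moth numbers; assumes model and input_ids are already loaded/tokenized as the webui does):

output_ids = model.generate(
    input_ids,
    do_sample=True,
    temperature=1.9,   # high temperature flattens the distribution (more creative)
    top_k=30,          # small top_k keeps only the most likely tokens
    top_p=0.2,         # small top_p tightens the nucleus further (more coherent)
    max_new_tokens=200,
)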
0%| | 0/26 [00:00<?, ?it/s]
Traceback (most recent call last):
File "/opt/conda/envs/py_3.8/lib/python3.8/site-packages/gradio/routes.py", line 374, in run_predict
output = await app.get_blocks().process_api(
File "/opt/conda/envs/py_3.8/lib/python3.8/site-packages/gradio/blocks.py", line 1017, in process_api
result = await self.call_function(
File "/opt/conda/envs/py_3.8/lib/python3.8/site-packages/gradio/blocks.py", line 849, in call_function
prediction = await anyio.to_thread.run_sync(
File "/opt/conda/envs/py_3.8/lib/python3.8/site-packages/anyio/to_thread.py", line 31, in run_sync
return await get_asynclib().run_sync_in_worker_thread(
File "/opt/conda/envs/py_3.8/lib/python3.8/site-packages/anyio/_backends/_asyncio.py", line 937, in run_sync_in_worker_thread
return await future
File "/opt/conda/envs/py_3.8/lib/python3.8/site-packages/anyio/_backends/_asyncio.py", line 867, in run
result = context.run(func, *args)
File "/opt/conda/envs/py_3.8/lib/python3.8/site-packages/gradio/utils.py", line 453, in async_iteration
return next(iterator)
File "/var/home/riley/text-generation-webui/modules/text_generation.py", line 188, in generate_reply
output = eval(f"shared.model.generate({', '.join(generate_params)}){cuda}")[0]
File "<string>", line 1, in <module>
File "/opt/conda/envs/py_3.8/lib/python3.8/site-packages/torch/utils/_contextlib.py", line 115, in decorate_context
return func(*args, **kwargs)
File "/opt/conda/envs/py_3.8/lib/python3.8/site-packages/transformers/generation/utils.py", line 1452, in generate
return self.sample(
File "/opt/conda/envs/py_3.8/lib/python3.8/site-packages/transformers/generation/utils.py", line 2504, in sample
next_tokens = torch.multinomial(probs, num_samples=1).squeeze(1)
RuntimeError: probability tensor contains either `inf`, `nan` or element < 0
I get this error when I try to use 8-bit mode in my GTX 1650. It's an upstream issue in the bitsandbytes library, as you found.
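For anyone wondering what the traceback means: if the int8 matmul produces NaN or inf logits, softmax turns them into an invalid probability tensor and torch.multinomial raises exactly this error. A minimal reproduction:

import torch

logits = torch.tensor([[float("nan"), 1.0, 2.0]])  # stand-in for a broken int8 matmul output
probs = torch.softmax(logits, dim=-1)              # the NaN propagates into every probability
torch.multinomial(probs, num_samples=1)            # RuntimeError: probability tensor contains `inf`, `nan` or element < 0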
Ah, so there's nothing I can do about it. Sad. Thanks!
Change the 8-bit threshold. It will probably help on AMD as well. I can't test because my old card doesn't work with ROCm due to AGP 2.0; it only works in Windows.
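For anyone who wants to try that outside the webui, newer transformers versions expose the threshold through BitsAndBytesConfig (hedged sketch; the model path is illustrative and the webui may plumb this setting differently):

from transformers import AutoModelForCausalLM, BitsAndBytesConfig

# Lowering llm_int8_threshold (default 6.0) routes more activations through the
# fp16 "outlier" path instead of int8, which can dodge broken int8 kernels on some cards.
quant_config = BitsAndBytesConfig(load_in_8bit=True, llm_int8_threshold=4.0)
model = AutoModelForCausalLM.from_pretrained(
    "models/llama-13b-hf",   # illustrative path
    device_map="auto",
    quantization_config=quant_config,
)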
@Ph0rk0z I just use 4bit models now. Works like a dream and has much better performance.
@Titaniumtown can you share how to use 4bit model for AMD GPU? I was looking at https://github.com/oobabooga/text-generation-webui/wiki/LLaMA-model, but Step 1: Installation for GPTQ-for-LLaMa requires CUDA?
It does not require CUDA; ROCm works just fine. I just ran the script like Nvidia users do and it worked perfectly.
Thank you! I will give it a try
@Titaniumtown I tried to set things up and run just like the guide explains. I mean, as you said, I just ran the script like an Nvidia user would. But I get errors about missing headers when running "python setup_cuda.py install": https://github.com/oobabooga/text-generation-webui/issues/487
Could you help me? Am I missing something important? I'm new to all this stuff, btw. I'm sure I'm not understanding, or missing, something.
@VivaPeron do you have cuda installed?
Getting these errors when trying to compile GPTQ-for-LLaMa:
/home/viliger/text-generation-webui/repositories/GPTQ-for-LLaMa/quant_hip_kernel.hip:653:10: error: use of overloaded operator '=' is ambiguous (with operand types 'half2' (aka '__half2') and 'void')
res2 = {};
/home/viliger/text-generation-webui/repositories/GPTQ-for-LLaMa/quant_hip_kernel.hip:665:12: error: no matching function for call to '__half2float'
res += __half2float(res2.x) + __half2float(res2.y);
The 8-bit model runs fine once I got bitsandbytes-rocm installed. Full compilation log attached: output.txt
@viliger2 @VivaPeron this seems to be caused by GPTQ-for-LLaMa commits after 841feed that use fp16 types. HIP doesn't seem to handle some of the implicit casts, as far as I can tell. Rolling back to that commit results in successful compilation.
@VivaPeron do you have cuda installed?
Yes.
@arctic-marmoset Thanks a lot! I will try this today when I get home from work, and let you guys know.
Btw, these are my PC specs: Xeon E5-2620v2, 16GB ECC DDR3 RAM, AMD RX6600 8GB.
@arctic-marmoset Thanks!
@arctic-marmoset wow, thanks a lot!! Will try your fork today! At last I can put my 6600 to do useful work lol
I also have a fork of the repo with my changes here.
I tried your repo and got this error:
No ROCm runtime is found, using ROCM_HOME='/opt/rocm-5.4.3'
No CUDA runtime is found, using CUDA_HOME='/usr/local/cuda'
running install
/home/christopher/miniconda3/envs/gptq/lib/python3.9/site-packages/setuptools/command/install.py:34: SetuptoolsDeprecationWarning: setup.py install is deprecated. Use build and pip and other standards-based tools.
warnings.warn(
/home/christopher/miniconda3/envs/gptq/lib/python3.9/site-packages/setuptools/command/easy_install.py:144: EasyInstallDeprecationWarning: easy_install command is deprecated. Use build and pip and other standards-based tools.
warnings.warn(
running bdist_egg
running egg_info
writing quant_cuda.egg-info/PKG-INFO
writing dependency_links to quant_cuda.egg-info/dependency_links.txt
writing top-level names to quant_cuda.egg-info/top_level.txt
/home/christopher/miniconda3/envs/gptq/lib/python3.9/site-packages/torch/utils/cpp_extension.py:476: UserWarning: Attempted to use ninja as the BuildExtension backend but we could not find ninja.. Falling back to using the slow distutils backend.
warnings.warn(msg.format('we could not find ninja.'))
reading manifest file 'quant_cuda.egg-info/SOURCES.txt'
writing manifest file 'quant_cuda.egg-info/SOURCES.txt'
installing library code to build/bdist.linux-x86_64/egg
running install_lib
running build_ext
building 'quant_cuda' extension
gcc -pthread -B /home/christopher/miniconda3/envs/gptq/compiler_compat -Wno-unused-result -Wsign-compare -DNDEBUG -O2 -Wall -fPIC -O2 -isystem /home/christopher/miniconda3/envs/gptq/include -I/home/christopher/miniconda3/envs/gptq/include -fPIC -O2 -isystem /home/christopher/miniconda3/envs/gptq/include -fPIC -I/home/christopher/miniconda3/envs/gptq/lib/python3.9/site-packages/torch/include -I/home/christopher/miniconda3/envs/gptq/lib/python3.9/site-packages/torch/include/torch/csrc/api/include -I/home/christopher/miniconda3/envs/gptq/lib/python3.9/site-packages/torch/include/TH -I/home/christopher/miniconda3/envs/gptq/lib/python3.9/site-packages/torch/include/THC -I/usr/local/cuda/include -I/home/christopher/miniconda3/envs/gptq/include/python3.9 -c quant_cuda.cpp -o build/temp.linux-x86_64-cpython-39/quant_cuda.o -DTORCH_API_INCLUDE_EXTENSION_H -DPYBIND11_COMPILER_TYPE=\"_gcc\" -DPYBIND11_STDLIB=\"_libstdcpp\" -DPYBIND11_BUILD_ABI=\"_cxxabi1011\" -DTORCH_EXTENSION_NAME=quant_cuda -D_GLIBCXX_USE_CXX11_ABI=0 -std=c++17
/home/christopher/miniconda3/envs/gptq/lib/python3.9/site-packages/torch/cuda/__init__.py:546: UserWarning: Can't initialize NVML
warnings.warn("Can't initialize NVML")
Traceback (most recent call last):
File "/home/christopher/GPTQ-for-LLaMA-fork-amd/GPTQ-for-LLaMa-hip/setup_cuda.py", line 12, in
@VivaPeron it seems that you have issues with your ROCm installation. Check whether you have it installed at all, or whether your version is different from 5.4.3.
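A quick way to confirm that from Python: on a working ROCm build of PyTorch, torch.version.hip is a version string; on a CPU- or CUDA-only build it is None.

import torch

print("HIP runtime:", torch.version.hip)          # ROCm version string, or None
print("GPU visible:", torch.cuda.is_available())  # ROCm devices show up through the cuda API
if torch.cuda.is_available():
    print("Device:", torch.cuda.get_device_name(0))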
(textgen) root@gribaai:~/text-generation-webui# python server.py --model llama-13b-4bit-128g --wbits 4 --groupsize 128
CUDA SETUP: Loading binary /root/miniconda3/envs/textgen/lib/python3.10/site-packages/bitsandbytes/libsbitsandbytes_cpu.so...
Loading llama-13b-4bit-128g...
CUDA extension not installed.
Traceback (most recent call last):
File "/root/text-generation-webui/server.py", line 276, in <module>
shared.model, shared.tokenizer = load_model(shared.model_name)
File "/root/text-generation-webui/modules/models.py", line 102, in load_model
model = load_quantized(model_name)
File "/root/text-generation-webui/modules/GPTQ_loader.py", line 114, in load_quantized
model = load_quant(str(path_to_model), str(pt_path), shared.args.wbits, shared.args.groupsize, kernel_switch_threshold=threshold)
File "/root/text-generation-webui/modules/GPTQ_loader.py", line 36, in _load_quant
make_quant(model, layers, wbits, groupsize, faster=faster_kernel, kernel_switch_threshold=kernel_switch_threshold)
TypeError: make_quant() got an unexpected keyword argument 'faster'
(textgen) root@gribaai:~/text-generation-webui#
How do I install the CUDA extension with an AMD GPU?
If I run "python setup_cuda.py install" inside the "GPTQ-for-LLaMa" folder, it returns this error:
............
File "/root/miniconda3/envs/textgen/lib/python3.10/site-packages/torch/utils/cpp_extension.py", line 1780, in _get_cuda_arch_flags
arch_list[-1] += '+PTX'
IndexError: list index out of range
@belqit Please see @viliger2's comment above. You'll need to install ROCm 5.4.3.
I have a 6900xt and tried to load the LLaMA-13B model, and ended up getting this error:
Going into modules/models.py and setting "load_in_8bit" to False fixed it, but this should work by default.
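For context, that manual change amounts to something like the following at the loading call (hedged sketch; the real modules/models.py has more logic around it, and the path is illustrative):

from transformers import AutoModelForCausalLM

path_to_model = "models/llama-13b-hf"  # illustrative
model = AutoModelForCausalLM.from_pretrained(
    path_to_model,
    device_map="auto",
    load_in_8bit=False,  # flipped from the 13B/20B default mentioned earlier in the thread
)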