oobabooga / one-click-installers

Simplified installers for oobabooga/text-generation-webui.
GNU Affero General Public License v3.0
550 stars 186 forks

Linux Installer doesn't support AMD? #94

Closed - prgarnett closed this issue 1 year ago

prgarnett commented 1 year ago

I tried the Linux installer and it stops at the point where the GPU type is selected. The README suggests that AMD is not supported on Windows, but should it work with AMD on Linux? Or is manual installation the only option?

jllllll commented 1 year ago

I've been looking into doing this for a while now. The main hangup has been that I don't own an AMD GPU to test with.

Another issue is one that the KoboldAI devs encountered: system compatibility. The 4-bit fork of KoboldAI attempted to distribute AMD builds of GPTQ-for-LLaMa, but was never able to get them to work on systems other than the one they were compiled on. This means that having the installer compile all of the software locally is likely the only viable option.

Overall, some basic AMD support, like installing the ROCm version of PyTorch and setting up exllama, is possible. However, there likely won't be more than that without more AMD GPU support from developers.

It's worth noting that the CPU-only option at least provides a base installation to build off of, so a completely manual installation isn't truly necessary. This is the current state of what is needed for AMD GPUs after installing with the CPU-only option:

Lines crossed out indicate what has been added to the installer mentioned below.
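
In rough terms, that means swapping the CPU wheels for their ROCm equivalents by hand. A minimal sketch, assuming ROCm 5.4.2 is already installed and using PyTorch's ROCm wheel index (run from inside the installer's conda environment):

# Replace the CPU-only PyTorch with the ROCm build:
pip install --force-reinstall torch --index-url https://download.pytorch.org/whl/rocm5.4.2
# Clone exllama into the webui's repositories folder so it can be built against ROCm:
git clone https://github.com/turboderp/exllama text-generation-webui/repositories/exllama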

jllllll commented 1 year ago

I have added initial AMD GPU support to the installer here: https://github.com/jllllll/one-click-installers/tree/cross-platform-amd It requires ROCm SDK 5.4.2.

Can you test it and let me know how it works? I have added some instructions to the INSTRUCTIONS.txt file as well. Any suggestions are appreciated.
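
If the install fails early, it may be worth confirming the SDK is actually visible first. A couple of quick sanity checks, assuming the default /opt/rocm install location:

# Should report a 5.4.2.x version if the SDK is installed:
/opt/rocm/bin/hipconfig --version
# The HIP runtime header that the GPTQ build step needs should exist:
ls /opt/rocm/include/hip/hip_runtime_api.h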

nibarulabs commented 1 year ago

https://github.com/jllllll/one-click-installers/tree/cross-platform-amd

I just did a fresh run of the script on Ubuntu Linux 21.10. I'm compiling against a Radeon RX 550 4GB. The machine has a Ryzen 9 5900X 12-core and 64 GB of RAM.

I got the following error when selecting the AMD GPU option:

In file included from /home/aiprojs/one-click-installers/installer_files/env/lib/python3.10/site-packages/torch/include/c10/hip/HIPGraphsC10Utils.h:4,
                       from /home/aiprojs/one-click-installers/installer_files/env/lib/python3.10/site-packages/torch/include/c10/hip/HIPCachingAllocator.h:5,
                       from /home/aiprojs/one-click-installers/installer_files/env/lib/python3.10/site-packages/torch/include/c10/hip/impl/HIPGuardImpl.h:10,
                       from /home/aiprojs/one-click-installers/installer_files/env/lib/python3.10/site-packages/torch/include/ATen/hip/impl/HIPGuardImplMasqueradingAsCUDA.h:6,
                       from /home/aiprojs/one-click-installers/text-generation-webui/repositories/GPTQ-for-LLaMa/quant_hip.cpp:4:
      /home/aiprojs/one-click-installers/installer_files/env/lib/python3.10/site-packages/torch/include/c10/hip/HIPStream.h:7:10: fatal error: hip/hip_runtime_api.h: No such file or directory
          7 | #include <hip/hip_runtime_api.h>
            |          ^~~~~~~~~~~~~~~~~~~~~~~
      compilation terminated.
      ninja: build stopped: subcommand failed.
      Traceback (most recent call last):
        File "/home/aiprojs/one-click-installers/installer_files/env/lib/python3.10/site-packages/torch/utils/cpp_extension.py", line 1893, in _run_ninja_build
          subprocess.run(
        File "/home/aiprojs/one-click-installers/installer_files/env/lib/python3.10/subprocess.py", line 526, in run
          raise CalledProcessError(retcode, process.args,
      subprocess.CalledProcessError: Command '['ninja', '-v']' returned non-zero exit status 1.

      The above exception was the direct cause of the following exception:

      Traceback (most recent call last):
        File "<string>", line 2, in <module>
        File "<pip-setuptools-caller>", line 34, in <module>
        File "/home/aiprojs/one-click-installers/text-generation-webui/repositories/GPTQ-for-LLaMa/setup.py", line 4, in <module>
          setup(
        File "/home/aiprojs/one-click-installers/installer_files/env/lib/python3.10/site-packages/setuptools/__init__.py", line 107, in setup
          return distutils.core.setup(**attrs)
        File "/home/aiprojs/one-click-installers/installer_files/env/lib/python3.10/site-packages/setuptools/_distutils/core.py", line 185, in setup
          return run_commands(dist)
        File "/home/aiprojs/one-click-installers/installer_files/env/lib/python3.10/site-packages/setuptools/_distutils/core.py", line 201, in run_commands
          dist.run_commands()
        File "/home/aiprojs/one-click-installers/installer_files/env/lib/python3.10/site-packages/setuptools/_distutils/dist.py", line 969, in run_commands
          self.run_command(cmd)
        File "/home/aiprojs/one-click-installers/installer_files/env/lib/python3.10/site-packages/setuptools/dist.py", line 1244, in run_command
          super().run_command(command)
        File "/home/aiprojs/one-click-installers/installer_files/env/lib/python3.10/site-packages/setuptools/_distutils/dist.py", line 988, in run_command
          cmd_obj.run()
        File "/home/aiprojs/one-click-installers/installer_files/env/lib/python3.10/site-packages/wheel/bdist_wheel.py", line 325, in run
          self.run_command("build")
        File "/home/aiprojs/one-click-installers/installer_files/env/lib/python3.10/site-packages/setuptools/_distutils/cmd.py", line 318, in run_command
          self.distribution.run_command(command)
        File "/home/aiprojs/one-click-installers/installer_files/env/lib/python3.10/site-packages/setuptools/dist.py", line 1244, in run_command
          super().run_command(command)
        File "/home/aiprojs/one-click-installers/installer_files/env/lib/python3.10/site-packages/setuptools/_distutils/dist.py", line 988, in run_command
          cmd_obj.run()
        File "/home/aiprojs/one-click-installers/installer_files/env/lib/python3.10/site-packages/setuptools/_distutils/command/build.py", line 131, in run
          self.run_command(cmd_name)
        File "/home/aiprojs/one-click-installers/installer_files/env/lib/python3.10/site-packages/setuptools/_distutils/cmd.py", line 318, in run_command
          self.distribution.run_command(command)
        File "/home/aiprojs/one-click-installers/installer_files/env/lib/python3.10/site-packages/setuptools/dist.py", line 1244, in run_command
          super().run_command(command)
        File "/home/aiprojs/one-click-installers/installer_files/env/lib/python3.10/site-packages/setuptools/_distutils/dist.py", line 988, in run_command
          cmd_obj.run()
        File "/home/aiprojs/one-click-installers/installer_files/env/lib/python3.10/site-packages/setuptools/command/build_ext.py", line 84, in run
          _build_ext.run(self)
        File "/home/aiprojs/one-click-installers/installer_files/env/lib/python3.10/site-packages/setuptools/_distutils/command/build_ext.py", line 345, in run
          self.build_extensions()
        File "/home/aiprojs/one-click-installers/installer_files/env/lib/python3.10/site-packages/torch/utils/cpp_extension.py", line 843, in build_extensions
          build_ext.build_extensions(self)
        File "/home/aiprojs/one-click-installers/installer_files/env/lib/python3.10/site-packages/setuptools/_distutils/command/build_ext.py", line 467, in build_extensions
          self._build_extensions_serial()
        File "/home/aiprojs/one-click-installers/installer_files/env/lib/python3.10/site-packages/setuptools/_distutils/command/build_ext.py", line 493, in _build_extensions_serial
          self.build_extension(ext)
        File "/home/aiprojs/one-click-installers/installer_files/env/lib/python3.10/site-packages/setuptools/command/build_ext.py", line 246, in build_extension
          _build_ext.build_extension(self, ext)
        File "/home/aiprojs/one-click-installers/installer_files/env/lib/python3.10/site-packages/setuptools/_distutils/command/build_ext.py", line 548, in build_extension
          objects = self.compiler.compile(
        File "/home/aiprojs/one-click-installers/installer_files/env/lib/python3.10/site-packages/torch/utils/cpp_extension.py", line 658, in unix_wrap_ninja_compile
          _write_ninja_file_and_compile_objects(
        File "/home/aiprojs/one-click-installers/installer_files/env/lib/python3.10/site-packages/torch/utils/cpp_extension.py", line 1574, in _write_ninja_file_and_compile_objects
          _run_ninja_build(
        File "/home/aiprojs/one-click-installers/installer_files/env/lib/python3.10/site-packages/torch/utils/cpp_extension.py", line 1909, in _run_ninja_build
          raise RuntimeError(message) from e
      RuntimeError: Error compiling objects for extension
      [end of output]

  note: This error originates from a subprocess, and is likely not a problem with pip.
  ERROR: Failed building wheel for quant-cuda
  Running setup.py clean for quant-cuda
Failed to build quant-cuda
ERROR: Could not build wheels for quant-cuda, which is required to install pyproject.toml-based projects
ERROR: GPTQ CUDA kernel compilation failed.
You will not be able to use GPTQ-based models with GPTQ-for-LLaMa.

Hope it helps. I'll be around for a bit if there's more to try/test. 👍

jllllll commented 1 year ago

@nibarulabs I pushed an update to use a ROCm-compatible fork of GPTQ-for-LLaMa. After updating the installer, delete the text-generation-webui/repositories/GPTQ-for-LLaMa folder and run the update script.

Installing GPTQ-for-LLaMa is the last thing the installer does, so I'm glad that is seemingly all that needs fixing. It isn't even truly needed, as exllama supports ROCm anyway.
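
For clarity, the full sequence from the one-click-installers folder is:

rm -rf text-generation-webui/repositories/GPTQ-for-LLaMa
./update_linux.sh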

nibarulabs commented 1 year ago

> @nibarulabs I pushed an update to use a ROCm-compatible fork of GPTQ-for-LLaMa. After updating the installer, delete the text-generation-webui/repositories/GPTQ-for-LLaMa folder and run the update script.
>
> Installing GPTQ-for-LLaMa is the last thing the installer does, so I'm glad that is seemingly all that needs fixing. It isn't even truly needed, as exllama supports ROCm anyway.

@jllllll

Ok, I deleted that folder, pulled that branch, and then re-ran start_linux.sh (<- is that the update script?)

This is the output now:

bin /home/aiprojs/one-click-installers/installer_files/env/lib/python3.10/site-packages/bitsandbytes/libbitsandbytes_cpu.so
/home/aiprojs/one-click-installers/installer_files/env/lib/python3.10/site-packages/bitsandbytes/cextension.py:34: UserWarning: The installed version of bitsandbytes was compiled without GPU support. 8-bit optimizers, 8-bit multiplication, and GPU quantization are unavailable.
  warn("The installed version of bitsandbytes was compiled without GPU support. "
/home/aiprojs/one-click-installers/installer_files/env/lib/python3.10/site-packages/bitsandbytes/libbitsandbytes_cpu.so: undefined symbol: cadam32bit_grad_fp32
2023-07-03 00:26:06 INFO:Loading the extension "gallery"...
Running on local URL:  http://127.0.0.1:7860

Does that look OK? Also, is there a place in the code where I can change the host to 0.0.0.0? I'm trying to access this from another machine on my network. Thanks.

jllllll commented 1 year ago

The update script is update_linux.sh

You can use the --listen flag to allow access from the local network. Add it in webui.py.
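
The server itself also accepts the flag if you launch it directly (illustrative; normally the start script launches it for you):

python text-generation-webui/server.py --listen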

nibarulabs commented 1 year ago

Ok, updating and I will add the listen flag. I'll let you know how it goes.

nibarulabs commented 1 year ago

Also, I did not pre-install ROCm 5.4.2 - I was assuming this installer did it for me. Is that a correct assumption?

jllllll commented 1 year ago

> Also, I did not pre-install ROCm 5.4.2 - I was assuming this installer did it for me. Is that a correct assumption?

It doesn't. Doing so is technically possible, but it would require root permissions and would need to be overly complicated to avoid conflicts with existing installations.
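
For reference, a rough sketch of the usual route on Ubuntu, assuming AMD's amdgpu-install helper package for 5.4.2 has already been downloaded from their repo (follow AMD's install guide for the exact package for your distro):

# Install the ROCm use case (drivers + SDK); requires root:
sudo amdgpu-install --usecase=rocm
# The user running the webui typically needs to be in these groups to access the GPU:
sudo usermod -aG video,render $USER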

nibarulabs commented 1 year ago

Ok, I think that's the missing part for me. I'll get that set up.

nibarulabs commented 1 year ago

@jllllll Got it running. I had problems running the update script after I installed ROCm-5.4.2. This ROCm issue reply

https://github.com/RadeonOpenCompute/ROCm/issues/1843#issuecomment-1339619859

is what did the trick. After I installed that package, I was able to go through the update and can now start everything.

I'm downloading a model now and hoping everything works well. Thanks!

Oh, one other thing; when I start, I see these messages on the console:

bin /home/aiprojs/one-click-installers/installer_files/env/lib/python3.10/site-packages/bitsandbytes/libbitsandbytes_cpu.so
/home/aiprojs/one-click-installers/installer_files/env/lib/python3.10/site-packages/bitsandbytes/cextension.py:34: UserWarning: The installed version of bitsandbytes was compiled without GPU support. 8-bit optimizers, 8-bit multiplication, and GPU quantization are unavailable.
  warn("The installed version of bitsandbytes was compiled without GPU support. "
/home/aiprojs/one-click-installers/installer_files/env/lib/python3.10/site-packages/bitsandbytes/libbitsandbytes_cpu.so: undefined symbol: cadam32bit_grad_fp32
2023-07-03 15:42:22 INFO:Loading the extension "gallery"...
Running on local URL:  http://0.0.0.0:7860

Not sure if that's a deal breaker or not. I guess I'll find out...

And I have a larger GPU - an RX 5500 XT 8GB - that I'm going to pop in later if this all works. I'm still new to running all this, so I'm not sure what difference it will make, if any. Just thought I'd mention it.

jllllll commented 1 year ago

Those messages are normal, as bitsandbytes does not have an up-to-date AMD build, so one isn't included.

nibarulabs commented 1 year ago

Ok, I loaded a model and I'm not sure what the error is yet. Here's the startup message now:

bin /home/aiprojs/one-click-installers/installer_files/env/lib/python3.10/site-packages/bitsandbytes/libbitsandbytes_cpu.so
/home/aiprojs/one-click-installers/installer_files/env/lib/python3.10/site-packages/bitsandbytes/cextension.py:34: UserWarning: The installed version of bitsandbytes was compiled without GPU support. 8-bit optimizers, 8-bit multiplication, and GPU quantization are unavailable.
  warn("The installed version of bitsandbytes was compiled without GPU support. "
/home/aiprojs/one-click-installers/installer_files/env/lib/python3.10/site-packages/bitsandbytes/libbitsandbytes_cpu.so: undefined symbol: cadam32bit_grad_fp32
2023-07-03 16:17:51 INFO:Loading TheBloke_Wizard-Vicuna-30B-Uncensored-GPTQ...
2023-07-03 16:17:51 INFO:The AutoGPTQ params are: {'model_basename': 'Wizard-Vicuna-30B-Uncensored-GPTQ-4bit.act.order', 'device': 'cuda:0', 'use_triton': False, 'inject_fused_attention': True, 'inject_fused_mlp': True, 'use_safetensors': True, 'trust_remote_code': False, 'max_memory': None, 'quantize_config': None, 'use_cuda_fp16': True}
2023-07-03 16:17:51 WARNING:CUDA extension not installed.
2023-07-03 16:17:52 WARNING:The model weights are not tied. Please use the `tie_weights` method before using the `infer_auto_device` function.
2023-07-03 16:17:52 WARNING:The safetensors archive passed at models/TheBloke_Wizard-Vicuna-30B-Uncensored-GPTQ/Wizard-Vicuna-30B-Uncensored-GPTQ-4bit.act.order.safetensors does not contain metadata. Make sure to save your model with the `save_pretrained` method. Defaulting to 'pt' metadata.
Traceback (most recent call last):
  File "/home/aiprojs/one-click-installers/text-generation-webui/server.py", line 1057, in <module>
    shared.model, shared.tokenizer = load_model(shared.model_name)
  File "/home/aiprojs/one-click-installers/text-generation-webui/modules/models.py", line 74, in load_model
    output = load_func_map[loader](model_name)
  File "/home/aiprojs/one-click-installers/text-generation-webui/modules/models.py", line 280, in AutoGPTQ_loader
    return modules.AutoGPTQ_loader.load_quantized(model_name)
  File "/home/aiprojs/one-click-installers/text-generation-webui/modules/AutoGPTQ_loader.py", line 56, in load_quantized
    model = AutoGPTQForCausalLM.from_quantized(path_to_model, **params)
  File "/home/aiprojs/one-click-installers/installer_files/env/lib/python3.10/site-packages/auto_gptq/modeling/auto.py", line 82, in from_quantized
    return quant_func(
  File "/home/aiprojs/one-click-installers/installer_files/env/lib/python3.10/site-packages/auto_gptq/modeling/_base.py", line 773, in from_quantized
    accelerate.utils.modeling.load_checkpoint_in_model(
  File "/home/aiprojs/one-click-installers/installer_files/env/lib/python3.10/site-packages/accelerate/utils/modeling.py", line 1094, in load_checkpoint_in_model
    checkpoint = load_state_dict(checkpoint_file, device_map=device_map)
  File "/home/aiprojs/one-click-installers/installer_files/env/lib/python3.10/site-packages/accelerate/utils/modeling.py", line 946, in load_state_dict
    return safe_load_file(checkpoint_file, device=list(device_map.values())[0])
  File "/home/aiprojs/one-click-installers/installer_files/env/lib/python3.10/site-packages/safetensors/torch.py", line 261, in load_file
    result[k] = f.get_tensor(k)
  File "/home/aiprojs/one-click-installers/installer_files/env/lib/python3.10/site-packages/torch/cuda/__init__.py", line 247, in _lazy_init
    torch._C._cuda_init()
RuntimeError: No HIP GPUs are available

Maybe it's the RX 550 card? Should I blow away text-generation-webui/repositories/GPTQ-for-LLaMa and re-run the update with the RX 5500 XT card?

jllllll commented 1 year ago

@nibarulabs Try un-commenting one or both of these lines at the top of webui.py:

os.environ["HSA_OVERRIDE_GFX_VERSION"] = '10.3.0'
os.environ["HCC_AMDGPU_TARGET"] = 'gfx1030'

You may only need the first one.
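
The same overrides can also be applied for a single run from the shell, equivalent to the webui.py lines above:

HSA_OVERRIDE_GFX_VERSION=10.3.0 HCC_AMDGPU_TARGET=gfx1030 ./start_linux.sh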

nibarulabs commented 1 year ago

Yah, I think I already have those uncommented:

# Remove the '# ' from the following lines if needed for your AMD GPU on Linux
os.environ["ROCM_PATH"] = '/opt/rocm'
os.environ["HSA_OVERRIDE_GFX_VERSION"] = '10.3.0'
os.environ["HCC_AMDGPU_TARGET"] = 'gfx1030'

jllllll commented 1 year ago

Maybe use the GPTQ-for-LLaMa or exllama loader? I don't think AutoGPTQ is going to work on AMD GPUs.

nibarulabs commented 1 year ago

Hmm, I'm not sure what that entails. Does that mean delete the model I downloaded and then start everything again and look for those settings in the webui?

nibarulabs commented 1 year ago

I also found this post on an unrelated project:

https://github.com/comfyanonymous/ComfyUI/issues/650#issuecomment-1556018389

Not sure if it fixes the 'no HIP GPUs' error, but they're using the HSA_OVERRIDE_GFX_VERSION export and some other things possibly related...

jllllll commented 1 year ago

You can select a loader in the model tab of the webui or you can use the --loader flag.

--loader gptq-for-llama
--loader exllama

If you need the webui to not load a model on startup, you can use: --model None
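
For example (hedged: if your copy of the start script doesn't forward arguments, add them to the flags string in webui.py instead):

./start_linux.sh --loader exllama --model None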

nibarulabs commented 1 year ago

Using the --loader gptq-for-llama flag:

2023-07-03 16:44:40 INFO:Loading TheBloke_Wizard-Vicuna-30B-Uncensored-GPTQ...
2023-07-03 16:44:40 INFO:Found the following quantized model: models/TheBloke_Wizard-Vicuna-30B-Uncensored-GPTQ/Wizard-Vicuna-30B-Uncensored-GPTQ-4bit.act.order.safetensors
Traceback (most recent call last):
  File "/home/aiprojs/one-click-installers/text-generation-webui/server.py", line 1057, in <module>
    shared.model, shared.tokenizer = load_model(shared.model_name)
  File "/home/aiprojs/one-click-installers/text-generation-webui/modules/models.py", line 74, in load_model
    output = load_func_map[loader](model_name)
  File "/home/aiprojs/one-click-installers/text-generation-webui/modules/models.py", line 272, in GPTQ_loader
    model = modules.GPTQ_loader.load_quantized(model_name)
  File "/home/aiprojs/one-click-installers/text-generation-webui/modules/GPTQ_loader.py", line 177, in load_quantized
    model = load_quant(str(path_to_model), str(pt_path), shared.args.wbits, shared.args.groupsize, kernel_switch_threshold=threshold)
  File "/home/aiprojs/one-click-installers/text-generation-webui/modules/GPTQ_loader.py", line 77, in _load_quant
    make_quant(**make_quant_kwargs)
  File "/home/aiprojs/one-click-installers/text-generation-webui/repositories/GPTQ-for-LLaMa/quant.py", line 446, in make_quant
    make_quant(child, names, bits, groupsize, faster, name + '.' + name1 if name != '' else name1, kernel_switch_threshold=kernel_switch_threshold)
  File "/home/aiprojs/one-click-installers/text-generation-webui/repositories/GPTQ-for-LLaMa/quant.py", line 446, in make_quant
    make_quant(child, names, bits, groupsize, faster, name + '.' + name1 if name != '' else name1, kernel_switch_threshold=kernel_switch_threshold)
  File "/home/aiprojs/one-click-installers/text-generation-webui/repositories/GPTQ-for-LLaMa/quant.py", line 446, in make_quant
    make_quant(child, names, bits, groupsize, faster, name + '.' + name1 if name != '' else name1, kernel_switch_threshold=kernel_switch_threshold)
  [Previous line repeated 1 more time]
  File "/home/aiprojs/one-click-installers/text-generation-webui/repositories/GPTQ-for-LLaMa/quant.py", line 443, in make_quant
    module, attr, QuantLinear(bits, groupsize, tmp.in_features, tmp.out_features, faster=faster, kernel_switch_threshold=kernel_switch_threshold)
  File "/home/aiprojs/one-click-installers/text-generation-webui/repositories/GPTQ-for-LLaMa/quant.py", line 142, in __init__
    raise NotImplementedError("Only 2,3,4,8 bits are supported.")
NotImplementedError: Only 2,3,4,8 bits are supported.

Using the --loader exllama flag:

2023-07-03 16:45:26 INFO:Loading TheBloke_Wizard-Vicuna-30B-Uncensored-GPTQ...
2023-07-03 16:45:26 WARNING:Exllama module failed to load. Will attempt to load from repositories.
No CUDA runtime is found, using CUDA_HOME='/home/aiprojs/one-click-installers/installer_files/env'
Successfully preprocessed all matching files.
Traceback (most recent call last):
  File "/home/aiprojs/one-click-installers/text-generation-webui/server.py", line 1057, in <module>
    shared.model, shared.tokenizer = load_model(shared.model_name)
  File "/home/aiprojs/one-click-installers/text-generation-webui/modules/models.py", line 74, in load_model
    output = load_func_map[loader](model_name)
  File "/home/aiprojs/one-click-installers/text-generation-webui/modules/models.py", line 286, in ExLlama_loader
    model, tokenizer = ExllamaModel.from_pretrained(model_name)
  File "/home/aiprojs/one-click-installers/text-generation-webui/modules/exllama.py", line 63, in from_pretrained
    model = ExLlama(config)
  File "/home/aiprojs/one-click-installers/text-generation-webui/repositories/exllama/model.py", line 722, in __init__
    tensor = tensor.to(device, non_blocking = True)
  File "/home/aiprojs/one-click-installers/installer_files/env/lib/python3.10/site-packages/torch/cuda/__init__.py", line 247, in _lazy_init
    torch._C._cuda_init()
RuntimeError: No HIP GPUs are available

Time to give up? 😄

jllllll commented 1 year ago

Not too sure how to fix the no HIP GPUs error. For the first one, try: --loader gptq-for-llama --wbits 4

nibarulabs commented 1 year ago

2023-07-03 16:55:14 INFO:Loading TheBloke_Wizard-Vicuna-30B-Uncensored-GPTQ...
2023-07-03 16:55:14 INFO:Found the following quantized model: models/TheBloke_Wizard-Vicuna-30B-Uncensored-GPTQ/Wizard-Vicuna-30B-Uncensored-GPTQ-4bit.act.order.safetensors
Traceback (most recent call last):
  File "/home/aiprojs/one-click-installers/text-generation-webui/server.py", line 1057, in <module>
    shared.model, shared.tokenizer = load_model(shared.model_name)
  File "/home/aiprojs/one-click-installers/text-generation-webui/modules/models.py", line 74, in load_model
    output = load_func_map[loader](model_name)
  File "/home/aiprojs/one-click-installers/text-generation-webui/modules/models.py", line 272, in GPTQ_loader
    model = modules.GPTQ_loader.load_quantized(model_name)
  File "/home/aiprojs/one-click-installers/text-generation-webui/modules/GPTQ_loader.py", line 199, in load_quantized
    model = model.to(torch.device('cuda:0'))
  File "/home/aiprojs/one-click-installers/installer_files/env/lib/python3.10/site-packages/transformers/modeling_utils.py", line 1902, in to
    return super().to(*args, **kwargs)
  File "/home/aiprojs/one-click-installers/installer_files/env/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1145, in to
    return self._apply(convert)
  File "/home/aiprojs/one-click-installers/installer_files/env/lib/python3.10/site-packages/torch/nn/modules/module.py", line 797, in _apply
    module._apply(fn)
  File "/home/aiprojs/one-click-installers/installer_files/env/lib/python3.10/site-packages/torch/nn/modules/module.py", line 797, in _apply
    module._apply(fn)
  File "/home/aiprojs/one-click-installers/installer_files/env/lib/python3.10/site-packages/torch/nn/modules/module.py", line 820, in _apply
    param_applied = fn(param)
  File "/home/aiprojs/one-click-installers/installer_files/env/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1143, in convert
    return t.to(device, dtype if t.is_floating_point() or t.is_complex() else None, non_blocking)
  File "/home/aiprojs/one-click-installers/installer_files/env/lib/python3.10/site-packages/torch/cuda/__init__.py", line 247, in _lazy_init
    torch._C._cuda_init()
RuntimeError: No HIP GPUs are available

So, it goes back to no HIP GPUs found. I'm gonna dig into that ComfyUI post to see if that will help at all.

If I still can't get it to work, that will mean I couldn't get the AMD GPU parts working and will have to use CPU-only mode, correct? And in that case, I won't be able to use any GPTQ models, correct?

jllllll commented 1 year ago

Is it possible that you didn't install the ROCm drivers and only the SDK?

nibarulabs commented 1 year ago

Yah, it could be possible. These are the instructions I followed:

https://docs.amd.com/en/docs-5.4.2/deploy/linux/os-native/install.html

jllllll commented 1 year ago

The amdgpu-dkms package provides the drivers you need. Make sure you did every step on that page.

And run dkms status to check the drivers.
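
Two more checks along the same lines (assuming a default /opt/rocm install; run the Python one from inside the installer's environment):

# rocminfo lists the GPU agents the HIP runtime can see; no agents means no usable driver:
/opt/rocm/bin/rocminfo | grep -iE 'agent|gfx'
# Check whether HIP itself detects a device:
python -c "import torch; print(torch.cuda.is_available(), torch.cuda.device_count())"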

nibarulabs commented 1 year ago

Yah, it appears I have all that in..

ai:~$ dkms status
amdgpu/5.18.13-1528701.22.04, 5.15.0-76-generic, x86_64: installed

I'm gonna poke around a bit more and see if I can find more about the HIP GPU error. Is there a reason you recommended ROCm 5.4.2 over 5.4.3?

jllllll commented 1 year ago

5.4.2 is the version that Torch is built against. If 5.4.3 works better, then I'll change the instructions to mention it.
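
You can confirm which ROCm version your Torch wheel was built against from inside the installer's environment:

# A ROCm wheel reports a version like '2.0.1+rocm5.4.2' and a non-None torch.version.hip:
python -c "import torch; print(torch.__version__, torch.version.hip)"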

nibarulabs commented 1 year ago

5.4.3 segfaults, of course - probably due to GPTQ being compiled against 5.4.2. I deleted the GPTQ-for-LLaMa dir and re-ran the update.

...
Found existing installation: exllama 0.0.4+cu117
Uninstalling exllama-0.0.4+cu117:
  Successfully uninstalled exllama-0.0.4+cu117
Cloning into 'GPTQ-for-LLaMa'...
remote: Enumerating objects: 868, done.
remote: Counting objects: 100% (868/868), done.
remote: Compressing objects: 100% (344/344), done.
remote: Total 868 (delta 530), reused 840 (delta 518), pack-reused 0
Receiving objects: 100% (868/868), 498.21 KiB | 476.00 KiB/s, done.
Resolving deltas: 100% (530/530), done.
Already up to date.
Processing /home/aiprojs/one-click-installers/text-generation-webui/repositories/GPTQ-for-LLaMa
  Preparing metadata (setup.py) ... error
  error: subprocess-exited-with-error

  × python setup.py egg_info did not run successfully.
  │ exit code: -11
  ╰─> [0 lines of output]
      [end of output]

  note: This error originates from a subprocess, and is likely not a problem with pip.
error: metadata-generation-failed

× Encountered error while generating package metadata.
╰─> See above for output.

note: This is an issue with the package mentioned above, not pip.
hint: See above for details.

Done!

I think at this point my system is probably not stable, sigh. I should probably uninstall both 5.4.2 and 5.4.3 and re-install. I'm going to blow away that machine and try to get a clean environment. I'll probably start with 5.4.3 (?) only because that ComfyUI post got everything working.

jllllll commented 1 year ago

Searching around, people seem to be having issues with the RX 500 series cards.

nibarulabs commented 1 year ago

Ok, I'll pop in the bigger card and see if that makes a diff.

nibarulabs commented 1 year ago

I wasn't able to get the big card to work either. Probably going to pause on this for now. I'll check back in a bit later; maybe someone else can make better progress.

prgarnett commented 1 year ago

I seem to be able to get models to load; however, I get these errors instead:

Traceback (most recent call last):
  File "/mnt/raid1/philip/AI_Testing/one-click-installers-cross-platform-amd/installer_files/env/lib/python3.10/site-packages/gradio/routes.py", line 427, in run_predict
    output = await app.get_blocks().process_api(
  File "/mnt/raid1/philip/AI_Testing/one-click-installers-cross-platform-amd/installer_files/env/lib/python3.10/site-packages/gradio/blocks.py", line 1323, in process_api
    result = await self.call_function(
  File "/mnt/raid1/philip/AI_Testing/one-click-installers-cross-platform-amd/installer_files/env/lib/python3.10/site-packages/gradio/blocks.py", line 1051, in call_function
    prediction = await anyio.to_thread.run_sync(
  File "/mnt/raid1/philip/AI_Testing/one-click-installers-cross-platform-amd/installer_files/env/lib/python3.10/site-packages/anyio/to_thread.py", line 33, in run_sync
    return await get_asynclib().run_sync_in_worker_thread(
  File "/mnt/raid1/philip/AI_Testing/one-click-installers-cross-platform-amd/installer_files/env/lib/python3.10/site-packages/anyio/_backends/_asyncio.py", line 877, in run_sync_in_worker_thread
    return await future
  File "/mnt/raid1/philip/AI_Testing/one-click-installers-cross-platform-amd/installer_files/env/lib/python3.10/site-packages/anyio/_backends/_asyncio.py", line 807, in run
    result = context.run(func, *args)
  File "/mnt/raid1/philip/AI_Testing/one-click-installers-cross-platform-amd/text-generation-webui/modules/models_settings.py", line 77, in update_model_parameters
    if i > 0:

This seems to impact the functionality of the web interface.

github-actions[bot] commented 1 year ago

This issue has been closed due to inactivity for 30 days. If you believe it is still relevant, please leave a comment below.