l1na-forever / stable-diffusion-rocm-docker

Stable Diffusion Docker image preconfigured for usage with AMD Radeon cards

Torch is not able to use GPU; add --skip-torch-cuda-test to COMMANDLINE_ARGS variable to disable this check #24

Open ckhung opened 10 months ago

ckhung commented 10 months ago

Hi, my GPU is a Sapphire Nitro+ Radeon RX 580 and the host OS is Linux Mint Debian Edition 5 ("elsie"). I am using this command:

docker run -it --network=host --device=/dev/kfd --device=/dev/dri --group-add=video --ipc=host --cap-add=SYS_PTRACE --security-opt seccomp=unconfined -v ~/work:/work --name stb-dif l1naforever/stable-diffusion-rocm:latest

and it produces this error message:

Python 3.7.13 (default, Mar 29 2022, 02:18:16) 
[GCC 7.5.0]
Commit hash: 08b3f7aef15f74f4d2254b1274dd66fcc7940348
Traceback (most recent call last):
  File "launch.py", line 168, in <module>
    prepare_enviroment()
  File "launch.py", line 121, in prepare_enviroment
    run_python("import torch; assert torch.cuda.is_available(), 'Torch is not able to use GPU; add --skip-torch-cuda-test to COMMANDLINE_ARGS variable to disable this check'")
  File "launch.py", line 56, in run_python
    return run(f'"{python}" -c "{code}"', desc, errdesc)
  File "launch.py", line 32, in run
    raise RuntimeError(message)
RuntimeError: Error running command.
Command: "/opt/conda/bin/python" -c "import torch; assert torch.cuda.is_available(), 'Torch is not able to use GPU; add --skip-torch-cuda-test to COMMANDLINE_ARGS variable to disable this check'"
Error code: 134
stdout: <empty>
stderr: "hipErrorNoBinaryForGpu: Unable to find code object for all current devices!"
Aborted (core dumped)

That's strange. I thought PyTorch in this container image is supposed to look for ROCm, not NVIDIA's CUDA, right? Thanks in advance for your help.
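
For anyone reproducing this, here is a minimal diagnostic sketch, not something taken from the image or from launch.py. ROCm builds of PyTorch expose the GPU through the same torch.cuda namespace, so the torch.cuda.is_available() check does not by itself mean PyTorch is looking for NVIDIA hardware. torch.version.hip is a version string on ROCm builds and None otherwise. The probe runs in a child process because the failure above aborts the interpreter: error code 134 is the shell convention 128 + 6, that is, SIGABRT. The helper names describe_exit and probe_torch are hypothetical. A plausible but unconfirmed reading of hipErrorNoBinaryForGpu is that the installed build ships no compiled kernels for this GPU's architecture.

```python
import signal
import subprocess
import sys

# Code run in a child interpreter. On ROCm builds torch.version.hip is a
# version string; on CUDA or CPU-only builds it is None.
PROBE = (
    "import torch; "
    "print('torch', torch.__version__); "
    "print('hip', torch.version.hip); "
    "print('gpu available', torch.cuda.is_available())"
)


def describe_exit(returncode):
    """Turn a process exit status into a readable description."""
    if returncode == 0:
        return "ok"
    # subprocess reports death-by-signal as a negative number; a shell
    # reports it as 128 + signal number (so 134 means signal 6).
    if returncode < 0:
        signum = -returncode
    elif returncode > 128:
        signum = returncode - 128
    else:
        return f"exit code {returncode}"
    try:
        return f"killed by {signal.Signals(signum).name}"
    except ValueError:
        return f"exit code {returncode}"


def probe_torch(python=sys.executable):
    """Run the probe in a subprocess so a HIP abort cannot kill the caller."""
    result = subprocess.run([python, "-c", PROBE], capture_output=True, text=True)
    return {
        "status": describe_exit(result.returncode),
        "stdout": result.stdout.strip(),
        "stderr": result.stderr.strip(),
    }


if __name__ == "__main__":
    for key, value in probe_torch().items():
        print(f"{key}: {value}")
```

Run inside the container with the same interpreter that launch.py uses (/opt/conda/bin/python in the log above). If "hip" prints a version but the status is "killed by SIGABRT", the ROCm build is present and the abort happens while initialising the device.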

FlorianHeigl commented 4 months ago

@ckhung I had the same issue. It seems it could be a Python version issue; I see Python 3.7 in your output. See this comment in particular: https://github.com/AUTOMATIC1111/stable-diffusion-webui/issues/13824#issuecomment-1853738965

I'm trying to version-lock the pip modules in the Dockerfile and rebuild.

dylanmilesmsu commented 1 month ago

I have the same issue.