cannot import name '_compare_version' from 'torchmetrics.utilities.imports'

kstenerud commented 1 year ago

Describe the bug

Attempting to run the docker image results in:

ImportError: cannot import name '_compare_version' from 'torchmetrics.utilities.imports' (/usr/local/lib/python3.10/dist-packages/torchmetrics/utilities/imports.py)

To Reproduce

Steps to reproduce the behavior:

docker build . -t stable-diffusion-rocm
docker run -it -p 7860:7860 --device=/dev/kfd --device=/dev/dri --group-add=video --ipc=host --cap-add=SYS_PTRACE --security-opt seccomp=unconfined -v /home/karl/stable-diffusion:/pwd -e HSA_OVERRIDE_GFX_VERSION=10.3.0 --name stable-diffusion stable-diffusion-rocm

Expected behavior

It should complete initialization.

Container Output

Building wheels for collected packages: lit
  Building wheel for lit (pyproject.toml) ... done
  Created wheel for lit: filename=lit-16.0.6-py3-none-any.whl size=93605 sha256=860b3739ae0c10b7d5237825a8963745aa4af6aa1a2d2bf6019323fb3356120b
  Stored in directory: /root/.cache/pip/wheels/14/f9/07/bb2308587bc2f57158f905a2325f6a89a2befa7437b2d7e137
Successfully built lit
Installing collected packages: mpmath, lit, cmake, urllib3, typing-extensions, sympy, pillow, numpy, networkx, MarkupSafe, idna, filelock, charset-normalizer, certifi, requests, jinja2, pytorch-triton-rocm, torch, torchvision
Successfully installed MarkupSafe-2.1.3 certifi-2023.7.22 charset-normalizer-3.2.0 cmake-3.27.2 filelock-3.12.2 idna-3.4 jinja2-3.1.2 lit-16.0.6 mpmath-1.3.0 networkx-3.1 numpy-1.25.2 pillow-10.0.0 pytorch-triton-rocm-2.0.2 requests-2.31.0 sympy-1.12 torch-2.0.1+rocm5.4.2 torchvision-0.15.2+rocm5.4.2 typing-extensions-4.7.1 urllib3-2.0.4
WARNING: Running pip as the 'root' user can result in broken permissions and conflicting behaviour with the system package manager. It is recommended to use a virtual environment instead: https://pip.pypa.io/warnings/venv
Installing gfpgan
Installing clip
Installing open_clip
Cloning Stable Diffusion into /sd/repositories/stable-diffusion-stability-ai...
Cloning Taming Transformers into /sd/repositories/taming-transformers...
Cloning K-diffusion into /sd/repositories/k-diffusion...
Cloning CodeFormer into /sd/repositories/CodeFormer...
Cloning BLIP into /sd/repositories/BLIP...
Installing requirements for CodeFormer
Installing requirements
Launching Web UI with arguments: --port 7860
Traceback (most recent call last):
  File "/sd/launch.py", line 370, in <module>
    start()
  File "/sd/launch.py", line 361, in start
    import webui
  File "/sd/webui.py", line 24, in <module>
    import pytorch_lightning # pytorch_lightning should be imported after torch, but it re-enables warnings on import so import once to disable them
  File "/usr/local/lib/python3.10/dist-packages/pytorch_lightning/__init__.py", line 34, in <module>
    from pytorch_lightning.callbacks import Callback  # noqa: E402
  File "/usr/local/lib/python3.10/dist-packages/pytorch_lightning/callbacks/__init__.py", line 25, in <module>
    from pytorch_lightning.callbacks.progress import ProgressBarBase, RichProgressBar, TQDMProgressBar
  File "/usr/local/lib/python3.10/dist-packages/pytorch_lightning/callbacks/progress/__init__.py", line 22, in <module>
    from pytorch_lightning.callbacks.progress.rich_progress import RichProgressBar  # noqa: F401
  File "/usr/local/lib/python3.10/dist-packages/pytorch_lightning/callbacks/progress/rich_progress.py", line 20, in <module>
    from torchmetrics.utilities.imports import _compare_version
ImportError: cannot import name '_compare_version' from 'torchmetrics.utilities.imports' (/usr/local/lib/python3.10/dist-packages/torchmetrics/utilities/imports.py)

Desktop (please complete the following information):

System RAM & SWAP: 32 GB + 0 (no swap)
AMD GPU & VRAM: 6600XT 8GB
OS + Distro and Version: Ubuntu 22.04
Host ROCm Version: 5.6.0

kstenerud commented 1 year ago

After some digging: this is caused by inter-package breakage that appeared within the past two weeks. I got it to work by manually pinning some package versions in the Dockerfile:

# Preinstall dependencies. This will fail
RUN python -d launch.py --exit --skip-torch-cuda-test || true

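# Make fixes.
# The gradio specifier must be quoted: unquoted, the shell treats `>` as an
# output redirect and pip never sees the version constraint.
RUN --mount=type=cache,target=/root/.cache/pip \
   pip3 install torchmetrics==0.11.4 && \
   pip3 install 'gradio>=3.36.1' && \
   pip3 install fastapi==0.95.2 && \
   true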

# Preinstall dependencies again
RUN python -d launch.py --exit --skip-torch-cuda-test
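
To confirm that the exact pins above actually ended up installed in the image, the installed versions can be compared against the expected ones. This is a minimal sketch, not part of the webui: `PINS` and `check_pins` are hypothetical names, and the pin values are copied from the snippet above.

```python
from importlib import metadata

# Exact pins copied from the Dockerfile fix above (illustrative table).
PINS = {"torchmetrics": "0.11.4", "fastapi": "0.95.2"}


def check_pins(pins, get_version=metadata.version):
    """Return {package: (expected, found)} for every pin that does not match.

    `found` is None when the package is not installed at all.
    """
    mismatches = {}
    for name, expected in pins.items():
        try:
            found = get_version(name)
        except metadata.PackageNotFoundError:
            found = None
        if found != expected:
            mismatches[name] = (expected, found)
    return mismatches


if __name__ == "__main__":
    for name, (expected, found) in check_pins(PINS).items():
        print(f"{name}: expected {expected}, found {found}")
```

Run inside the container with the same Python that launches the webui; no output means every pin matched.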

Note: You can also move the model downloading part to your Dockerfile:

# Pre-download model
RUN wget -q https://huggingface.co/runwayml/stable-diffusion-v1-5/resolve/main/v1-5-pruned-emaonly.safetensors -O /sd/models/Stable-diffusion/v1-5-pruned-emaonly.safetensors

That gives a Docker container whose first launch takes less than 30 seconds.
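
Since `wget -q` prints nothing, it can be worth checking that the pre-downloaded file really is a safetensors checkpoint and not, say, a saved error page. The sketch below assumes the documented safetensors layout (an 8-byte little-endian header length followed by a JSON header); `read_safetensors_header` is a hypothetical helper name.

```python
import json
import os
import struct


def read_safetensors_header(path):
    """Parse and return the JSON header at the start of a .safetensors file.

    Raises ValueError if the file is too short or the declared header
    length does not fit inside the file.
    """
    size = os.path.getsize(path)
    with open(path, "rb") as f:
        prefix = f.read(8)
        if len(prefix) != 8:
            raise ValueError("file too short to be a safetensors file")
        (header_len,) = struct.unpack("<Q", prefix)
        # Guard against garbage lengths before trying to read the header.
        if header_len > size - 8:
            raise ValueError("declared header length exceeds file size")
        raw = f.read(header_len)
    return json.loads(raw.decode("utf-8"))
```

Calling it on `/sd/models/Stable-diffusion/v1-5-pruned-emaonly.safetensors` should return a dictionary of tensor entries; an exception suggests a broken download.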

Everything working a-ok now :) Thanks for putting this out there!

kstenerud commented 1 year ago

BTW, the main stable-diffusion-webui branch now works with ROCm if you make a one-line change: https://github.com/kstenerud/stable-diffusion-webui/commit/a08711d713bfeb2155f084b7d9f9a28ce6f3ac43

Then you can get a fully-functional SD webui like so:

FROM ubuntu:jammy
SHELL ["/bin/bash", "-c"]  
ENV PORT=7860 \
    DEBIAN_FRONTEND=noninteractive \
    PYTHONUNBUFFERED=1 \
    PYTHONIOENCODING=UTF-8 \
    REQS_FILE='requirements.txt' \
    COMMANDLINE_ARGS='' 

WORKDIR /opt
RUN apt-get -y update && \
    apt-get install -y --no-install-recommends libstdc++-12-dev ca-certificates wget gnupg2 gawk curl git libglib2.0-0 apt-utils python3.10-venv python3-pip && \
    wget https://repo.radeon.com/amdgpu-install/5.5/ubuntu/jammy/amdgpu-install_5.5.50500-1_all.deb && \
    apt-get install -y ./amdgpu-install_5.5.50500-1_all.deb && \
    amdgpu-install -y --usecase=rocm --no-dkms && \
    true

RUN git clone -b rocm https://github.com/kstenerud/stable-diffusion-webui.git /sd

WORKDIR /sd

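# Note: `source venv/bin/activate` only lasts for this RUN step. Later RUN steps and
# the ENTRYPOINT start a fresh shell and use the system Python via the /usr/bin/python symlink.
RUN apt-get autoremove -y && \
    apt-get clean -y && \
    rm -rf /var/lib/apt/lists/* && \
    python3 -m venv venv && \
    source venv/bin/activate && \
    ln -s /usr/bin/python3 /usr/bin/python && \
    python3 -m pip install --upgrade pip wheel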

# Pre-download model
RUN wget -q https://huggingface.co/runwayml/stable-diffusion-v1-5/resolve/main/v1-5-pruned-emaonly.safetensors -O /sd/models/Stable-diffusion/v1-5-pruned-emaonly.safetensors

# Preinstall dependencies. This will fail.
RUN python -d launch.py --exit --skip-torch-cuda-test || true

# Apply fixes
# pytorch_lightning: No module named 'pytorch_lightning.utilities.distributed'
# torchmetrics:      cannot import name '_compare_version' from 'torchmetrics.utilities.imports'
# fastapi:           AttributeError: __config__
RUN --mount=type=cache,target=/root/.cache/pip \
    pip3 install pytorch_lightning==1.7.5 && \
    pip3 install torchmetrics==0.11.4 && \
    pip3 install fastapi==0.95.2 && \
    true

# Known SD output image format problem: SyntaxError: not a TIFF file (header b"b'Exif\\x" not valid)
# Workaround: run the previous output image through another image editor and it will work again.

# Preinstall dependencies again
RUN python -d launch.py --exit --skip-torch-cuda-test

EXPOSE ${PORT}

VOLUME ["/sd/configs", "/sd/models", "/sd/outputs", "/sd/extensions", "/sd/plugins"]
ENTRYPOINT python -d launch.py --port "${PORT}" --listen || bash
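
The comments in the "Apply fixes" step name the import errors that the pins address. A small sketch for checking those imports inside the container before launching the UI follows; `import_problem` and `CHECKS` are hypothetical names, and the module and attribute names are taken from the error messages quoted in this thread.

```python
import importlib


def import_problem(module_name, attr=None):
    """Return a description of what is wrong, or None if the import works.

    Any exception raised while importing is reported, since a broken
    dependency can fail with errors other than ImportError.
    """
    try:
        module = importlib.import_module(module_name)
    except Exception as exc:  # deliberately broad: this is a diagnostic
        return f"{module_name}: {type(exc).__name__}: {exc}"
    if attr is not None and not hasattr(module, attr):
        return f"{module_name}: no attribute {attr!r}"
    return None


# Imports named in the error messages above.
CHECKS = [
    ("torchmetrics.utilities.imports", "_compare_version"),
    ("pytorch_lightning.utilities.distributed", None),
]

if __name__ == "__main__":
    for module_name, attr in CHECKS:
        print(import_problem(module_name, attr) or f"{module_name}: ok")
```

If either line reports a problem after the pins are applied, the installed versions have probably drifted again and the pins need revisiting.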

hydrian / stable-diffusion-webui-rocm, issue #16