kiri-art / docker-diffusers-api

Diffusers / Stable Diffusion in docker with a REST API, supporting various models, pipelines & schedulers.
https://kiri.art/
MIT License
202 stars · 94 forks

Apple M1 / M2 / MPS support #20

Open ormedo opened 1 year ago

ormedo commented 1 year ago

Hi!

I just downloaded the project and tried to build and deploy the Docker image on my M1. I always get the same error:

```
[+] Building 1320.8s (18/44)
 => [internal] load build definition from Dockerfile                                  0.0s
 => => transferring dockerfile: 6.90kB                                                0.0s
 => [internal] load .dockerignore                                                     0.0s
 => => transferring context: 2B                                                       0.0s
 => [internal] load metadata for docker.io/pytorch/pytorch:1.12.1-cuda11.3-cudnn8-runtime  1.5s
 => [internal] load build context                                                     0.0s
 => => transferring context: 64.22kB                                                  0.0s
 => CACHED [base 1/5] FROM docker.io/pytorch/pytorch:1.12.1-cuda11.3-cudnn8-runtime@sha256:0bc0971dc8ae319af610d493aced87df46255c9508a8b9e9bc365f11a56e7b75  0.0s
 => [base 2/5] RUN if [ -n "" ] ; then echo quit | openssl s_client -proxy $(echo | cut -b 8-) -servername google.com -connect google.com:443 -showcerts | sed 'H;1h;  0.3s
 => [base 3/5] RUN apt-get update                                                    14.3s
 => [base 4/5] RUN apt-get install -yqq git                                          27.6s
 => [base 5/5] RUN apt-get install -yqq zstd                                          8.3s
 => [output 1/32] RUN mkdir /api                                                      0.5s
 => [patchmatch 1/3] WORKDIR /tmp                                                     0.0s
 => [patchmatch 2/3] COPY scripts/patchmatch-setup.sh .                               0.0s
 => [patchmatch 3/3] RUN sh patchmatch-setup.sh                                       0.4s
 => [output 2/32] WORKDIR /api                                                        0.0s
 => [output 3/32] RUN conda update -n base -c defaults conda                        101.1s
 => [output 4/32] RUN conda create -n xformers python=3.10                           33.9s
 => [output 5/32] RUN python --version                                                6.3s
 => ERROR [output 6/32] RUN conda install -c pytorch -c conda-forge cudatoolkit=11.6 pytorch=1.12.1  1126.9s
```

```
 > [output 6/32] RUN conda install -c pytorch -c conda-forge cudatoolkit=11.6 pytorch=1.12.1:
#14 9.049 Collecting package metadata (current_repodata.json): ...working... done
#14 85.41 Solving environment: ...working... failed with initial frozen solve. Retrying with flexible solve.
#14 85.44 Collecting package metadata (repodata.json): ...working... done
#14 489.9 Solving environment: ...working... done
#14 619.6
#14 619.6 ## Package Plan ##
#14 619.6
#14 619.6   environment location: /opt/conda/envs/xformers
#14 619.6
#14 619.6   added / updated specs:
#14 619.6     - cudatoolkit=11.6
#14 619.6     - pytorch=1.12.1
#14 619.6
#14 619.6 The following packages will be downloaded:
#14 619.6
#14 619.6     package                    |            build
#14 619.6     ---------------------------|-----------------
#14 619.6     blas-1.0                   |              mkl           6 KB
#14 619.6     ca-certificates-2022.12.7  |       ha878542_0         143 KB  conda-forge
#14 619.6     certifi-2022.12.7          |     pyhd8ed1ab_0         147 KB  conda-forge
#14 619.6     cudatoolkit-11.6.0         |      hecad31d_10       821.2 MB  conda-forge
#14 619.6     intel-openmp-2022.1.0      |    h9e868ea_3769         4.5 MB
#14 619.6     mkl-2022.1.0               |      hc2b9512_224       129.7 MB
#14 619.6     pytorch-1.12.1             |py3.10_cuda11.6_cudnn8.3.2_0  1.20 GB  pytorch
#14 619.6     pytorch-mutex-1.0          |             cuda           3 KB  pytorch
#14 619.6     typing_extensions-4.4.0    |     pyha770c72_0          29 KB  conda-forge
#14 619.6     ------------------------------------------------------------
#14 619.6                                            Total:        2.13 GB
#14 619.6
#14 619.6 The following NEW packages will be INSTALLED:
#14 619.6
#14 619.6   blas               pkgs/main/linux-64::blas-1.0-mkl
#14 619.6   cudatoolkit        conda-forge/linux-64::cudatoolkit-11.6.0-hecad31d_10
#14 619.6   intel-openmp       pkgs/main/linux-64::intel-openmp-2022.1.0-h9e868ea_3769
#14 619.6   mkl                pkgs/main/linux-64::mkl-2022.1.0-hc2b9512_224
#14 619.6   pytorch            pytorch/linux-64::pytorch-1.12.1-py3.10_cuda11.6_cudnn8.3.2_0
#14 619.6   pytorch-mutex      pytorch/noarch::pytorch-mutex-1.0-cuda
#14 619.6   typing_extensions  conda-forge/noarch::typing_extensions-4.4.0-pyha770c72_0
#14 619.6
#14 619.6 The following packages will be UPDATED:
#14 619.6
#14 619.6   ca-certificates    pkgs/main::ca-certificates-2022.10.11~ --> conda-forge::ca-certificates-2022.12.7-ha878542_0
#14 619.6   certifi            pkgs/main/linux-64::certifi-2022.9.24~ --> conda-forge/noarch::certifi-2022.12.7-pyhd8ed1ab_0
#14 619.6
#14 619.6 Proceed ([y]/n)?
#14 619.6
#14 619.6 Downloading and Extracting Packages
#14 1110.5 CondaError: Downloaded bytes did not match Content-Length
#14 1110.5   url: https://conda.anaconda.org/pytorch/linux-64/pytorch-1.12.1-py3.10_cuda11.6_cudnn8.3.2_0.tar.bz2
#14 1110.5   target_path: /opt/conda/pkgs/pytorch-1.12.1-py3.10_cuda11.6_cudnn8.3.2_0.tar.bz2
#14 1110.5   Content-Length: 1284916176
#14 1110.5   downloaded bytes: 1100035059
#14 1126.1 ERROR conda.cli.main_run:execute(47): conda run /bin/bash -c conda install -c pytorch -c conda-forge cudatoolkit=11.6 pytorch=1.12.1 failed. (See above for error)
------
executor failed running [/opt/conda/bin/conda run --no-capture-output -n xformers /bin/bash -c conda install -c pytorch -c conda-forge cudatoolkit=11.6 pytorch=1.12.1]: exit code: 1
```

I understand that it's a download problem, but I don't know Docker well enough to fix it.

Any suggestions?

gadicc commented 1 year ago

Hey!

Normally I'd say just rerun the build command and it will retry from the last successful (and cached) step, but as you say, you keep getting the same error.

Is it always on the same file, and the same number of bytes? Are you using a proxy?

ormedo commented 1 year ago

Hi!

Thanks for your time supporting us. I think it could be a bad file or a networking issue. Upgrading to version 11.7 fixed the problem.

ormedo commented 1 year ago

Hi! Just hit another conflict, this time with the Python version.

```
#15 2074.9 UnsatisfiableError: The following specifications were found
#15 2074.9 to be incompatible with the existing python installation in your environment:
#15 2074.9
#15 2074.9 Specifications:
#15 2074.9
#15 2074.9   - six -> python[version='>=2.7,<2.8.0a0|>=3.6,<3.7.0a0|>=3.8,<3.9.0a0|>=3.7,<3.8.0a0|>=3.9,<3.10.0a0|>=3.5,<3.6.0a0']
#15 2074.9   - wheel -> python[version='>=2.7,<2.8.0a0|>=3.6,<3.7.0a0|>=3.8,<3.9.0a0|>=3.7,<3.8.0a0|>=3.5,<3.6.0a0']
#15 2074.9   - xformers -> python[version='>=3.7,<3.8.0a0|>=3.8,<3.9.0a0']
#15 2074.9
#15 2074.9 Your python: python=3.10
```

gadicc commented 1 year ago

Hey! Unfortunately xformers only has precompiled binaries for a very select list of package version combinations (I have some notes about this at the top of the Dockerfile). 11.7 won't work. You could try 11.3 though.

P.S. I don't know much about running diffusers on an M1 beyond that it's possible. You may well need to search the docker-diffusers-api codebase for anywhere I've written "cuda" and replace it with "mps". I'll try to fix this in a future release so this won't be necessary (you're the first person to try this :sweat_smile:). Please do report on your findings; I would love to get this working for all M1 users!

ormedo commented 1 year ago

It works on M1 with 11.3 :D but it exited after a few seconds with no visible logs, at least none I could find with my knowledge :S

ormedo commented 1 year ago

[Screenshot: 2022-12-09 at 23:12:11]

ormedo commented 1 year ago

Here are the logs from inside the container.

```
Traceback (most recent call last):
  File "/api/server.py", line 12, in <module>
    user_src.init()
  File "/api/app.py", line 53, in init
    "device": torch.cuda.get_device_name(),
  File "/opt/conda/envs/xformers/lib/python3.10/site-packages/torch/cuda/__init__.py", line 329, in get_device_name
    return get_device_properties(device).name
  File "/opt/conda/envs/xformers/lib/python3.10/site-packages/torch/cuda/__init__.py", line 359, in get_device_properties
    _lazy_init()  # will define _get_device_properties
  File "/opt/conda/envs/xformers/lib/python3.10/site-packages/torch/cuda/__init__.py", line 211, in _lazy_init
    raise AssertionError("Torch not compiled with CUDA enabled")
AssertionError: Torch not compiled with CUDA enabled
ERROR conda.cli.main_run:execute(47): conda run /bin/bash -c python3 -u server.py failed. (See above for error)
libc10_cuda.so: cannot open shared object file: No such file or directory
WARNING: libc10_cuda.so: cannot open shared object file: No such file or directory
Need to compile C++ extensions to get sparse attention support. Please run python setup.py build develop
```

gadicc commented 1 year ago

Hey, thanks! Logs make it much easier to understand what's going on.

So yeah, as I suspected, unfortunately we're going to have to look for any code that references NVIDIA's CUDA and remove it if it's not needed, or replace it with MPS where possible, to get this working on Apple M1.

I would really love to make docker-diffusers-api work out the box with M1, but it's going to be quite a while until I'll have the time to be actively involved here :(

In the meantime, the line in question can be removed entirely (app.py line 53: `"device": torch.cuda...`). And you'll need to search through all the files for any other mention of "cuda" and replace it with "mps" (especially anything like `.to("cuda")`, `device="cuda"`, or similar).
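That search-and-replace could also be centralized in one helper. This is only a sketch (`pick_device` is a hypothetical name, not something in the repo; it assumes PyTorch 1.12+, where the MPS backend first shipped):

```python
import torch

def pick_device() -> str:
    """Return the best available torch device string: cuda, mps, or cpu."""
    if torch.cuda.is_available():
        return "cuda"
    mps = getattr(torch.backends, "mps", None)  # attribute absent before PyTorch 1.12
    if mps is not None and mps.is_available():
        return "mps"
    return "cpu"

device = pick_device()
print(device)
```

With something along these lines, hard-coded `.to("cuda")` calls become `.to(pick_device())`, and the same image could run on NVIDIA GPUs, Apple silicon, or plain CPU.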

Again, I wish I could help more, and look into automatically detecting the right GPU, but I just don't have time at the moment, and really am not sure when I will :( But please keep this issue open, please keep us updated with your progress, and I will take a more active role here when I can. I'll also be available to answer questions to the best of my ability (but I really have zero experience with Apple, unfortunately).

gadicc commented 1 year ago

And for future reference:

ormedo commented 1 year ago

I understand. I want to test before going to production in Banana's environment. But the 1-click installation works great!

gadicc commented 1 year ago

Oh, awesome! That's great. Thanks for reporting back about that.. at least you can still play in the meantime :)

I should have a chance to look at this next week... if we're lucky, it will all just work afterwards. Otherwise it will take a lot longer :sweat_smile: Do you know any good places to rent M1s online? I think one of the companies I've used before has them; I'll try to remember :sweat_smile:

ormedo commented 1 year ago

AWS offers M1 Mac mini instances, if I remember correctly.

gadicc commented 1 year ago

Oh great, thanks!

More future ref stuff for me...

https://pytorch.org/docs/stable/notes/mps.html

https://chrisdare.medium.com/running-pytorch-on-apple-silicon-m1-gpus-a8bb6f680b02
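A tiny smoke test along the lines of those MPS notes (an illustrative sketch, not repo code: it falls back to CPU so it also runs on machines without Apple silicon, and assumes PyTorch 1.12+ for `torch.backends.mps`):

```python
import torch

# Use the MPS backend when available (PyTorch 1.12+), otherwise fall back to CPU.
mps_ok = getattr(torch.backends, "mps", None) is not None and torch.backends.mps.is_available()
device = torch.device("mps" if mps_ok else "cpu")

x = torch.ones(3, 3, device=device)  # allocate directly on the chosen device
y = (x * 2).sum()                    # simple op to prove the backend works
print(device.type, y.item())         # sums to 18.0 on either backend
```

If this prints `mps 18.0` on an M1, the PyTorch side is working and any remaining failures are in the application code.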