runpod-workers / worker-a1111

Automatic1111 serverless worker.
MIT License

Caching step is broken during image build #4

Closed SlavenIvanov closed 1 year ago

SlavenIvanov commented 1 year ago

With a fresh clone, running:

DOCKER_BUILDKIT=1  docker build . --tag my-acc/my-image --platform=linux/amd64

Fails in the hashing step of the Dockerfile:

COPY builder/cache.py /stable-diffusion-webui/cache.py
RUN cd /stable-diffusion-webui && python cache.py --use-cpu=all --ckpt /model.safetensors
# ^ this

System: M1 OSX
Docker: version 24.0.2, build cb74dfc

Here is the output:

[+] Building 368.0s (28/30)                                                                                                                                                                                
 => [internal] load .dockerignore                                                                                                                                                                     0.0s
 => => transferring context: 2B                                                                                                                                                                       0.0s
 => [internal] load build definition from Dockerfile                                                                                                                                                  0.0s
 => => transferring dockerfile: 3.65kB                                                                                                                                                                0.0s
 => [internal] load metadata for docker.io/alpine/git:2.36.2                                                                                                                                          2.1s
 => [internal] load metadata for docker.io/library/python:3.10.9-slim                                                                                                                                 2.6s
 => [auth] alpine/git:pull token for registry-1.docker.io                                                                                                                                             0.0s
 => [auth] library/python:pull token for registry-1.docker.io                                                                                                                                         0.0s
 => [download 1/7] FROM docker.io/alpine/git:2.36.2@sha256:ec491c893597b68c92b88023827faa771772cfd5e106b76c713fa5e1c75dea84                                                                           0.0s
 => [internal] load build context                                                                                                                                                                     0.0s 
 => => transferring context: 431B                                                                                                                                                                     0.0s 
 => [stage-1  1/16] FROM docker.io/library/python:3.10.9-slim@sha256:76dd18d90a3d8710e091734bf2c9dd686d68747a51908db1e1f41e9a5ed4e2c5                                                                 0.0s 
 => CACHED [stage-1  2/16] RUN apt-get update &&     apt install -y     fonts-dejavu-core rsync git jq moreutils aria2 wget libgoogle-perftools-dev procps &&     apt-get autoremove -y && rm -rf /v  0.0s 
 => CACHED [stage-1  3/16] RUN --mount=type=cache,target=/cache --mount=type=cache,target=/root/.cache/pip     pip install torch torchvision torchaudio --index-url https://download.pytorch.org/whl  0.0s 
 => CACHED [stage-1  4/16] RUN --mount=type=cache,target=/root/.cache/pip     git clone https://github.com/AUTOMATIC1111/stable-diffusion-webui.git &&     cd stable-diffusion-webui &&     git rese  0.0s 
 => CACHED [download 2/7] COPY builder/clone.sh /clone.sh                                                                                                                                             0.0s 
 => CACHED [download 3/7] RUN . /clone.sh taming-transformers https://github.com/CompVis/taming-transformers.git 24268930bf1dce879235a7fddd0b2355b84d7ea6 &&     rm -rf data assets **/*.ipynb        0.0s 
 => CACHED [download 4/7] RUN . /clone.sh stable-diffusion-stability-ai https://github.com/Stability-AI/stablediffusion.git 47b6b607fdd31875c9279cd2f4f16b92e4ea958e &&     rm -rf assets data/**/*.  0.0s 
 => CACHED [download 5/7] RUN . /clone.sh CodeFormer https://github.com/sczhou/CodeFormer.git c5b4593074ba6214284d6acd5f1719b6c5d739af &&     rm -rf assets inputs                                    0.0s 
 => CACHED [download 6/7] RUN . /clone.sh BLIP https://github.com/salesforce/BLIP.git 48211a1594f1321b00f14c9f7a5b4813144b2fb9 &&     . /clone.sh k-diffusion https://github.com/crowsonkb/k-diffusi  0.0s 
 => CACHED [download 7/7] RUN wget -O /model.safetensors https://civitai.com/api/download/models/15236                                                                                                0.0s
 => CACHED [stage-1  5/16] COPY --from=download /repositories/ /stable-diffusion-webui/repositories/                                                                                                  0.0s
 => CACHED [stage-1  6/16] COPY --from=download /model.safetensors /model.safetensors                                                                                                                 0.0s
 => CACHED [stage-1  7/16] RUN mkdir /stable-diffusion-webui/interrogate && cp /stable-diffusion-webui/repositories/clip-interrogator/data/* /stable-diffusion-webui/interrogate                      0.0s
 => CACHED [stage-1  8/16] RUN --mount=type=cache,target=/root/.cache/pip     pip install -r /stable-diffusion-webui/repositories/CodeFormer/requirements.txt                                         0.0s
 => CACHED [stage-1  9/16] COPY builder/requirements.txt /requirements.txt                                                                                                                            0.0s
 => CACHED [stage-1 10/16] RUN --mount=type=cache,target=/root/.cache/pip     pip install --upgrade pip &&     pip install --upgrade -r /requirements.txt --no-cache-dir &&     rm /requirements.txt  0.0s
 => CACHED [stage-1 11/16] RUN --mount=type=cache,target=/root/.cache/pip     cd stable-diffusion-webui &&     git fetch &&     git reset --hard 89f9faa63388756314e8a1d96cf86bf5e0663045 &&     pip  0.0s
 => CACHED [stage-1 12/16] ADD src .                                                                                                                                                                  0.0s
 => CACHED [stage-1 13/16] COPY builder/cache.py /stable-diffusion-webui/cache.py                                                                                                                     0.0s
 => ERROR [stage-1 14/16] RUN cd /stable-diffusion-webui && python cache.py --use-cpu=all --ckpt /model.safetensors                                                                                 365.3s
------                                                                                                                                                                                                     
 > [stage-1 14/16] RUN cd /stable-diffusion-webui && python cache.py --use-cpu=all --ckpt /model.safetensors:                                                                                              
#0 23.36 No module 'xformers'. Proceeding without it.                                                                                                                                                      
#0 23.40 Warning: caught exception 'Found no NVIDIA driver on your system. Please check that you have an NVIDIA GPU and installed a driver from http://www.nvidia.com/Download/index.aspx', memory monitor disabled                                                                                                                                                                                                   
#0 29.31 Calculating sha256 for /model.safetensors: Downloading: "https://storage.googleapis.com/sfr-vision-language-research/BLIP/models/model_base_caption_capfilt_large.pth" to /stable-diffusion-webui/models/BLIP/model_base_caption_capfilt_large.pth
#0 29.37 
100%|██████████| 855M/855M [00:34<00:00, 26.2MB/s] 
Downloading (…)solve/main/vocab.txt: 100%|██████████| 232k/232k [00:00<00:00, 696kB/s]
Downloading (…)okenizer_config.json: 100%|██████████| 28.0/28.0 [00:00<00:00, 61.7kB/s]
Downloading (…)lve/main/config.json: 100%|██████████| 570/570 [00:00<00:00, 1.24MB/s]
#0 141.3 9aba26abdfcd46073e0a1d42027a3a3bcc969f562d58a03637bf0a0ded6586c9
#0 141.3 Loading weights [9aba26abdf] from /model.safetensors
#0 142.3 Creating model from config: /stable-diffusion-webui/configs/v1-inference.yaml
#0 142.3 LatentDiffusion: Running in eps-prediction mode
#0 143.2 DiffusionWrapper has 859.52 M params.
Downloading (…)olve/main/vocab.json: 100%|██████████| 961k/961k [00:00<00:00, 1.99MB/s]
Downloading (…)olve/main/merges.txt: 100%|██████████| 525k/525k [00:00<00:00, 1.38MB/s]
Downloading (…)cial_tokens_map.json: 100%|██████████| 389/389 [00:00<00:00, 795kB/s]
Downloading (…)okenizer_config.json: 100%|██████████| 905/905 [00:00<00:00, 1.72MB/s]
Downloading (…)lve/main/config.json: 100%|██████████| 4.52k/4.52k [00:00<00:00, 8.20MB/s]
#0 176.0 Applying cross attention optimization (InvokeAI).
#0 176.1 Textual inversion embeddings loaded(0): 
#0 176.1 Model loaded in 146.8s (calculate hash: 112.0s, load weights from disk: 0.9s, create model: 17.7s, apply weights to model: 7.1s, apply half(): 8.9s).
#0 233.6 load checkpoint from /stable-diffusion-webui/models/BLIP/model_base_caption_capfilt_large.pth
100%|███████████████████████████████████████| 890M/890M [01:20<00:00, 11.6MiB/s]
------
Dockerfile:76
--------------------
  74 |     #TODO caching
  75 |     COPY builder/cache.py /stable-diffusion-webui/cache.py
  76 | >>> RUN cd /stable-diffusion-webui && python cache.py --use-cpu=all --ckpt /model.safetensors
  77 |     
  78 |     # Cleanup section (Worker Template)
--------------------
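
The output above is the default BuildKit terminal view, which redraws in place, and the pasted log ends without an explicit error message or exit code. One way to capture each step's output once, in order, is to rerun the same build with `--progress=plain` and save a copy to a file. This is a minimal sketch, not something from the repository: the helper name `build_with_plain_log` and the default `build.log` filename are illustrative assumptions.

```shell
# Hypothetical helper (not part of worker-a1111): rerun the build with
# plain, non-redrawing progress output and keep a copy in a log file.
build_with_plain_log() {
  tag=$1               # image tag, e.g. my-acc/my-image
  log=${2:-build.log}  # where to save the output (assumed default name)

  # --progress=plain prints every step's stdout/stderr line by line.
  # Note: the pipeline's exit status is tee's, not docker's, so read the
  # log to see whether the build itself failed.
  DOCKER_BUILDKIT=1 docker build . \
    --tag "$tag" \
    --platform=linux/amd64 \
    --progress=plain 2>&1 | tee "$log"
}

# Usage, from the repository root (not run automatically):
#   build_with_plain_log my-acc/my-image build.log
```

The last lines of the saved log for the `cache.py` step are usually the most useful part to attach to a report like this one.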

justinmerrell commented 1 year ago

Do you know if any changes have been made to any files?

It looks like the line number that the failing command is on differs from the Dockerfile in the repo: [image]

I also checked our CD, and it built without issues: https://github.com/runpod-workers/worker-a1111/actions/runs/5273447850/jobs/9643760353

Will re-open if more information is provided on the error or if others can reproduce it.
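
The two questions raised here (whether any files were changed, and why the failing command sits on line 76 locally) can be checked from the clone itself. The following is a rough sketch under the assumption that it is run from the repository root; the helper names `show_lines` and `local_changes` are illustrative and not part of the repository.

```shell
# Print lines START..END of FILE with line numbers, in the same
# "  76 | ..." style as the excerpt shown under a failed build step.
show_lines() {
  awk -v s="$2" -v e="$3" \
    'NR >= s && NR <= e { printf "%4d | %s\n", NR, $0 }' "$1"
}

# List files that differ from the last commit.
# Empty output means the working tree has no local changes.
local_changes() {
  git status --porcelain
}

# Usage, from the repository root (not run automatically):
#   local_changes                  # any modified or untracked files?
#   git log -1 --format=%H         # which commit the clone is at
#   show_lines Dockerfile 74 78    # the region the error pointed to
```

Posting the commit hash together with the `show_lines` output makes it straightforward to compare the local Dockerfile against the one in the repo.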