replicate / cog

Containers for machine learning
https://cog.run
Apache License 2.0

[Cog v0.8.0 error after upgrading from v0.7.2] Error on Cog build: exec: /sbin/ldconfig.real: not found #1189

Closed Glavin001 closed 1 year ago

Glavin001 commented 1 year ago

Impact: I'm unable to build any image using Cog, and therefore cannot deploy any models to Replicate.


On both Lambda Labs and TensorDock, I run:

sudo cog build

cog.yaml:

# Configuration for Cog ⚙️
# Reference: https://github.com/replicate/cog/blob/main/docs/yaml.md

build:
  # set to true if your model requires a GPU
  gpu: true

  cuda: "11.8"

  # python version in the form '3.8' or '3.8.12'
  python_version: "3.10"

  # a list of packages in the format <package-name>==<version>
  python_packages:
    - "torch==2.0.0"
    - "transformers==4.30.1"
    - "sentencepiece==0.1.97"
    - "accelerate==0.20.3"
    # https://github.com/oobabooga/text-generation-webui/blob/main/docs/LLaMA-model.md#option-2-convert-the-weights-yourself
    - "protobuf==3.20.1"
    - "auto-gptq==0.2.2"

# predict.py defines how predictions are run on your model
predict: "predict.py:Predictor"

I receive the following error logs:

 => ERROR [stage-1  3/11] RUN --mount=type=cache,target=/var/cache/apt apt-get update -qq && apt-get install -qqy --no-install-recom  16.4s
------                                                                                                                                      
 > [stage-1  3/11] RUN --mount=type=cache,target=/var/cache/apt apt-get update -qq && apt-get install -qqy --no-install-recommends      make        build-essential         libssl-dev      zlib1g-dev      libbz2-dev      libreadline-dev         libsqlite3-dev  wget    curl    llvm        libncurses5-dev         libncursesw5-dev        xz-utils        tk-dev  libffi-dev      liblzma-dev     git     ca-certificates    && rm -rf /var/lib/apt/lists/*:                                                                                                              
#0 13.37 debconf: delaying package configuration, since apt-utils is not installed    

....

#0 20.67 Setting up tk-dev:amd64 (8.6.11+1build2) ...
#0 20.67 Processing triggers for libc-bin (2.35-0ubuntu3.1) ...
#0 20.67 /usr/sbin/ldconfig: 16: exec: /sbin/ldconfig.real: not found
#0 20.67 /usr/sbin/ldconfig: 16: exec: /sbin/ldconfig.real: not found
#0 20.67 dpkg: error processing package libc-bin (--configure):
#0 20.67  installed libc-bin package post-installation script subprocess returned error exit status 127
#0 20.68 Errors were encountered while processing:
#0 20.68  libc-bin
#0 20.69 E: Sub-process /usr/bin/dpkg returned an error code (1)
------
Dockerfile:13
--------------------
  12 |     ENV PATH="/root/.pyenv/shims:/root/.pyenv/bin:$PATH"
  13 | >>> RUN --mount=type=cache,target=/var/cache/apt apt-get update -qq && apt-get install -qqy --no-install-recommends \
  14 | >>>      make \
  15 | >>>      build-essential \
  16 | >>>      libssl-dev \
  17 | >>>      zlib1g-dev \
  18 | >>>      libbz2-dev \
  19 | >>>      libreadline-dev \
  20 | >>>      libsqlite3-dev \
  21 | >>>      wget \
  22 | >>>      curl \
  23 | >>>      llvm \
  24 | >>>      libncurses5-dev \
  25 | >>>      libncursesw5-dev \
  26 | >>>      xz-utils \
  27 | >>>      tk-dev \
  28 | >>>      libffi-dev \
  29 | >>>      liblzma-dev \
  30 | >>>      git \
  31 | >>>      ca-certificates \
  32 | >>>      && rm -rf /var/lib/apt/lists/*
  33 |     RUN curl -s -S -L https://raw.githubusercontent.com/pyenv/pyenv-installer/master/bin/pyenv-installer | bash && \
--------------------
ERROR: failed to solve: process "/bin/sh -c apt-get update -qq && apt-get install -qqy --no-install-recommends \tmake \tbuild-essential \tlibssl-dev \tzlib1g-dev \tlibbz2-dev \tlibreadline-dev \tlibsqlite3-dev \twget \tcurl \tllvm \tlibncurses5-dev \tlibncursesw5-dev \txz-utils \ttk-dev \tlibffi-dev \tliblzma-dev \tgit \tca-certificates \t&& rm -rf /var/lib/apt/lists/*" did not complete successfully: exit code: 100
ⅹ Failed to build Docker image: exit status 1

mattt commented 1 year ago

Thanks for writing this up, @Glavin001. I'm sorry you're hitting this issue. I've got a few theories at the moment:

  1. Something changed in Debian packaging. To help test that, could you please share the output of running cog debug?
  2. Running as sudo is somehow causing a problem. What happens if you run without sudo?
  3. Something about Windows. Many of the threads I found searching for these errors mentioned WSL2. You mentioned running this on a cloud GPU, so that seems unrelated. Just to rule that out, is WSL2 used in any part of your stack?
Glavin001 commented 1 year ago

Thanks for your prompt reply and ideas!

  1. cog debug:
$ sudo cog debug
#syntax=docker/dockerfile:1.4
FROM curlimages/curl AS downloader
ARG TINI_VERSION=0.19.0
WORKDIR /tmp
RUN curl -fsSL -O "https://github.com/krallin/tini/releases/download/v${TINI_VERSION}/tini" && chmod +x tini
FROM nvidia/cuda:11.8.0-cudnn8-devel-ubuntu22.04
ENV DEBIAN_FRONTEND=noninteractive
ENV PYTHONUNBUFFERED=1
ENV LD_LIBRARY_PATH=$LD_LIBRARY_PATH:/usr/lib/x86_64-linux-gnu:/usr/local/nvidia/lib64:/usr/local/nvidia/bin
COPY --link --from=downloader /tmp/tini /sbin/tini
ENTRYPOINT ["/sbin/tini", "--"]
ENV PATH="/root/.pyenv/shims:/root/.pyenv/bin:$PATH"
RUN --mount=type=cache,target=/var/cache/apt apt-get update -qq && apt-get install -qqy --no-install-recommends \
        make \
        build-essential \
        libssl-dev \
        zlib1g-dev \
        libbz2-dev \
        libreadline-dev \
        libsqlite3-dev \
        wget \
        curl \
        llvm \
        libncurses5-dev \
        libncursesw5-dev \
        xz-utils \
        tk-dev \
        libffi-dev \
        liblzma-dev \
        git \
        ca-certificates \
        && rm -rf /var/lib/apt/lists/*
RUN curl -s -S -L https://raw.githubusercontent.com/pyenv/pyenv-installer/master/bin/pyenv-installer | bash && \
        git clone https://github.com/momo-lab/pyenv-install-latest.git "$(pyenv root)"/plugins/pyenv-install-latest && \
        pyenv install-latest "3.10" && \
        pyenv global $(pyenv install-latest --print "3.10") && \
        pip install "wheel<1"
COPY .cog/tmp/build4127551442/cog-0.0.1.dev-py3-none-any.whl /tmp/cog-0.0.1.dev-py3-none-any.whl
RUN --mount=type=cache,target=/root/.cache/pip pip install /tmp/cog-0.0.1.dev-py3-none-any.whl
COPY .cog/tmp/build4127551442/requirements.txt /tmp/requirements.txt
RUN --mount=type=cache,target=/root/.cache/pip pip install -r /tmp/requirements.txt
WORKDIR /src
EXPOSE 5000
CMD ["python", "-m", "cog.server.http"]
COPY . /src
  2. I'm unable to run Docker without sudo on Lambda Labs:
$ cog build
Building Docker image from environment in cog.yaml as cog-replicate-startup-intervie...
ERROR: permission denied while trying to connect to the Docker daemon socket at unix:///var/run/docker.sock: Get "http://%2Fvar%2Frun%2Fdocker.sock/_ping": dial unix /var/run/docker.sock: connect: permission denied
ⅹ Failed to build Docker image: exit status 1
  3. I don't think Windows is any part of my stack. I'm SSHing into the Lambda Labs server from a Mac. The server is:
    • Ubuntu 20.04.5 LTS (GNU/Linux 5.15.0-67-generic x86_64)
    • 1x A10 (24 GB PCIe) 30 vCPUs, 200 GiB RAM, 1.4 TiB SSD
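The permission error above is what Docker reports when the current user cannot open /var/run/docker.sock. A minimal sketch for checking this, assuming a default Docker install where members of the `docker` group may use the socket (the helper name `in_docker_group` is hypothetical and not part of Cog or Docker):

```shell
#!/bin/sh
# Hypothetical helper: succeeds if the given space-separated group list
# (as printed by `id -nG`) contains the group "docker".
in_docker_group() {
    for g in $1; do
        [ "$g" = "docker" ] && return 0
    done
    return 1
}

# Check the groups of the user running this script.
if in_docker_group "$(id -nG)"; then
    echo "current user is in the docker group"
else
    echo "current user is not in the docker group; cog build will likely need sudo"
fi
```

On a default setup, adding the user to the `docker` group and logging in again usually removes the need for sudo. Whether that is permitted on a given Lambda Labs image is not confirmed in this thread.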
Glavin001 commented 1 year ago

Also, this exact cog.yaml was working ~4 days ago, when I last built and pushed a model to Replicate. I need to reinstall Cog each time, so this may be caused by a change between versions.

I think I was on Cog v0.7.2 before and am now on Cog v0.8.0 (released 3 days ago)?

Glavin001 commented 1 year ago

I'm seeing a lot of mentions of WSL (Linux on Windows?) in related issues: https://github.com/microsoft/WSL/issues/4760

Maybe Lambda Labs is using Windows in their stack? I'm not sure how to verify that.

Glavin001 commented 1 year ago

There may be a way to check: https://github.com/microsoft/WSL/issues/4071#issuecomment-496715404

I'll try tonight.

Glavin001 commented 1 year ago

Doesn't look like Windows WSL?

ubuntu@IP:~$ /proc/version
bash: /proc/version: Permission denied
ubuntu@IP:~$ sudo /proc/version
sudo: /proc/version: command not found
ubuntu@IP:~$ uname -a
Linux IP 5.15.0-67-generic #74~20.04.1-Ubuntu SMP Wed Feb 22 14:52:34 UTC 2023 x86_64 x86_64 x86_64 GNU/Linux
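Note that /proc/version is a file rather than a command, so it has to be read (for example with cat) instead of executed. A minimal sketch of the check, assuming the common heuristic that WSL kernels mention "Microsoft" in their version string (the function name `looks_like_wsl` is hypothetical):

```shell
#!/bin/sh
# Hypothetical helper: succeeds if a kernel version string mentions Microsoft,
# which is a commonly used heuristic for detecting WSL (an assumption here).
looks_like_wsl() {
    printf '%s\n' "$1" | grep -qi 'microsoft'
}

# Read the file instead of executing it.
if [ -r /proc/version ] && looks_like_wsl "$(cat /proc/version)"; then
    echo "kernel string mentions Microsoft: probably WSL"
else
    echo "no Microsoft marker found: probably not WSL"
fi
```

The uname -a output above contains no such marker, which is consistent with the machine not running under WSL.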
Glavin001 commented 1 year ago

v0.8.0 is the issue.

Workaround: Downgrading to v0.7.2 fixes the issue! 🎉 ✅

$ sudo curl -o /usr/local/bin/cog -L "https://github.com/replicate/cog/releases/download/v0.7.2/cog_Linux_x86_64"
  % Total    % Received % Xferd  Average Speed   Time    Time     Time  Current
                                 Dload  Upload   Total   Spent    Left  Speed
  0     0    0     0    0     0      0      0 --:--:-- --:--:-- --:--:--     0
100 9444k  100 9444k    0     0  10.7M      0 --:--:-- --:--:-- --:--:-- 56.5M
$ sudo chmod +x /usr/local/bin/cog
$ cog --version
cog version 0.7.2 (built 2023-05-23T10:20:56Z)
$ sudo cog --version
cog version 0.7.2 (built 2023-05-23T10:20:56Z)

$ sudo cog debug
⚠ Cog doesn't know if CUDA 11.8 is compatible with PyTorch 2.0.0. This might cause CUDA problems.
# syntax = docker/dockerfile:1.2
FROM nvidia/cuda:11.8.0-cudnn8-devel-ubuntu22.04
ENV DEBIAN_FRONTEND=noninteractive
ENV PYTHONUNBUFFERED=1
ENV LD_LIBRARY_PATH=$LD_LIBRARY_PATH:/usr/lib/x86_64-linux-gnu:/usr/local/nvidia/lib64:/usr/local/nvidia/bin
RUN --mount=type=cache,target=/var/cache/apt set -eux; \
apt-get update -qq; \
apt-get install -qqy --no-install-recommends curl; \
rm -rf /var/lib/apt/lists/*; \
TINI_VERSION=v0.19.0; \
TINI_ARCH="$(dpkg --print-architecture)"; \
curl -sSL -o /sbin/tini "https://github.com/krallin/tini/releases/download/${TINI_VERSION}/tini-${TINI_ARCH}"; \
chmod +x /sbin/tini
ENTRYPOINT ["/sbin/tini", "--"]
ENV PATH="/root/.pyenv/shims:/root/.pyenv/bin:$PATH"
RUN --mount=type=cache,target=/var/cache/apt apt-get update -qq && apt-get install -qqy --no-install-recommends \
        make \
        build-essential \
        libssl-dev \
        zlib1g-dev \
        libbz2-dev \
        libreadline-dev \
        libsqlite3-dev \
        wget \
        curl \
        llvm \
        libncurses5-dev \
        libncursesw5-dev \
        xz-utils \
        tk-dev \
        libffi-dev \
        liblzma-dev \
        git \
        ca-certificates \
        && rm -rf /var/lib/apt/lists/*
RUN curl -s -S -L https://raw.githubusercontent.com/pyenv/pyenv-installer/master/bin/pyenv-installer | bash && \
        git clone https://github.com/momo-lab/pyenv-install-latest.git "$(pyenv root)"/plugins/pyenv-install-latest && \
        pyenv install-latest "3.10" && \
        pyenv global $(pyenv install-latest --print "3.10") && \
        pip install "wheel<1"
COPY .cog/tmp/build4048584965/cog-0.0.1.dev-py3-none-any.whl /tmp/cog-0.0.1.dev-py3-none-any.whl
RUN --mount=type=cache,target=/root/.cache/pip pip install /tmp/cog-0.0.1.dev-py3-none-any.whl
COPY .cog/tmp/build4048584965/requirements.txt /tmp/requirements.txt
RUN --mount=type=cache,target=/root/.cache/pip pip install -r /tmp/requirements.txt
WORKDIR /src
EXPOSE 5000
CMD ["python", "-m", "cog.server.http"]
COPY . /src

Glavin001 commented 1 year ago

Here's the cog debug diff between v0.7.2 and v0.8.0:

--- v0.7.2.txt  2023-07-11 05:31:08
+++ v0.8.0.txt  2023-07-11 05:31:28
@@ -1,18 +1,14 @@
 $ sudo cog debug
-⚠ Cog doesn't know if CUDA 11.8 is compatible with PyTorch 2.0.0. This might cause CUDA problems.
-# syntax = docker/dockerfile:1.2
+#syntax=docker/dockerfile:1.4
+FROM curlimages/curl AS downloader
+ARG TINI_VERSION=0.19.0
+WORKDIR /tmp
+RUN curl -fsSL -O "https://github.com/krallin/tini/releases/download/v${TINI_VERSION}/tini" && chmod +x tini
 FROM nvidia/cuda:11.8.0-cudnn8-devel-ubuntu22.04
 ENV DEBIAN_FRONTEND=noninteractive
 ENV PYTHONUNBUFFERED=1
 ENV LD_LIBRARY_PATH=$LD_LIBRARY_PATH:/usr/lib/x86_64-linux-gnu:/usr/local/nvidia/lib64:/usr/local/nvidia/bin
-RUN --mount=type=cache,target=/var/cache/apt set -eux; \
-apt-get update -qq; \
-apt-get install -qqy --no-install-recommends curl; \
-rm -rf /var/lib/apt/lists/*; \
-TINI_VERSION=v0.19.0; \
-TINI_ARCH="$(dpkg --print-architecture)"; \
-curl -sSL -o /sbin/tini "https://github.com/krallin/tini/releases/download/${TINI_VERSION}/tini-${TINI_ARCH}"; \
-chmod +x /sbin/tini
+COPY --link --from=downloader /tmp/tini /sbin/tini
 ENTRYPOINT ["/sbin/tini", "--"]
 ENV PATH="/root/.pyenv/shims:/root/.pyenv/bin:$PATH"
 RUN --mount=type=cache,target=/var/cache/apt apt-get update -qq && apt-get install -qqy --no-install-recommends \
@@ -40,9 +36,9 @@
         pyenv install-latest "3.10" && \
         pyenv global $(pyenv install-latest --print "3.10") && \
         pip install "wheel<1"
-COPY .cog/tmp/build4048584965/cog-0.0.1.dev-py3-none-any.whl /tmp/cog-0.0.1.dev-py3-none-any.whl
+COPY .cog/tmp/build4127551442/cog-0.0.1.dev-py3-none-any.whl /tmp/cog-0.0.1.dev-py3-none-any.whl
 RUN --mount=type=cache,target=/root/.cache/pip pip install /tmp/cog-0.0.1.dev-py3-none-any.whl
-COPY .cog/tmp/build4048584965/requirements.txt /tmp/requirements.txt
+COPY .cog/tmp/build4127551442/requirements.txt /tmp/requirements.txt
 RUN --mount=type=cache,target=/root/.cache/pip pip install -r /tmp/requirements.txt
 WORKDIR /src
 EXPOSE 5000
EmilioNicolas commented 1 year ago

Thanks to @Glavin001 for the quick workaround! Here is a quick downgrade command:

sudo curl -o /usr/local/bin/cog -L https://github.com/replicate/cog/releases/download/v0.7.2/cog_`uname -s`_`uname -m`
sudo chmod +x /usr/local/bin/cog
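To pin a different release, the same URL can be assembled from its parts. A small sketch, assuming release assets keep the cog_<uname -s>_<uname -m> naming used in the command above (the helper name `cog_release_url` is hypothetical):

```shell
#!/bin/sh
# Hypothetical helper: prints the download URL for a Cog release binary.
# Assumes assets follow the cog_<uname -s>_<uname -m> pattern shown above.
cog_release_url() {
    version="$1"; os="$2"; arch="$3"
    printf 'https://github.com/replicate/cog/releases/download/%s/cog_%s_%s\n' \
        "$version" "$os" "$arch"
}

# Print the URL for this machine without downloading anything.
cog_release_url "v0.7.2" "$(uname -s)" "$(uname -m)"
```

The printed URL can then be passed to the curl command above in place of the hard-coded one.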
ichernev commented 1 year ago

I also hit this on Lambda Labs machines with Cog v0.8.1. Another workaround is to set gpu: false, but then you have to launch your containers manually with docker run --gpus all. At least that points to a problem in the GPU-specific code paths.

hongchaodeng commented 1 year ago

I think I've got a handle on what caused the issue here. First and foremost, the root cause of this problem appears to lie in the lines that install tini.

To verify this, I created three simplified Dockerfiles.

The Original One (v0.7.2) works

FROM nvidia/cuda:11.8.0-cudnn8-devel-ubuntu22.04
ENV DEBIAN_FRONTEND=noninteractive
ENV PYTHONUNBUFFERED=1
ENV LD_LIBRARY_PATH=$LD_LIBRARY_PATH:/usr/lib/x86_64-linux-gnu:/usr/local/nvidia/lib64:/usr/local/nvidia/bin
ENV PATH="/root/.pyenv/shims:/root/.pyenv/bin:$PATH"

## Here's the original part that installed tini
RUN --mount=type=cache,target=/var/cache/apt set -eux; \
    apt-get update -qq; \
    apt-get install -qqy --no-install-recommends curl; \
    rm -rf /var/lib/apt/lists/*; \
    TINI_VERSION=v0.19.0; \
    TINI_ARCH="$(dpkg --print-architecture)"; \
    curl -sSL -o /sbin/tini "https://github.com/krallin/tini/releases/download/${TINI_VERSION}/tini-${TINI_ARCH}"; \
    chmod +x /sbin/tini

ENTRYPOINT ["/sbin/tini", "--"]
RUN --mount=type=cache,target=/var/cache/apt apt-get update -qq && apt-get install -qqy --no-install-recommends \
    make \
    build-essential \
    libssl-dev \
    zlib1g-dev \
    libbz2-dev \
    libreadline-dev \
    libsqlite3-dev \
    wget \
    curl \
    llvm \
    libncurses5-dev \
    libncursesw5-dev \
    xz-utils \
    tk-dev \
    libffi-dev \
    liblzma-dev \
    git \
    ca-certificates \
    && rm -rf /var/lib/apt/lists/*
CMD ["python", "-m", "cog.server.http"]

Building works fine:

sudo docker build -t cog-stable-diffusion -f .tmp/Dockerfile1 .

The New One (v0.8.1) fails

This one fails at the apt-get step. That is confusing because, as we will see next, removing the tini part makes it work.

## This is the new part that downloads tini in the downloader stage
FROM curlimages/curl AS downloader
ARG TINI_VERSION=0.19.0
WORKDIR /tmp
RUN curl -fsSL -O "https://github.com/krallin/tini/releases/download/v${TINI_VERSION}/tini" && chmod +x tini

FROM nvidia/cuda:11.8.0-cudnn8-devel-ubuntu22.04
ENV DEBIAN_FRONTEND=noninteractive
ENV PYTHONUNBUFFERED=1
ENV LD_LIBRARY_PATH=$LD_LIBRARY_PATH:/usr/lib/x86_64-linux-gnu:/usr/local/nvidia/lib64:/usr/local/nvidia/bin

## This is the new part that installs tini
COPY --link --from=downloader /tmp/tini /sbin/tini

ENTRYPOINT ["/sbin/tini", "--"]
ENV PATH="/root/.pyenv/shims:/root/.pyenv/bin:$PATH"
RUN --mount=type=cache,target=/var/cache/apt apt-get update -qq && apt-get install -qqy --no-install-recommends \
    make \
    build-essential \
    libssl-dev \
    zlib1g-dev \
    libbz2-dev \
    libreadline-dev \
    libsqlite3-dev \
    wget \
    curl \
    llvm \
    libncurses5-dev \
    libncursesw5-dev \
    xz-utils \
    tk-dev \
    libffi-dev \
    liblzma-dev \
    git \
    ca-certificates \
    && rm -rf /var/lib/apt/lists/*
CMD ["python", "-m", "cog.server.http"]

Removing the tini part from the new one (v0.8.1) works

The following Dockerfile only removes the tini downloader stage and the tini COPY command. Now it builds successfully.

FROM nvidia/cuda:11.8.0-cudnn8-devel-ubuntu22.04
ENV DEBIAN_FRONTEND=noninteractive
ENV PYTHONUNBUFFERED=1
ENV LD_LIBRARY_PATH=$LD_LIBRARY_PATH:/usr/lib/x86_64-linux-gnu:/usr/local/nvidia/lib64:/usr/local/nvidia/bin
ENV PATH="/root/.pyenv/shims:/root/.pyenv/bin:$PATH"
RUN --mount=type=cache,target=/var/cache/apt apt-get update -qq && apt-get install -qqy --no-install-recommends \
    make \
    build-essential \
    libssl-dev \
    zlib1g-dev \
    libbz2-dev \
    libreadline-dev \
    libsqlite3-dev \
    wget \
    curl \
    llvm \
    libncurses5-dev \
    libncursesw5-dev \
    xz-utils \
    tk-dev \
    libffi-dev \
    liblzma-dev \
    git \
    ca-certificates \
    && rm -rf /var/lib/apt/lists/*
CMD ["python", "-m", "cog.server.http"]

Conclusion

The change to how tini is installed evidently breaks apt-get. I am not sure how it breaks internally, but reverting the tini change might be the correct solution here.

cc @mattt

hongchaodeng commented 1 year ago

Another workaround is to set gpu: false

When you set gpu: false, the base image is python:3.10, compared to nvidia/cuda:11.6.2-cudnn8-devel-ubuntu20.04 otherwise.

hongchaodeng commented 1 year ago

I found the nuance!

The new one is MISSING the ${TINI_ARCH} suffix:

# Old
TINI_ARCH="$(dpkg --print-architecture)"; \
curl -sSL -o /sbin/tini "https://github.com/krallin/tini/releases/download/${TINI_VERSION}/tini-${TINI_ARCH}"; \

vs

# New
RUN curl -fsSL -O "https://github.com/krallin/tini/releases/download/v${TINI_VERSION}/tini" && chmod +x tini
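For reference, here is a sketch of how the architecture-specific asset name from the old Dockerfile is put together. Whether the missing suffix is the actual cause is not settled at this point in the thread. The helper name `tini_url` and the amd64 fallback are assumptions made for illustration:

```shell
#!/bin/sh
# Hypothetical helper: prints the architecture-specific tini asset URL,
# following the pattern in the old (v0.7.2) Dockerfile.
tini_url() {
    version="$1"; arch="$2"
    printf 'https://github.com/krallin/tini/releases/download/%s/tini-%s\n' \
        "$version" "$arch"
}

# On Debian/Ubuntu the architecture comes from dpkg; falling back to amd64
# where dpkg is unavailable is an assumption of this sketch.
arch="$(dpkg --print-architecture 2>/dev/null || echo amd64)"
tini_url "v0.19.0" "$arch"
```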
hongchaodeng commented 1 year ago

I want to share some good news. After applying the fix https://github.com/replicate/cog/pull/1208, cog build and run work again!

[Screenshot: Snipaste_2023-07-14_19-21-40]

Here's the Dockerfile:

[Screenshot: Snipaste_2023-07-14_19-21-54]

technillogue commented 1 year ago

Hi everyone, apologies - I pushed this change in hopes of making the image smaller and faster to build.

It seems like you might have an older version of the cuda base image. The current version of 11.8.0-cudnn8-devel-ubuntu22.04 already has libc-bin installed, and also has /sbin/ldconfig.real. My guess is that maybe the rm -rf /var/lib/apt/lists/* was important. Could you post the output of docker images --no-trunc|grep cuda, please?

hongchaodeng commented 1 year ago

Here's the output of sudo docker images --no-trunc|grep cuda:

nvidia/cuda   11.8.0-cudnn8-devel-ubuntu22.04   sha256:422a68abd82ed6f830178fadd24e9144ddc0461e558c90dd147fcc577ddea247   4 weeks ago   9.83GB

hongchaodeng commented 1 year ago

Actually, I tried adding rm -rf /var/lib/apt/lists/* first, and it still failed:

[Screenshot: Snipaste_2023-07-22_10-17-27]

djj0s3 commented 1 year ago

Can confirm this is still broken on Lambda Labs with new instances. Since that's the recommended cloud workflow, it's not fun that there's no fix yet! Is there a workaround besides downgrading to Cog v0.7.2?

technillogue commented 1 year ago

@djj0s3, could you paste cog debug and docker images --no-trunc|grep cuda?

djj0s3 commented 1 year ago

cog debug:

#syntax=docker/dockerfile:1.4
FROM curlimages/curl AS downloader
ARG TINI_VERSION=0.19.0
WORKDIR /tmp
RUN curl -fsSL -O "https://github.com/krallin/tini/releases/download/v${TINI_VERSION}/tini" && chmod +x tini
FROM nvidia/cuda:11.8.0-cudnn8-devel-ubuntu22.04
ENV DEBIAN_FRONTEND=noninteractive
ENV PYTHONUNBUFFERED=1
ENV LD_LIBRARY_PATH=$LD_LIBRARY_PATH:/usr/lib/x86_64-linux-gnu:/usr/local/nvidia/lib64:/usr/local/nvidia/bin
COPY --link --from=downloader /tmp/tini /sbin/tini
ENTRYPOINT ["/sbin/tini", "--"]
ENV PATH="/root/.pyenv/shims:/root/.pyenv/bin:$PATH"
RUN --mount=type=cache,target=/var/cache/apt apt-get update -qq && apt-get install -qqy --no-install-recommends \
    make \
    build-essential \
    libssl-dev \
    zlib1g-dev \
    libbz2-dev \
    libreadline-dev \
    libsqlite3-dev \
    wget \
    curl \
    llvm \
    libncurses5-dev \
    libncursesw5-dev \
    xz-utils \
    tk-dev \
    libffi-dev \
    liblzma-dev \
    git \
    ca-certificates \
    && rm -rf /var/lib/apt/lists/*
RUN curl -s -S -L https://raw.githubusercontent.com/pyenv/pyenv-installer/master/bin/pyenv-installer | bash && \
    git clone https://github.com/momo-lab/pyenv-install-latest.git "$(pyenv root)"/plugins/pyenv-install-latest && \
    pyenv install-latest "3.10" && \
    pyenv global $(pyenv install-latest --print "3.10") && \
    pip install "wheel<1"
COPY .cog/tmp/build366469108/cog-0.0.1.dev-py3-none-any.whl /tmp/cog-0.0.1.dev-py3-none-any.whl
RUN --mount=type=cache,target=/root/.cache/pip pip install /tmp/cog-0.0.1.dev-py3-none-any.whl
RUN --mount=type=cache,target=/var/cache/apt apt-get update -qq && apt-get install -qqy ffmpeg libsm6 libxext6 && rm -rf /var/lib/apt/lists/*
COPY .cog/tmp/build366469108/requirements.txt /tmp/requirements.txt
RUN --mount=type=cache,target=/root/.cache/pip pip install -r /tmp/requirements.txt
WORKDIR /src
EXPOSE 5000
CMD ["python", "-m", "cog.server.http"]
COPY . /src

docker images --no-trunc|grep cuda yields no output. Did you mean something else?

naklecha commented 1 year ago

I'm using cog 0.7.2, and I receive another error. Is there a workaround for this as well?

[+] Building 0.6s (7/7) FINISHED                                                                                                                                                                                                                                              
 => [internal] load .dockerignore                                                                                                                                                                                                                                        0.0s
 => => transferring context: 2B                                                                                                                                                                                                                                          0.0s
 => [internal] load build definition from Dockerfile                                                                                                                                                                                                                     0.0s
 => => transferring dockerfile: 1.89kB                                                                                                                                                                                                                                   0.0s
 => resolve image config for docker.io/docker/dockerfile:1.2                                                                                                                                                                                                             0.3s
 => CACHED docker-image://docker.io/docker/dockerfile:1.2@sha256:e2a8561e419ab1ba6b2fe6cbdf49fd92b95912df1cf7d313c3e2230a333fdbcc                                                                                                                                        0.0s
 => [internal] load .dockerignore                                                                                                                                                                                                                                        0.0s
 => [internal] load build definition from Dockerfile                                                                                                                                                                                                                     0.0s
 => ERROR [internal] load metadata for docker.io/nvidia/cuda:11.2.0-cudnn8-devel-ubuntu20.04                                                                                                                                                                             0.2s
------
 > [internal] load metadata for docker.io/nvidia/cuda:11.2.0-cudnn8-devel-ubuntu20.04:
------
Dockerfile:1
--------------------
   1 | >>> # syntax = docker/dockerfile:1.2
   2 |     FROM nvidia/cuda:11.2.0-cudnn8-devel-ubuntu20.04
   3 |     ENV DEBIAN_FRONTEND=noninteractive
--------------------
ERROR: failed to solve: docker.io/nvidia/cuda:11.2.0-cudnn8-devel-ubuntu20.04: not found
ⅹ Failed to build Docker image: exit status 1

mattt commented 1 year ago

Hi @Glavin001. Thanks for your help and patience as we try to debug this issue. I apologize for the inconvenience this caused.

We just released Cog v0.8.2. This release includes #1231, which reverts #1161, which we believe to be the cause of the regression you're seeing.

Please give that a try when you have a chance and let us know if you're still having this issue. Thanks! 🙏
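Before retrying the build, it may help to confirm that the installed binary is actually v0.8.2 or newer. A minimal sketch, assuming the `cog version X.Y.Z (built ...)` output format shown earlier in this thread and a `sort` that supports -V. The helper names `parse_cog_version` and `version_at_least` are hypothetical:

```shell
#!/bin/sh
# Hypothetical helper: extracts "0.7.2" from a line such as
# "cog version 0.7.2 (built 2023-05-23T10:20:56Z)".
parse_cog_version() {
    printf '%s\n' "$1" | awk '{print $3}'
}

# Hypothetical helper: succeeds if version $1 is greater than or equal to $2.
# Relies on sort -V, which is an assumption about the host's sort.
version_at_least() {
    [ "$(printf '%s\n%s\n' "$2" "$1" | sort -V | head -n 1)" = "$2" ]
}

# Sample input taken from the thread; swap in "$(cog --version)" on a real machine.
installed="$(parse_cog_version "cog version 0.7.2 (built 2023-05-23T10:20:56Z)")"
if version_at_least "$installed" "0.8.2"; then
    echo "cog $installed is at least v0.8.2"
else
    echo "cog $installed is older than v0.8.2"
fi
```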