replicate / cog

Containers for machine learning
https://cog.run
Apache License 2.0

Could not find a suitable base image, continuing without base image support #1911

Open AmirulOm opened 3 weeks ago

AmirulOm commented 3 weeks ago

I'm building a Cog image for this project, https://github.com/chenxwh/Grounded-Segment-Anything, on my own server with an A100, and I'm getting this error:

$ cog build
Building Docker image from environment in cog.yaml as cog-grounded-segment-anything...
⚠ Stripping patch version from Torch version 3.10 to 3.10
⚠ Could not find a suitable base image, continuing without base image support (unsupported base image configuration: CUDA: 11.7 / Python: 3.10 / Torch: 1.13).
⚠ Stripping patch version from Torch version 3.10 to 3.10
⚠ Could not find a suitable base image, continuing without base image support (unsupported base image configuration: CUDA: 11.7 / Python: 3.10 / Torch: 1.13).
[+] Building 1.8s (10/12)                                                                                                                  docker:default
 => [internal] load build definition from Dockerfile                                                                                                 0.0s
 => => transferring dockerfile: 460B                                                                                                                 0.0s
 => resolve image config for docker-image://docker.io/docker/dockerfile:1.4                                                                          0.6s
 => CACHED docker-image://docker.io/docker/dockerfile:1.4@sha256:9ba7531bd80fb0a858632727cf7a112fbfd19b17e94c4e84ced81e24ef1a0dbc                    0.0s
 => [internal] load .dockerignore                                                                                                                    0.0s
 => => transferring context: 2B                                                                                                                      0.0s
 => [internal] load metadata for docker.io/nvidia/cuda:11.7.1-cudnn8-devel-ubuntu22.04                                                               0.6s
 => [stage-0 1/6] FROM docker.io/nvidia/cuda:11.7.1-cudnn8-devel-ubuntu22.04@sha256:38e59267704b5d91ef63c7d8f613359c629fab0aead1283d59ca7821029e73b  0.0s
 => [internal] load build context                                                                                                                    0.0s
 => => transferring context: 63.79kB                                                                                                                 0.0s
 => CACHED [stage-0 2/6] RUN --mount=type=cache,target=/var/cache/apt,sharing=locked apt-get update -qq && apt-get install -qqy  && rm -rf /var/lib  0.0s
 => CACHED [stage-0 3/6] COPY .cog/tmp/build20240828081844.1031702797687672/requirements.txt /tmp/requirements.txt                                   0.0s
 => ERROR [stage-0 4/6] RUN pip install -r /tmp/requirements.txt                                                                                     0.3s
------                                                                                                                                                    
 > [stage-0 4/6] RUN pip install -r /tmp/requirements.txt:
0.260 /bin/sh: 1: pip: not found
------
Dockerfile:5
--------------------
   3 |     RUN --mount=type=cache,target=/var/cache/apt,sharing=locked apt-get update -qq && apt-get install -qqy  && rm -rf /var/lib/apt/lists/*
   4 |     COPY .cog/tmp/build20240828081844.1031702797687672/requirements.txt /tmp/requirements.txt
   5 | >>> RUN pip install -r /tmp/requirements.txt
   6 |     WORKDIR /src
   7 |     EXPOSE 5000
--------------------
ERROR: failed to solve: process "/bin/sh -c pip install -r /tmp/requirements.txt" did not complete successfully: exit code: 127
ⅹ Failed to build Docker image: exit status 1

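It seems that Cog couldn't find a base image for the combination CUDA 11.7 / Python 3.10 / Torch 1.13, and the fallback image it used instead (nvidia/cuda:11.7.1-cudnn8-devel-ubuntu22.04) doesn't have pip installed, so the `pip install` step fails.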
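For anyone scripting around `cog build`, here is a minimal sketch of pulling the rejected combination out of the build output. The regular expression is modelled only on the warning line in the log above, so it may not match other Cog versions; the function name `find_unsupported_config` is made up for this example and is not part of Cog.

```python
import re

# Pattern modelled on the single warning line shown in the log above.
# Assumption: other Cog versions may word this warning differently.
WARNING_RE = re.compile(
    r"unsupported base image configuration: "
    r"CUDA: (?P<cuda>[\d.]+) / Python: (?P<python>[\d.]+) / Torch: (?P<torch>[\d.]+)"
)


def find_unsupported_config(build_output: str):
    """Return the first unsupported CUDA/Python/Torch combination, or None."""
    match = WARNING_RE.search(build_output)
    return match.groupdict() if match else None


if __name__ == "__main__":
    sample = (
        "Could not find a suitable base image, continuing without base image "
        "support (unsupported base image configuration: "
        "CUDA: 11.7 / Python: 3.10 / Torch: 1.13)."
    )
    # Prints the three version strings captured from the warning.
    print(find_unsupported_config(sample))
```

If the function returns a combination, that is the set of versions to change in cog.yaml before expecting the pip step to succeed.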

For context, this is the cog.yaml:

# Configuration for Cog ⚙️
# Reference: https://github.com/replicate/cog/blob/main/docs/yaml.md

build:
  gpu: true
  cuda: "11.7"
  system_packages:
    - "libgl1-mesa-glx"
    - "libglib2.0-0"
  python_version: "3.10"
  python_packages:
    - "timm==0.9.2"
    - "transformers==4.30.2"
    - "fairscale==0.4.13"
    - "pycocoevalcap==1.2"
    - "torch==1.13.0"
    - "torchvision==0.14.0"
    - "Pillow==9.5.0"
    - "scipy==1.10.1"
    - "opencv-python==4.7.0.72"
    - "addict==2.4.0"
    - "yapf==0.40.0"
    - "supervision==0.10.0"
    - git+https://github.com/openai/CLIP.git
    - ipython

predict: "predict.py:Predictor"

If I remove the CUDA version and the torch version, the image builds, but I get a version mismatch error when I run it.

I'd appreciate any help from the community.

Shubham-Khichi commented 2 weeks ago

Same issue!

AmirulOm commented 2 weeks ago

I managed to change the cog.yaml and make it work, although I'm not sure it's the right fix.

I needed to change the CUDA version to 12.1, the Python version to 3.9, the torch version to 2.3.1, and torchvision to 0.18.1.

Here is the full cog.yaml that worked:

# Configuration for Cog ⚙️
# Reference: https://github.com/replicate/cog/blob/main/docs/yaml.md

build:
  gpu: true
  cuda: "12.1"
  system_packages:
    - "libgl1-mesa-glx"
    - "libglib2.0-0"
  python_version: "3.9"
  python_packages:
    - "timm==0.9.2"
    - "transformers==4.30.2"
    - "fairscale==0.4.13"
    - "pycocoevalcap==1.2"
    - "torch>=2.3.1"
    - "torchvision>=0.18.1"
    - "Pillow==9.5.0"
    - "scipy==1.10.1"
    - "opencv-python==4.7.0.72"
    - "addict==2.4.0"
    - "yapf==0.40.0"
    - "supervision==0.10.0"
    - git+https://github.com/openai/CLIP.git
    - ipython

predict: "predict.py:Predictor"
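
Since the earlier attempt built but then hit a version mismatch at run time, it may be worth checking what the image actually contains after a successful build. The following is a rough sketch, not something from the Cog docs: it reports the installed torch version, the CUDA version that torch build targets, and whether a GPU is visible. The function name `describe_torch_env` is hypothetical.

```python
def describe_torch_env() -> dict:
    """Collect the versions that matter when diagnosing a CUDA/Torch mismatch."""
    try:
        import torch
    except ImportError:
        # torch is not installed in this environment.
        return {"torch": None}
    return {
        "torch": torch.__version__,
        # CUDA version this torch build was compiled for (None on CPU-only builds).
        "built_for_cuda": torch.version.cuda,
        # True only if a usable GPU and driver are visible to this process.
        "cuda_available": torch.cuda.is_available(),
    }


if __name__ == "__main__":
    for key, value in describe_torch_env().items():
        print(f"{key}: {value}")
```

Saved as, say, `check_torch.py` (an arbitrary name), this could be run inside the built environment with something like `cog run python check_torch.py`. If `built_for_cuda` doesn't line up with the `cuda:` value in cog.yaml, that would be a likely source of the mismatch.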