replicate / cog

Containers for machine learning
https://cog.run
Apache License 2.0

`cog build --use-cog-base-image=false` fails on invalid wheel filename #1963

Open josephhaaga opened 2 weeks ago

josephhaaga commented 2 weeks ago

Looks like cog build --use-cog-base-image=false fails with this particular combination of CUDA 12.3 and Python 3.11 due to:

I suspect the generated wheel needs a filename that will pass this regex
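
For illustration, here is a minimal sketch of the kind of check involved. It assumes a simplified pattern based on the standard wheel naming convention (`{distribution}-{version}[-{build}]-{python}-{abi}-{platform}.whl`). The pattern and the function name `is_valid_wheel_filename` are assumptions made for this example, not pip's actual regex, and `cog-0.9.23-py3-none-any.whl` is a hypothetical filename:

```python
import re

# Simplified approximation of the wheel naming convention:
#   {distribution}-{version}[-{build}]-{python}-{abi}-{platform}.whl
# NOTE: illustrative only; this is not the exact regex pip uses.
WHEEL_NAME_RE = re.compile(
    r"^(?P<name>[^\s-]+)"          # distribution name
    r"-(?P<version>[^\s-]+)"       # version
    r"(-(?P<build>\d[^\s-]*))?"    # optional build tag, must start with a digit
    r"-(?P<python>[^\s-]+)"        # python tag, e.g. py3
    r"-(?P<abi>[^\s-]+)"           # abi tag, e.g. none
    r"-(?P<platform>[^\s-]+)"      # platform tag, e.g. any
    r"\.whl$"
)


def is_valid_wheel_filename(filename: str) -> bool:
    """Return True if the filename has the dash-separated wheel tag structure."""
    return WHEEL_NAME_RE.match(filename) is not None


print(is_valid_wheel_filename("cog.whl"))                      # False
print(is_valid_wheel_filename("cog-0.9.23-py3-none-any.whl"))  # True (hypothetical name)
```

A bare `cog.whl` has none of the dash-separated tags, which would be consistent with the `not a valid wheel filename` error in the logs.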

My machine

```zsh
$ cog --version
cog version 0.9.23 (built 2024-09-13T09:49:08Z)
$ neofetch
josephhaaga@computer
---------------------
OS: macOS 14.7 23H124 arm64
Host: MacBookPro18,2
Kernel: 23.6.0
Uptime: 1 day, 1 hour, 23 mins
Packages: 183 (brew)
Shell: zsh 5.9
Resolution: 1728x1117
DE: Aqua
WM: yabai
Terminal: tmux
CPU: Apple M1 Max
GPU: Apple M1 Max
Memory: 2481MiB / 32768MiB
```

`Use Rosetta for x86_64/amd64 emulation on Apple Silicon` is **disabled**

details

There doesn't seem to be a CUDA 12.3 + Python 3.11 base image available

# cog.yaml
build:
  cuda: "12.3" # https://www.tensorflow.org/install/source#gpu
  gpu: true
  python_version: "3.11"
  python_packages:
    - "pip==24.2"
    - "pandas==2.2.2"
    - "tensorflow==2.16.2"
    - "tensorflow-datasets==4.9.6"
    - "tensorflow-recommenders==0.7.3"
    - "tf-keras==2.16.0"
    - "scann"
train: "train.py:train"
predict: "predict.py:Predictor"
Logs

```zsh
$ cog build -t user-to-buzz:$(git rev-parse HEAD)
⚠ Cog doesn't know if CUDA 12.3 is compatible with Tensorflow 2.16.2. This might cause CUDA problems.
Building Docker image from environment in cog.yaml as user-to-buzz:30af21cb81876f3681c190ecbc844d5c8e7c2750...
[+] Building 1.0s (6/6) FINISHED docker:desktop-linux
 => [internal] load build definition from Dockerfile 0.0s
 => => transferring dockerfile: 442B 0.0s
 => resolve image config for docker-image://docker.io/docker/dockerfile:1.4 0.3s
 => [auth] docker/dockerfile:pull token for registry-1.docker.io 0.0s
 => CACHED docker-image://docker.io/docker/dockerfile:1.4@sha256:9ba7531bd80fb0a858632727cf7a112fbfd19b17e94c4e84ced81e24ef1a0dbc 0.0s
 => [internal] load .dockerignore 0.0s
 => => transferring context: 58B 0.0s
 => ERROR [internal] load metadata for r8.im/cog-base:cuda12.3-python3.11 0.5s
------
 > [internal] load metadata for r8.im/cog-base:cuda12.3-python3.11:
------
Dockerfile:2
--------------------
   1 |     #syntax=docker/dockerfile:1.4
   2 | >>> FROM r8.im/cog-base:cuda12.3-python3.11
   3 |     COPY .cog/tmp/build20240920112133.9388834066866224/requirements.txt /tmp/requirements.txt
   4 |     ENV CFLAGS="-O3 -funroll-loops -fno-strict-aliasing -flto -S"
--------------------
ERROR: failed to solve: failed to resolve source metadata for r8.im/cog-base:cuda12.3-python3.11: r8.im/cog-base:cuda12.3-python3.11: not found
ⅹ Failed to build Docker image: exit status 1
```

Setting `--use-cog-base-image=false` gets past the missing base image, but the build then fails because of how the wheel file is named:

[deps 3/5] RUN --mount=type=cache,target=/root/.cache/pip pip install --no-cache-dir -t /dep /tmp/cog.whl:
9.250 ERROR: cog.whl is not a valid wheel filename.
10.33
10.33 [notice] A new release of pip is available: 24.0 -> 24.2
10.33 [notice] To update, run: pip install --upgrade pip

I suspect this could be fixed by updating pip to 24.2, which I'm already doing via `python_packages`, but I don't think we even get that far due to the wheel filename issue :-/

Logs

```zsh
$ cog build --use-cog-base-image=false -t user-to-buzz:$(git rev-parse HEAD)
⚠ Cog doesn't know if CUDA 12.3 is compatible with Tensorflow 2.16.2. This might cause CUDA problems.
Building Docker image from environment in cog.yaml as user-to-buzz:30af21cb81876f3681c190ecbc844d5c8e7c2750...
[+] Building 12.8s (14/24) docker:desktop-linux
 => [internal] load build definition from Dockerfile 0.0s
 => => transferring dockerfile: 2.71kB 0.0s
 => resolve image config for docker-image://docker.io/docker/dockerfile:1.4 0.5s
 => [auth] docker/dockerfile:pull token for registry-1.docker.io 0.0s
 => CACHED docker-image://docker.io/docker/dockerfile:1.4@sha256:9ba7531bd80fb0a858632727cf7a112fbfd19b17e94c4e84ced81e24ef1a0dbc 0.0s
 => [internal] load .dockerignore 0.0s
 => => transferring context: 58B 0.0s
 => [internal] load metadata for docker.io/nvidia/cuda:12.3.2-cudnn9-devel-ubuntu22.04 0.5s
 => [internal] load metadata for docker.io/library/python:3.11 0.2s
 => [auth] nvidia/cuda:pull token for registry-1.docker.io 0.0s
 => [auth] library/python:pull token for registry-1.docker.io 0.0s
 => [deps 1/5] FROM docker.io/library/python:3.11@sha256:157a371e60389919fe4a72dff71ce86eaa5234f59114c23b0b346d0d02c74d39 0.0s
 => [internal] load build context 1.0s
 => => transferring context: 2.70MB 0.8s
 => CANCELED [stage-1 1/9] FROM docker.io/nvidia/cuda:12.3.2-cudnn9-devel-ubuntu22.04@sha256:fb1ad20f2552f5b3aafb2c9c478ed57da95e2bb027d15218d7a55b3a0e4b4413 11.7s
 => => resolve docker.io/nvidia/cuda:12.3.2-cudnn9-devel-ubuntu22.04@sha256:fb1ad20f2552f5b3aafb2c9c478ed57da95e2bb027d15218d7a55b3a0e4b4413 0.0s
 => => sha256:5d846bce3f9896ccd22114c9d44658c38798b5bd2660bc3048199d3840a2444d 19.68kB / 19.68kB 0.0s
 => => sha256:fb1ad20f2552f5b3aafb2c9c478ed57da95e2bb027d15218d7a55b3a0e4b4413 743B / 743B 0.0s
 => => sha256:4f00d5116a3679bab6bc13318c8555d7207206de2318e77348a9a93f66e73e21 2.84kB / 2.84kB 0.0s
 => => sha256:01007420e9b005dc14a8c8b0f996a2ad8e0d4af6c3d01e62f123be14fe48eec7 29.54MB / 29.54MB 1.1s
 => => sha256:bfc08b17629d5dde3f9b4b837997c26fee28c86d20cf6c65834066dff820c8fa 4.62MB / 4.62MB 0.8s
 => => sha256:86fc789646b553a337ffae04223a669744a8112e6e77d01ec87f9595e83e4b4f 57.07MB / 57.07MB 5.8s
 => => sha256:6b62141c2a212c553952737153b7ca35189c2fa4e1ba75e88f5e31b50de2c2d7 185B / 185B 0.9s
 => => sha256:e0e30e504698762f2cab0281477b911293c1b67c7b3b7a45d917b7fc68702c33 6.89kB / 6.89kB 1.0s
 => => sha256:346eb11560eafe7714b88308af8b0e03b3642b96787d05600dcbe5059b1c34e7 59.77MB / 1.29GB 11.7s
 => => extracting sha256:01007420e9b005dc14a8c8b0f996a2ad8e0d4af6c3d01e62f123be14fe48eec7 0.8s
 => => sha256:a011ef94b5587a8899dbd1b4e17f045365a367cf31892800eee099e48c60ddf9 63.93kB / 63.93kB 1.3s
 => => sha256:7543c096139519189025b8e57e2b2a1f2b1edc7c60b4aa56346063bc15e0cc1f 1.68kB / 1.68kB 1.4s
 => => sha256:43c77217e0094adc5276f4b8d9f01d9368adb606df79ab43e454477daf9a6b7a 1.52kB / 1.52kB 1.4s
 => => sha256:8ebe7e080c37469e9f54a4a0506785a20a769e656bd6f745c02d2e034ae5a2f8 138.41MB / 2.57GB 11.7s
 => => extracting sha256:bfc08b17629d5dde3f9b4b837997c26fee28c86d20cf6c65834066dff820c8fa 0.1s
 => => sha256:11f6815212a58a0087458334d15e35038ede10c7808b67285af9460e170f6648 88.61kB / 88.61kB 5.9s
 => => extracting sha256:86fc789646b553a337ffae04223a669744a8112e6e77d01ec87f9595e83e4b4f 0.6s
 => => sha256:cc5e7ed01d80eaef50bafb4073ac585f37ac8419515e1674746f21d5e10eb82b 55.57MB / 675.30MB 11.7s
 => => extracting sha256:6b62141c2a212c553952737153b7ca35189c2fa4e1ba75e88f5e31b50de2c2d7 0.0s
 => => extracting sha256:e0e30e504698762f2cab0281477b911293c1b67c7b3b7a45d917b7fc68702c33 0.0s
 => CACHED [deps 2/5] COPY .cog/tmp/build20240920115210.3368532414966447/cog.whl /tmp/cog.whl 0.0s
 => ERROR [deps 3/5] RUN --mount=type=cache,target=/root/.cache/pip pip install --no-cache-dir -t /dep /tmp/cog.whl 10.7s
------
 > [deps 3/5] RUN --mount=type=cache,target=/root/.cache/pip pip install --no-cache-dir -t /dep /tmp/cog.whl:
9.148 ERROR: cog.whl is not a valid wheel filename.
10.32
10.32 [notice] A new release of pip is available: 24.0 -> 24.2
10.32 [notice] To update, run: pip install --upgrade pip
------
Dockerfile:5
--------------------
   3 |     COPY .cog/tmp/build20240920115210.3368532414966447/cog.whl /tmp/cog.whl
   4 |     ENV CFLAGS="-O3 -funroll-loops -fno-strict-aliasing -flto -S"
   5 | >>> RUN --mount=type=cache,target=/root/.cache/pip pip install --no-cache-dir -t /dep /tmp/cog.whl
   6 |     ENV CFLAGS=
   7 |     COPY .cog/tmp/build20240920115210.3368532414966447/requirements.txt /tmp/requirements.txt
--------------------
ERROR: failed to solve: process "/bin/sh -c pip install --no-cache-dir -t /dep /tmp/cog.whl" did not complete successfully: exit code: 1
ⅹ Failed to build Docker image: exit status 1
```

jesusmartinoza commented 1 day ago

Having the same issue. Did you manage to solve it?