replicate / cog

Containers for machine learning
https://cog.run
Apache License 2.0

Error when trying to follow instructions in docs/getting-started.md #1325

Open cortfr opened 1 year ago

cortfr commented 1 year ago

I successfully installed cog using homebrew, and I was able to create and run the simple Python interactive shell step.

However, when attempting to follow the "Run predictions on a model" steps I am running into issues.

Specifically, when I run the "cog predict -i image=@input.jpg" command, it builds and then fails with the error "Invalid file descriptor data passed to EncodedDescriptorDatabase::Add()".

Here is the entire output:

rt@macbook-pro-3 cog-quickstart % cog predict -i image=@input.jpg
Building Docker image from environment in cog.yaml...
[+] Building 7.3s (19/19) FINISHED
 => [internal] load build definition from Dockerfile 0.0s
 => => transferring dockerfile: 1.06kB 0.0s
 => [internal] load .dockerignore 0.0s
 => => transferring context: 2B 0.0s
 => resolve image config for docker.io/docker/dockerfile:1.4 1.3s
 => [auth] docker/dockerfile:pull token for registry-1.docker.io 0.0s
 => CACHED docker-image://docker.io/docker/dockerfile:1.4@sha256:9ba7531bd80fb0a858632727cf7a112fbfd19b17e94c4e84ced81e24ef1a0dbc 0.0s
 => [internal] load .dockerignore 0.0s
 => [internal] load build definition from Dockerfile 0.0s
 => [internal] load metadata for docker.io/library/python:3.11-slim 5.8s
 => [auth] library/python:pull token for registry-1.docker.io 0.0s
 => [internal] load build context 0.0s
 => => transferring context: 83.76kB 0.0s
 => [stage-0 1/7] FROM docker.io/library/python:3.11-slim@sha256:edaf703dce209d774af3ff768fc92b1e3b60261e7602126276f9ceb0e3a96874 0.0s
 => CACHED [stage-0 2/7] RUN --mount=type=cache,target=/var/cache/apt set -eux; apt-get update -qq; apt-get install -qqy --no-install-recommends curl; rm -rf /var/lib/apt/lists/*; TINI 0.0s
 => CACHED [stage-0 3/7] COPY .cog/tmp/build2463373346/cog-0.0.1.dev-py3-none-any.whl /tmp/cog-0.0.1.dev-py3-none-any.whl 0.0s
 => CACHED [stage-0 4/7] RUN --mount=type=cache,target=/root/.cache/pip pip install /tmp/cog-0.0.1.dev-py3-none-any.whl 0.0s
 => CACHED [stage-0 5/7] COPY .cog/tmp/build2463373346/requirements.txt /tmp/requirements.txt 0.0s
 => CACHED [stage-0 6/7] RUN --mount=type=cache,target=/root/.cache/pip pip install -r /tmp/requirements.txt 0.0s
 => CACHED [stage-0 7/7] WORKDIR /src 0.0s
 => exporting to image 0.0s
 => => exporting layers 0.0s
 => => writing image sha256:d31600e6b5b433414d6547b8b8c6acaaba1fadac998330baa19b8d4a147fb25c 0.0s
 => => naming to docker.io/library/cog-quickstart-base 0.0s
 => exporting cache 0.0s
 => => preparing build cache for export 0.0s

Starting Docker image cog-quickstart-base and running setup()...
[libprotobuf ERROR external/com_google_protobuf/src/google/protobuf/descriptor_database.cc:560] Invalid file descriptor data passed to EncodedDescriptorDatabase::Add().
[libprotobuf FATAL external/com_google_protobuf/src/google/protobuf/descriptor.cc:1986] CHECK failed: GeneratedDatabase()->Add(encoded_file_descriptor, size):
terminate called after throwing an instance of 'google::protobuf::FatalException'
  what(): CHECK failed: GeneratedDatabase()->Add(encoded_file_descriptor, size):
qemu: uncaught target signal 6 (Aborted) - core dumped
ⅹ Failed to get container status: exit status 1

cortfr commented 1 year ago

Oh, my cog version is cog version 0.8.6 (built 2023-08-07T23:02:15+0000)

catsby commented 1 year ago

Hi there 👋

I believe this is an issue with the version of TensorFlow (v2.12.0 in the Getting Started doc). The errors you mention occur inside the container, at the point where we're running the Python code.

In your cog.yaml, if you bump TensorFlow up to at least 2.13.0 you should be able to get past this. Unfortunately I'm not entirely sure what in the 2.13.0 release addresses this; I didn't spot anything obvious in the release notes, but then again I'm not too familiar with the project.

Can you try bumping up the version in your cog.yaml file and trying the prediction again?
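
For reference, the change would look roughly like the sketch below. It assumes your cog.yaml follows the layout from the Getting Started doc, with TensorFlow pinned under `python_packages`; the Python version matches the `python:3.11-slim` base image in the build log, while the `pillow` pin and the `predict` entry are assumptions and may differ in your file. Only the `tensorflow` line needs to change.

```yaml
# cog.yaml (sketch; entries other than tensorflow are assumed, keep your own)
build:
  python_version: "3.11"
  python_packages:
    - pillow==9.5.0        # assumed from the quickstart; leave as in your file
    - tensorflow==2.13.0   # was 2.12.0; use 2.13.0 or later
predict: "predict.py:Predictor"
```

After saving the file, re-running the same "cog predict -i image=@input.jpg" command should rebuild the image with the new TensorFlow version before starting the prediction.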

zeke commented 9 months ago

Hi @catsby! Thanks for weighing in. @cortfr did bumping the TensorFlow version unblock you here?