replicate / cog

Containers for machine learning
https://cog.run
Apache License 2.0
7.88k stars 549 forks

cog-quickstart example fails on M1 Mac #958

Closed anilmurty closed 1 year ago

anilmurty commented 1 year ago

I just tried running through https://github.com/replicate/cog/blob/main/docs/getting-started.md on an M1 Pro Mac (macOS Monterey 12.2.1) and got this error:

Starting Docker image cog-cog-quickstart-base and running setup()...
WARNING: The requested image's platform (linux/amd64) does not match the detected host platform (linux/arm64/v8) and no specific platform was requested
The TensorFlow library was compiled to use AVX instructions, but these aren't available on your machine.
qemu: uncaught target signal 6 (Aborted) - core dumped
ⅹ Failed to get container status: exit status 1
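The WARNING line in this log already names both platforms involved. As a minimal sketch (the helper names `parse_platform_warning` and `is_emulated` are hypothetical and not part of Cog or Docker), the two platform strings can be pulled out of the message and their architectures compared:

```python
import re

# Pattern for Docker's platform-mismatch warning, as quoted in the log above.
_WARNING = re.compile(
    r"requested image's platform \((?P<image>[^)]+)\) does not match "
    r"the detected host platform \((?P<host>[^)]+)\)"
)


def parse_platform_warning(line):
    """Return (image_platform, host_platform), or None if the line is not the warning."""
    match = _WARNING.search(line)
    if match is None:
        return None
    return match.group("image"), match.group("host")


def is_emulated(line):
    """Return True when the image architecture differs from the host architecture."""
    parsed = parse_platform_warning(line)
    if parsed is None:
        return False
    image, host = parsed
    # Compare only the architecture component, e.g. "amd64" vs "arm64".
    return image.split("/")[1] != host.split("/")[1]


warning = (
    "WARNING: The requested image's platform (linux/amd64) does not match "
    "the detected host platform (linux/arm64/v8) and no specific platform was requested"
)
print(parse_platform_warning(warning))
print(is_emulated(warning))
```

When the architectures differ, the container runs under emulation, which is consistent with the qemu line that follows the warning.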

I wonder if it is because cog is missing the correct buildx option? https://github.com/replicate/cog/blob/75b7802219e7cd4cee845e34c4c22139558615d4/pkg/docker/build.go#L72

Note that buildx does not seem to support linux/arm64/v8:

% docker buildx ls
NAME/NODE       DRIVER/ENDPOINT STATUS  PLATFORMS
desktop-linux   docker                  
  desktop-linux desktop-linux   running linux/arm64, linux/amd64, linux/riscv64, linux/ppc64le, linux/s390x, linux/386, linux/arm/v7, linux/arm/v6
default *       docker                  
  default       default         running linux/arm64, linux/amd64, linux/riscv64, linux/ppc64le, linux/s390x, linux/386, linux/arm/v7, linux/arm/v6

stellanhaglund commented 1 year ago

same here

konstantin-frolov commented 1 year ago

Same issue. In all versions before 0.7.0-dev.

  • Error
 => exporting cache                          0.0s
 => => preparing build cache for export      0.0s
Adding labels to image...

Traceback (most recent call last):
  File "/usr/local/lib/python3.8/runpy.py", line 185, in _run_module_as_main
    mod_name, mod_spec, code = _get_module_details(mod_name, _Error)
  File "/usr/local/lib/python3.8/runpy.py", line 111, in _get_module_details
    __import__(pkg_name)
  File "/usr/local/lib/python3.8/site-packages/cog/__init__.py", line 1, in <module>
    from pydantic import BaseModel
  File "pydantic/__init__.py", line 2, in init pydantic.__init__
  File "pydantic/dataclasses.py", line 41, in init pydantic.dataclasses
    # +---------+-----------------------------------------+
ImportError: cannot import name dataclass_transform

ⅹ Failed to get type signature: exit status 1
  • Host spec
Distributor ID: Ubuntu
Description:    Ubuntu 20.04.4 LTS
Release:        20.04
Codename:       focal

go version go1.20.5 linux/amd64

cog version v0.7.0+dev (built 2023-06-08T16:34:43+0300)
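The traceback ends with pydantic failing to import `dataclass_transform`. Assuming that name is expected to come from the `typing_extensions` module (an assumption, though it is consistent with the fix reported in this comment), a small check like the following can be run inside the container to see whether the installed version provides it. The helper name `provides` is hypothetical.

```python
import importlib


def provides(module_name, attribute):
    """Return True if the module imports cleanly and exposes the attribute."""
    try:
        module = importlib.import_module(module_name)
    except ImportError:
        # The module is not installed at all.
        return False
    return hasattr(module, attribute)


# Assumption: pydantic looks for dataclass_transform in typing_extensions.
print(provides("typing_extensions", "dataclass_transform"))
```

If this prints `False`, the installed `typing_extensions` is either missing or too old to expose the name.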

Solved by upgrading the Python typing-extensions package. Add this under build in cog.yaml:

  run:
    - pip install typing-extensions --upgrade

mattt commented 1 year ago

Hi @stellanhaglund. Thanks for reporting this. We're tracking this issue in https://github.com/replicate/cog/issues/336.

@konstantin-frolov The error you're reporting appears to be unrelated to the original issue. Please see https://github.com/replicate/cog/issues/1007 for details.

levelingup commented 1 year ago

I think changing the TensorFlow version to 2.13.0 fixed the error.

zeke-john commented 7 months ago

I tried using tensorflow==2.13.0 and still get the same error.

Info: Darwin Zekes-MacBook-Air.local 23.1.0 Darwin Kernel Version 23.1.0: Mon Oct 9 21:28:12 PDT 2023; root:xnu-10002.41.9~6/RELEASE_ARM64_T8103 arm64

Any update?

My cog.yaml file:

# Configuration for Cog ⚙️
# Reference: https://github.com/replicate/cog/blob/main/docs/yaml.md

build:
    # set to true if your model requires a GPU
    gpu: true
    cuda: '11.2.2'
    system_packages:
        - 'ffmpeg'
    python_version: '3.10'
    # a list of packages in the format <package-name>==<version>
    python_packages:
        - 'tensorflow==2.13.0'
        - 'numpy'
        - 'flashy'
        - 'git+https://github.com/facebookresearch/audiocraft.git'
        - 'wandb'
        - 'pydu'
        - 'boto3'
        - 'runpod'
        - 'awscli'
        - 'spleeter'
        - 'setuptools'

    # commands run after the environment is setup
    run:
        - 'export PYTHONIOENCODING=utf-8'

# predict.py defines how predictions are run on your model
predict: 'predict.py:MusicGeneratorPredictor'

smandalika commented 5 months ago

This is the exact same problem I'm having. I'm running Cog on an M3 Mac with macOS 14.2.1 (23C71).

qzchenwl commented 2 weeks ago

The failure is due to Docker on Apple Silicon (ARM) running an x86 image, as indicated by the WARNING message. When ARM Docker executes an x86 image, importing ResNet50 fails.

This can be reproduced as follows:

cog-quickstart % docker run -it --rm cog-quickstart-base bash

WARNING: The requested image's platform (linux/amd64) does not match the detected host platform (linux/arm64/v8) and no specific platform was requested

root@72a125bc9dce:/src# python
Python 3.11.10 (main, Sep  9 2024, 18:05:07) [GCC 12.2.0] on linux
Type "help", "copyright", "credits" or "license" for more information.

>>> from tensorflow.keras.applications.resnet50 import ResNet50
Illegal instruction

root@72a125bc9dce:/src# echo $?
132
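The exit status 132 follows the common shell convention of 128 plus the number of the signal that killed the process. A short sketch (the function name `describe_exit_status` is hypothetical) decodes it with Python's standard `signal` module:

```python
import signal


def describe_exit_status(status):
    """Map a shell exit status to a signal name when it follows the 128+N convention."""
    if status <= 128:
        return None
    try:
        return signal.Signals(status - 128).name
    except ValueError:
        # No signal with that number is known on this platform.
        return None


print(describe_exit_status(132))  # SIGILL
```

132 decodes to SIGILL, which matches the "Illegal instruction" message printed by the interpreter. By the same convention, the "signal 6 (Aborted)" reported by qemu in the original report corresponds to SIGABRT.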

The solution is for Cog to provide an ARM version of the base image r8.im/cog-base:python3.11.

cog-quickstart % cog debug
...
FROM r8.im/cog-base:python3.11
...
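To see which base image a generated Dockerfile depends on, the FROM lines can be extracted programmatically. This is a minimal sketch, assuming the `cog debug` output is Dockerfile text as the excerpt above suggests; the function name `base_images` is hypothetical, and the parsing is deliberately simple rather than a full Dockerfile parser.

```python
def base_images(dockerfile_text):
    """Return the image reference of every FROM instruction, in order."""
    images = []
    for raw in dockerfile_text.splitlines():
        parts = raw.strip().split()
        if len(parts) >= 2 and parts[0].upper() == "FROM":
            # Skip flags such as --platform=linux/amd64 that precede the image.
            args = [p for p in parts[1:] if not p.startswith("--")]
            if args:
                images.append(args[0])
    return images


# Sample text mirroring the excerpt above; the elided lines stay elided.
sample = """...
FROM r8.im/cog-base:python3.11
..."""
print(base_images(sample))  # ['r8.im/cog-base:python3.11']
```

Each image found this way can then be checked for an arm64 variant.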