zenml-io / zenml

ZenML 🙏: The bridge between ML and Ops. https://zenml.io.
Apache License 2.0

[BUG]: Local Docker installation needed even if using Cloud Build #1359

Closed TimovNiedek closed 1 year ago

TimovNiedek commented 1 year ago

System Information

ZENML_LOCAL_VERSION: 0.34.0 ZENML_SERVER_VERSION: 0.34.0 ZENML_SERVER_DATABASE: mysql ZENML_SERVER_DEPLOYMENT_TYPE: other ZENML_CONFIG_DIR: /Users/timovanniedek/Library/Application Support/zenml ZENML_LOCAL_STORE_DIR: /Users/timovanniedek/Library/Application Support/zenml/local_stores ZENML_SERVER_URL: https://mle-zenml-server-dev-tbexpho5wa-ez.a.run.app ZENML_ACTIVE_REPOSITORY_ROOT: ... PYTHON_VERSION: 3.10.10 ENVIRONMENT: native SYSTEM_INFO: {'os': 'mac', 'mac_version': '12.6'} ACTIVE_WORKSPACE: default ACTIVE_STACK: mle_zenml_stack_dev_no_dv ACTIVE_USER: timovanniedek TELEMETRY_STATUS: enabled ANALYTICS_CLIENT_ID: 5e3c0340-c40d-4894-bd49-60b2913ce179 ANALYTICS_USER_ID: ce3d586c-46cb-4d46-848f-c2e059fb9fe5 ANALYTICS_SERVER_ID: 72a706c3-fabc-4f4d-8bf2-aa77cfd36913 INTEGRATIONS: ['gcp', 'github', 'huggingface', 'kaniko', 'kubeflow', 'kubernetes', 'mlflow', 'pillow', 'plotly', 'pytorch', 'scipy', 'sklearn'] PACKAGES: {'brotli': '1.0.9', 'deprecated': '1.2.13', 'flask': '2.2.2', 'gitpython': '3.1.31', 'jinja2': '3.1.2', 'mako': '1.2.4', 'markdown': '3.3.7', 'markupsafe': '2.1.2', 'pillow': '9.4.0', 'pyjwt': '2.6.0', 'pymysql': '1.0.2', 'pynacl': '1.5.0', 'pyyaml': '5.4.1', 'pygments': '2.14.0', 'pyment': '0.3.3', 'sqlalchemy': '1.4.41', 'sqlalchemy-utils': '0.38.3', 'send2trash': '1.8.0', 'shapely': '1.8.5.post1', 'werkzeug': '2.2.3', 'absl-py': '1.4.0', 'aiofiles': '23.1.0', 'aiohttp': '3.8.4', 'aiokafka': '0.8.0', 'aiosignal': '1.3.1', 'alembic': '1.8.1', 'analytics-python': '1.4.0', 'anyio': '3.6.2', 'appnope': '0.1.3', 'argon2-cffi': '21.3.0', 'argon2-cffi-bindings': '21.2.0', 'arrow': '1.2.3', 'asttokens': '2.2.1', 'async-timeout': '4.0.2', 'attrs': '22.2.0', 'backcall': '0.2.0', 'backoff': '1.10.0', 'bcrypt': '4.0.1', 'beautifulsoup4': '4.11.2', 'black': '22.12.0', 'bleach': '6.0.0', 'bracex': '2.3.post1', 'cachetools': '5.3.0', 'certifi': '2022.12.7', 'cffi': '1.15.1', 'cfgv': '3.3.1', 'chardet': '4.0.0', 'charset-normalizer': '3.0.1', 'click': '8.1.3', 
'click-params': '0.3.0', 'cloudpickle': '2.2.1', 'colorama': '0.4.6', 'comm': '0.1.2', 'commonmark': '0.9.1', 'contourpy': '1.0.7', 'coverage': '5.5', 'cycler': '0.11.0', 'darglint': '1.8.1', 'databricks-cli': '0.17.4', 'datasets': '2.10.0', 'debugpy': '1.6.6', 'decorator': '5.1.1', 'defusedxml': '0.7.1', 'dill': '0.3.6', 'distlib': '0.3.6', 'distro': '1.8.0', 'docker': '6.0.1', 'docker-compose': '1.29.2', 'dockerpty': '0.4.1', 'docopt': '0.6.2', 'docstring-parser': '0.15', 'entrypoints': '0.4', 'evaluate': '0.4.0', 'exceptiongroup': '1.1.0', 'executing': '1.2.0', 'fastapi': '0.89.1', 'fastjsonschema': '2.16.2', 'filelock': '3.9.0', 'fire': '0.5.0', 'fonttools': '4.38.0', 'fqdn': '1.5.1', 'frontiersml': '1.2.1', 'frozenlist': '1.3.3', 'fsspec': '2023.1.0', 'gcsfs': '2023.1.0', 'gevent': '22.10.2', 'geventhttpclient': '2.0.2', 'ghp-import': '2.1.0', 'gitdb': '4.0.10', 'google-api-core': '2.11.0', 'google-api-python-client': '1.12.11', 'google-auth': '2.16.1', 'google-auth-httplib2': '0.1.0', 'google-auth-oauthlib': '1.0.0', 'google-cloud-aiplatform': '1.22.0', 'google-cloud-bigquery': '3.6.0', 'google-cloud-build': '3.13.0', 'google-cloud-core': '2.3.2', 'google-cloud-functions': '1.11.0', 'google-cloud-resource-manager': '1.8.1', 'google-cloud-scheduler': '2.9.1', 'google-cloud-secret-manager': '2.15.1', 'google-cloud-storage': '2.7.0', 'google-crc32c': '1.5.0', 'google-resumable-media': '2.4.1', 'googleapis-common-protos': '1.58.0', 'greenlet': '2.0.2', 'grpc-google-iam-v1': '0.12.6', 'grpcio': '1.51.3', 'grpcio-status': '1.48.2', 'gunicorn': '20.1.0', 'h11': '0.14.0', 'httplib2': '0.19.1', 'huggingface-hub': '0.12.1', 'hypothesis': '6.68.2', 'identify': '2.5.18', 'idna': '3.4', 'importlib-metadata': '5.2.0', 'iniconfig': '2.0.0', 'ipykernel': '6.21.2', 'ipython': '8.10.0', 'ipython-genutils': '0.2.0', 'ipywidgets': '7.7.3', 'isoduration': '20.11.0', 'itsdangerous': '2.1.2', 'jedi': '0.18.2', 'joblib': '1.2.0', 'jsonpointer': '2.3', 'jsonschema': '3.2.0', 
'jupyter-client': '8.0.3', 'jupyter-core': '5.2.0', 'jupyter-events': '0.6.3', 'jupyter-server': '2.3.0', 'jupyter-server-terminals': '0.4.4', 'jupyterlab-pygments': '0.2.2', 'jupyterlab-widgets': '1.1.2', 'kafka-python': '2.0.2', 'kfp': '1.8.16', 'kfp-pipeline-spec': '0.1.16', 'kfp-server-api': '1.8.5', 'kiwisolver': '1.4.4', 'kubernetes': '18.20.0', 'llvmlite': '0.39.1', 'matplotlib': '3.7.0', 'matplotlib-inline': '0.1.6', 'mergedeep': '1.3.4', 'mike': '1.1.2', 'mistune': '2.0.5', 'mkdocs': '1.4.2', 'mkdocs-autorefs': '0.4.1', 'mkdocs-awesome-pages-plugin': '2.8.0', 'mkdocs-material': '8.5.11', 'mkdocs-material-extensions': '1.1.1', 'mkdocstrings': '0.17.0', 'mlflow': '2.1.1', 'mlflow-skinny': '1.30.0', 'mlserver': '1.2.3', 'mlserver-mlflow': '1.2.3', 'monotonic': '1.6', 'multidict': '6.0.4', 'multiprocess': '0.70.14', 'mypy': '0.971', 'mypy-extensions': '1.0.0', 'natsort': '8.2.0', 'nbclassic': '0.5.2', 'nbclient': '0.7.2', 'nbconvert': '7.2.9', 'nbformat': '5.7.3', 'nest-asyncio': '1.5.6', 'nodeenv': '1.7.0', 'notebook': '6.5.2', 'notebook-shim': '0.2.2', 'numba': '0.56.4', 'numpy': '1.24.2', 'nvidia-ml-py3': '7.352.0', 'oauthlib': '3.2.2', 'orjson': '3.8.6', 'packaging': '21.3', 'pandas': '1.5.3', 'pandocfilters': '1.5.0', 'paramiko': '3.0.0', 'parso': '0.8.3', 'passlib': '1.7.4', 'pathspec': '0.11.0', 'pexpect': '4.8.0', 'pickleshare': '0.7.5', 'pip': '23.0.1', 'platformdirs': '3.0.0', 'plotly': '5.13.0', 'pluggy': '1.0.0', 'pprintpp': '0.4.0', 'pre-commit': '2.21.0', 'prometheus-client': '0.16.0', 'prompt-toolkit': '3.0.37', 'proto-plus': '1.22.2', 'protobuf': '3.20.3', 'psutil': '5.9.4', 'ptyprocess': '0.7.0', 'pure-eval': '0.2.2', 'py': '1.11.0', 'py-grpc-prometheus': '0.7.0', 'pyarrow': '11.0.0', 'pyasn1': '0.4.8', 'pyasn1-modules': '0.2.8', 'pycparser': '2.21', 'pydantic': '1.10.5', 'pymdown-extensions': '9.9.2', 'pyparsing': '2.4.7', 'pyrsistent': '0.19.3', 'pytest': '7.2.1', 'pytest-clarity': '1.0.1', 'pytest-mock': '3.10.0', 'pytest-randomly': 
'3.12.0', 'pytest-unordered': '0.5.2', 'python-dateutil': '2.8.2', 'python-dotenv': '0.21.1', 'python-json-logger': '2.0.7', 'python-rapidjson': '1.9', 'python-terraform': '0.10.1', 'pytkdocs': '0.16.1', 'pytz': '2022.7.1', 'pyyaml-env-tag': '0.1', 'pyzmq': '25.0.0', 'querystring-parser': '1.2.4', 'regex': '2022.10.31', 'requests': '2.28.2', 'requests-oauthlib': '1.3.1', 'requests-toolbelt': '0.10.1', 'responses': '0.18.0', 'rfc3339-validator': '0.1.4', 'rfc3986-validator': '0.1.1', 'rich': '12.6.0', 'rsa': '4.9', 'ruff': '0.0.217', 'scikit-learn': '1.2.1', 'scipy': '1.10.1', 'setuptools': '67.4.0', 'shap': '0.41.0', 'six': '1.16.0', 'slicer': '0.0.7', 'smmap': '5.0.0', 'sniffio': '1.3.0', 'sortedcontainers': '2.4.0', 'soupsieve': '2.4', 'sqlalchemy2-stubs': '0.0.2a32', 'sqlmodel': '0.0.8', 'sqlparse': '0.4.3', 'stack-data': '0.6.2', 'starlette': '0.22.0', 'starlette-exporter': '0.15.1', 'strip-hints': '0.1.10', 'tabulate': '0.9.0', 'tenacity': '8.2.1', 'termcolor': '2.2.0', 'terminado': '0.17.1', 'texttable': '1.6.7', 'threadpoolctl': '3.1.0', 'tinycss2': '1.2.1', 'tokenizers': '0.13.2', 'toml': '0.10.2', 'tomli': '2.0.1', 'torch': '1.13.1', 'tornado': '6.2', 'tox': '3.28.0', 'tqdm': '4.64.1', 'traitlets': '5.9.0', 'transformers': '4.26.1', 'tritonclient': '2.30.0', 'typer': '0.4.2', 'types-markdown': '3.4.2.5', 'types-pillow': '9.4.0.16', 'types-pymysql': '1.0.19.5', 'types-pyyaml': '6.0.12.8', 'types-certifi': '2021.10.8.3', 'types-croniter': '1.3.2.6', 'types-futures': '3.3.8', 'types-protobuf': '3.20.4.6', 'types-psutil': '5.9.5.8', 'types-pyopenssl': '23.0.0.4', 'types-python-dateutil': '2.8.19.8', 'types-python-slugify': '5.0.4', 'types-redis': '4.5.1.3', 'types-requests': '2.28.11.14', 'types-setuptools': '57.4.18', 'types-six': '1.16.21.6', 'types-termcolor': '1.1.6.1', 'types-urllib3': '1.26.25.7', 'typing-extensions': '4.5.0', 'uri-template': '1.2.0', 'uritemplate': '3.0.1', 'urllib3': '1.26.14', 'uvicorn': '0.20.0', 'uvloop': '0.17.0', 'validators': 
'0.18.2', 'verspec': '0.1.0', 'virtualenv': '20.19.0', 'watchdog': '2.2.1', 'wcmatch': '8.4.1', 'wcwidth': '0.2.6', 'webcolors': '1.12', 'webencodings': '0.5.1', 'websocket-client': '1.5.1', 'wheel': '0.38.4', 'widgetsnbextension': '3.6.2', 'wrapt': '1.15.0', 'xxhash': '3.2.0', 'yarl': '1.8.2', 'zenml': '0.34.0', 'zipp': '3.15.0', 'zope.event': '4.6', 'zope.interface': '5.5.2'}

What happened?

According to the documentation on the Google Cloud Image Builder, it should be usable if you are unable to install Docker on your machine. However, we've found that when we try to run a pipeline, the intermediate Docker image gets built by Cloud Build correctly, but after that we get an error:

DockerException: Error while fetching server API version: ('Connection aborted.', ConnectionRefusedError(61, 'Connection refused'))

We're using Vertex AI as the orchestrator and GCP Container Registry, so there is no need to run the containers locally.

A colleague of mine reported the following error in the same situation:

DockerException: Error while fetching server API version: (2, 'CreateFile', 'The system cannot find the file specified.')

Reproduction steps

  1. Set up a stack using Google Cloud Image Builder as the image builder and Vertex AI as the orchestrator
  2. Create a basic pipeline
  3. Quit Docker Desktop locally
  4. Run the pipeline

Relevant log output

... 
│ /lib/python3.10/site-packages/zenml/utils/pipeline_docker_image_builder.py:241 in                │
│ build_docker_image                                                                               │
│                                                                                                  │
│   238 │   │   │   │   # If the image is local, we don't need to pull it. Otherwise               │
│   239 │   │   │   │   # we play it safe and always pull in case the user pushed a new            │
│   240 │   │   │   │   # image for the given name and tag                                         │
│ ❱ 241 │   │   │   │   pull_parent_image = not docker_utils.is_local_image(                       │
│   242 │   │   │   │   │   parent_image                                                           │
│   243 │   │   │   │   )                                                                          │
│   244                                                                                            ...
│ /lib/python3.10/site-packages/zenml/utils/docker_utils.py:319 in is_local_image                  │
│                                                                                                  │
│   316 │   Returns:                                                                               │
│   317 │   │   `True` if the image was pulled from a registry, `False` otherwise.                 │
│   318 │   """                                                                                    │
│ ❱ 319 │   docker_client = DockerClient.from_env()                                                │
│   320 │   images = docker_client.images.list(name=image_name)                                    │
│   321 │   if images:                                                                             │
│   322 │   │   # An image with this name is available locally -> now check whether it             │
│                                                                                                  │
...
│ /lib/python3.10/site-packages/docker/client.py:96 in from_env                                    │
│                                                                                                  │
│    93 │   │   max_pool_size = kwargs.pop('max_pool_size', DEFAULT_MAX_POOL_SIZE)                 │
│    94 │   │   version = kwargs.pop('version', None)                                              │
│    95 │   │   use_ssh_client = kwargs.pop('use_ssh_client', False)                               │
│ ❱  96 │   │   return cls(                                                                        │
│    97 │   │   │   timeout=timeout,                                                               │
│    98 │   │   │   max_pool_size=max_pool_size,                                                   │
│    99 │   │   │   version=version,                                                               │
│                                                                                                  │
...
│ /lib/python3.10/site-packages/docker/client.py:45 in __init__                                    │
│                                                                                                  │
│    42 │   │   │   to save in the pool.                                                           │
│    43 │   """                                                                                    │
│    44 │   def __init__(self, *args, **kwargs):                                                   │
│ ❱  45 │   │   self.api = APIClient(*args, **kwargs)                                              │
│    46 │                                                                                          │
│    47 │   @classmethod                                                                           │
│    48 │   def from_env(cls, **kwargs):                                                           │
│                                                                                                  │
...
│ /lib/python3.10/site-packages/docker/api/client.py:197 in __init__                               │
│                                                                                                  │
│   194 │   │   │   │   │   │   │   │   version,                                                   │
│   195 │   │   │   │   │   │   │   │   str                                                        │
│   196 │   │   │   │   │   │   │   │   ) and version.lower() == 'auto'):                          │
│ ❱ 197 │   │   │   self._version = self._retrieve_server_version()                                │
│   198 │   │   else:                                                                              │
│   199 │   │   │   self._version = version                                                        │
│   200 │   │   if not isinstance(self._version, str):                                             │
...
│ /lib/python3.10/site-packages/docker/api/client.py:221 in _retrieve_server_version               │
│                                                                                                  │
│   218 │   │   │   │   ' is missing.'                                                             │
│   219 │   │   │   )                                                                              │
│   220 │   │   except Exception as e:                                                             │
│ ❱ 221 │   │   │   raise DockerException(                                                         │
│   222 │   │   │   │   f'Error while fetching server API version: {e}'                            │
│   223 │   │   │   )                                                                              │
│   224                                                                                            │
│                                                                                                  │

schustmi commented 1 year ago

Hey @TimovNiedek, yep that seems like an oversight and that line should not be called when using a remote image builder. We'll probably fix it in the next release.
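For illustration, the kind of guard described above could look roughly like the sketch below. This is not the actual change that went into ZenML: `should_pull_parent_image`, `builds_locally`, `DaemonUnavailableError`, and the image name are all hypothetical names made up for this example, and it assumes the remote build service fetches the parent image on its own. The idea is only that the local Docker daemon check is skipped whenever the build does not happen locally.

```python
from typing import Callable


class DaemonUnavailableError(RuntimeError):
    """Hypothetical stand-in for docker.errors.DockerException."""


def should_pull_parent_image(
    parent_image: str,
    builds_locally: bool,
    is_local_image: Callable[[str], bool],
) -> bool:
    """Decide whether the parent image should be pulled before building.

    The local Docker daemon is only consulted (through `is_local_image`)
    when the build itself runs locally.
    """
    if not builds_locally:
        # Remote builder: never touch the local daemon.
        # Assumption: the remote build service pulls the parent image itself.
        return True
    # Local builder: skip the pull only if the image exists purely locally.
    return not is_local_image(parent_image)


def daemon_down(image_name: str) -> bool:
    """Simulates the local-image check when no Docker daemon is running."""
    raise DaemonUnavailableError("Error while fetching server API version")


# Remote build: succeeds even though the (simulated) daemon is down.
remote_result = should_pull_parent_image("python:3.10", False, daemon_down)

# Local build: the daemon check still runs, so the failure still surfaces.
try:
    should_pull_parent_image("python:3.10", True, daemon_down)
    local_failed = False
except DaemonUnavailableError:
    local_failed = True
```

Passing the check in as a callable keeps the sketch runnable without Docker installed; in a real code base the callable would be whatever function queries the daemon.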

schustmi commented 1 year ago

Fixed in 0.36.0