zenml-io / zenml

ZenML 🙏: The bridge between ML and Ops. https://zenml.io.
Apache License 2.0

[BUG]: Return annotations are not correctly parsed when returning from a function call. #1727

Status: Closed (northanapon closed 1 year ago)

northanapon commented 1 year ago

Contact Details [Optional]

No response

System Information

ZENML_LOCAL_VERSION: 0.42.1 ZENML_SERVER_VERSION: 0.42.1 ZENML_SERVER_DATABASE: sqlite ZENML_SERVER_DEPLOYMENT_TYPE: local ZENML_CONFIG_DIR: /root/.config/zenml ZENML_LOCAL_STORE_DIR: /root/.config/zenml/local_stores ZENML_SERVER_URL: http://127.0.0.1:8237 ZENML_ACTIVE_REPOSITORY_ROOT: /caa-debug PYTHON_VERSION: 3.10.11 ENVIRONMENT: docker SYSTEM_INFO: {'os': 'linux', 'linux_distro': 'debian', 'linux_distro_like': '', 'linux_distro_version': '11'} ACTIVE_WORKSPACE: default ACTIVE_STACK: caa_debug_stack ACTIVE_USER: default TELEMETRY_STATUS: enabled ANALYTICS_CLIENT_ID: 3eca8a0e-5a95-4423-ba2a-f0508382206a ANALYTICS_USER_ID: 79d63edb-f2b3-4656-93b2-419d831592ca ANALYTICS_SERVER_ID: 3eca8a0e-5a95-4423-ba2a-f0508382206a INTEGRATIONS: ['kaniko', 'lightgbm', 'mlflow', 'pillow', 'pytorch', 'pytorch_lightning', 'scipy', 'sklearn', 'xgboost'] PACKAGES: {'fsspec': '2023.6.0', 'certifi': '2023.5.7', 'tzdata': '2023.3', 'xarray': '2023.1.0', 'pytz': '2022.7.1', 'setuptools': '65.5.1', 'cryptography': '41.0.1', 'pyzmq': '25.1.0', 'gevent': '23.7.0', 'pip': '23.1.2', 'aiofiles': '23.1.0', 'attrs': '23.1.0', 'packaging': '23.1', 'azure-mgmt-resource': '23.0.1', 'argon2-cffi': '21.3.0', 'argon2-cffi-bindings': '21.2.0', 'isoduration': '20.11.0', 'gunicorn': '20.1.0', 'rich': '12.6.0', 'websockets': '11.0.3', 'pyarrow': '11.0.0', 'pillow': '9.5.0', 'ipython': '8.14.0', 'jupyter-client': '8.3.0', 'tenacity': '8.2.2', 'click': '8.1.3', 'ipywidgets': '7.8.0', 'nbconvert': '7.6.0', 'pytest': '7.4.0', 'overrides': '7.3.1', 'notebook': '7.0.1', 'ipykernel': '6.23.3', 'colorlog': '6.7.0', 'importlib-metadata': '6.7.0', 'tornado': '6.3.2', 'docker': '6.1.3', 'multidict': '6.0.4', 'pyyaml': '6.0.1', 'zope.interface': '6.0', 'bleach': '6.0.0', 'plotly': '5.15.0', 'importlib-resources': '5.12.0', 'psutil': '5.9.3', 'traitlets': '5.9.0', 'nbformat': '5.9.0', 'cachetools': '5.3.1', 'jupyter-core': '5.3.1', 'decorator': '5.1.1', 'altair': '5.0.1', 'zope.event': '5.0', 'smmap': '5.0.0', 'tqdm': 
'4.65.0', 'fonttools': '4.40.0', 'protobuf': '4.23.3', 'jsonargparse': '4.22.0', 'jsonschema': '4.17.3', 'beautifulsoup4': '4.12.2', 'antlr4-python3-runtime': '4.9.3', 'rsa': '4.9', 'pexpect': '4.8.0', 'typing-extensions': '4.7.1', 'tzlocal': '4.3.1', 'pyodbc': '4.0.39', 'gitdb': '4.0.10', 'async-timeout': '4.0.2', 'jupyterlab': '4.0.2', 'bcrypt': '4.0.1', 'zipp': '3.15.0', 'filelock': '3.9.0', 'orjson': '3.8.14', 'aiohttp': '3.8.4', 'platformdirs': '3.8.0', 'asgiref': '3.7.2', 'matplotlib': '3.7.1', 'anyio': '3.7.0', 'widgetsnbextension': '3.6.5', 'markdown': '3.4.3', 'idna': '3.4', 'lightgbm': '3.3.5', 'python-jose': '3.3.0', 'oauthlib': '3.2.2', 'paramiko': '3.2.0', 'optuna': '3.2.0', 'gitpython': '3.1.31', 'jinja2': '3.1.2', 'threadpoolctl': '3.1.0', 'charset-normalizer': '3.1.0', 'prompt-toolkit': '3.0.38', 'mistune': '3.0.1', 'watchdog': '3.0.0', 'markdown-it-py': '3.0.0', 'networkx': '3.0', 'tritonclient': '2.36.0', 'requests': '2.31.0', 'jupyterlab-server': '2.23.0', 'pycparser': '2.21', 'google-auth': '2.21.0', 'fastjsonschema': '2.17.1', 'pygments': '2.15.1', 'tensorboard': '2.13.0', 'babel': '2.12.1', 'numexpr': '2.8.4', 'python-dateutil': '2.8.2', 'jupyter-server': '2.7.0', 'pyjwt': '2.7.0', 'tensorboardx': '2.6.1', 'pyparsing': '2.4.7', 'soupsieve': '2.4.1', 'jsonpointer': '2.4', 'werkzeug': '2.3.6', 'flask': '2.3.2', 'omegaconf': '2.3.0', 'typeshed-client': '2.3.0', 'mlflow': '2.2.2', 'cloudpickle': '2.2.1', 'asttokens': '2.2.1', 'jupyter-lsp': '2.2.0', 'itsdangerous': '2.1.2', 'markupsafe': '2.1.2', 'python-json-logger': '2.0.7', 'pytorch-lightning': '2.0.4', 'kafka-python': '2.0.2', 'geventhttpclient': '2.0.2', 'greenlet': '2.0.2', 'async-lru': '2.0.2', 'torch': '2.0.1+cpu', 'tomli': '2.0.1', 'iniconfig': '2.0.0', 'grpcio': '1.56.0', 'botocore': '1.29.165', 'azure-core': '1.28.0', 'urllib3': '1.26.16', 'boto3': '1.26.3', 'streamlit': '1.24.0', 'numpy': '1.23.5', 'six': '1.16.0', 'cffi': '1.15.1', 'wrapt': '1.15.0', 'webcolors': '1.13', 'scipy': 
'1.11.1', 'sympy': '1.11.1', 'pydantic': '1.10.12', 'python-rapidjson': '1.10', 'backoff': '1.10.0', 'confluent-kafka': '1.9.2', 'yarl': '1.9.2', 'send2trash': '1.8.2', 'alembic': '1.8.1', 'distro': '1.8.0', 'xgboost': '1.7.6', 'passlib': '1.7.4', 'fastavro': '1.7.0', 'debugpy': '1.6.7', 'blinker': '1.6.2', 'websocket-client': '1.6.1', 'monotonic': '1.6', 'nest-asyncio': '1.5.6', 'fqdn': '1.5.1', 'pynacl': '1.5.0', 'pandocfilters': '1.5.0', 'sqlalchemy': '1.4.41', 'kiwisolver': '1.4.4', 'analytics-python': '1.4.post1', 'azure-mgmt-core': '1.4.0', 'absl-py': '1.4.0', 'mlserver': '1.3.5', 'mlserver-mlflow': '1.3.5', 'pandas': '1.3.5', 'frozenlist': '1.3.3', 'hydra-core': '1.3.2', 'joblib': '1.3.1', 'requests-oauthlib': '1.3.1', 'aiosignal': '1.3.1', 'uri-template': '1.3.0', 'sniffio': '1.3.0', 'deprecated': '1.2.13', 'mako': '1.2.4', 'querystring-parser': '1.2.4', 'arrow': '1.2.3', 'tinycss2': '1.2.1', 'scikit-learn': '1.2.1', 'mpmath': '1.2.1', 'pluggy': '1.2.0', 'executing': '1.2.0', 'catboost': '1.2', 'azure-common': '1.1.28', 'jupyterlab-widgets': '1.1.5', 'munkres': '1.1.4', 'exceptiongroup': '1.1.1', 'contourpy': '1.1.0', 'brotli': '1.0.9', 'pymysql': '1.0.3', 'feature-engine': '1.0.2', 'pympler': '1.0.1', 'jmespath': '1.0.1', 'python-dotenv': '1.0.0', 'google-auth-oauthlib': '1.0.0', 'fastapi': '0.89.1', 'numba': '0.56.4', 'zenml': '0.42.1', 'shap': '0.41.0', 'wheel': '0.40.0', 'llvmlite': '0.39.1', 'sqlalchemy-utils': '0.38.3', 'starlette': '0.22.0', 'graphviz': '0.20.1', 'pyrsistent': '0.19.3', 'httplib2': '0.19.1', 'validators': '0.18.2', 'jedi': '0.18.2', 'ecdsa': '0.18.0', 'databricks-cli': '0.17.7', 'uvicorn': '0.17.6', 'terminado': '0.17.1', 'uvloop': '0.17.0', 'prometheus-client': '0.17.0', 'starlette-exporter': '0.16.0', 'docstring-parser': '0.15', 'h11': '0.14.0', 'statsmodels': '0.14.0', 'seaborn': '0.12.2', 'toolz': '0.12.0', 'torchmetrics': '0.11.4', 'cycler': '0.11.0', 'toml': '0.10.2', 'python-terraform': '0.10.1', 'json5': '0.9.14', 'snorkel': 
'0.9.9', 'commonmark': '0.9.1', 'cmaes': '0.9.1', 'lightning-utilities': '0.9.0', 'typer': '0.9.0', 'tabulate': '0.9.0', 'parso': '0.8.3', 'watchgod': '0.8.2', 'aiokafka': '0.8.1', 'pydeck': '0.8.1b0', 'nbclient': '0.8.0', 'pickleshare': '0.7.5', 'defusedxml': '0.7.1', 'tensorboard-data-server': '0.7.1', 'py-grpc-prometheus': '0.7.0', 'ptyprocess': '0.7.0', 'jupyter-events': '0.6.3', 'stack-data': '0.6.2', 'isodate': '0.6.1', 'python-snappy': '0.6.1', 's3transfer': '0.6.1', 'httptools': '0.6.0', 'patsy': '0.5.3', 'webencodings': '0.5.1', 'pyasn1': '0.5.0', 'sqlparse': '0.4.4', 'jupyter-server-terminals': '0.4.4', 'entrypoints': '0.4', 'click-params': '0.3.0', 'pyasn1-modules': '0.3.0', 'pysftp': '0.2.9', 'wcwidth': '0.2.6', 'notebook-shim': '0.2.3', 'pure-eval': '0.2.2', 'jupyterlab-pygments': '0.2.2', 'fastapi-utils': '0.2.1', 'ipython-genutils': '0.2.0', 'backcall': '0.2.0', 'matplotlib-inline': '0.1.6', 'rfc3339-validator': '0.1.4', 'comm': '0.1.3', 'mdurl': '0.1.2', 'rfc3986-validator': '0.1.1', 'pytz-deprecation-shim': '0.1.0.post0', 'sqlmodel': '0.0.8', 'slicer': '0.0.7', 'python-multipart': '0.0.6', 'sqlalchemy2-stubs': '0.0.2a35'}

CURRENT STACK

Name: caa_debug_stack ID: b8172192-7ea0-462b-8b1d-5126861b2de9 Shared: No User: default / 79d63edb-f2b3-4656-93b2-419d831592ca Workspace: default / 3289b629-24a3-432a-b20d-2acb6a3f8416

ORCHESTRATOR: default

Name: default ID: 19ff8828-e4ec-4ca6-97c7-1fdc36435fd8 Type: orchestrator Flavor: local Configuration: {} Shared: No User: default / 79d63edb-f2b3-4656-93b2-419d831592ca Workspace: default / 3289b629-24a3-432a-b20d-2acb6a3f8416

ARTIFACT_STORE: default

Name: default ID: 8be63088-5d5f-4da9-ad80-5295c158d6ab Type: artifact_store Flavor: local Configuration: {'path': ''} Shared: No User: default / 79d63edb-f2b3-4656-93b2-419d831592ca Workspace: default / 3289b629-24a3-432a-b20d-2acb6a3f8416

EXPERIMENT_TRACKER: mlflow_tracker

Name: mlflow_tracker ID: 3a0fe600-146f-4ece-87c4-9180eb3f78e3 Type: experiment_tracker Flavor: mlflow Configuration: {'experiment_name': None, 'nested': False, 'tags': {}, 'tracking_uri': 'http://10.34.33.203:5011/', 'tracking_username': '****', 'tracking_password': '****', 'tracking_token': '****', 'tracking_insecure_tls': True, 'databricks_host': None} Shared: No User: default / 79d63edb-f2b3-4656-93b2-419d831592ca Workspace: default / 3289b629-24a3-432a-b20d-2acb6a3f8416

What happened?

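zenml.utils.parse_return_type_annotations does not follow a function call in the return statement. When a step annotated with Tuple[int, int] returns the result of another function call, its output is treated as a single artifact rather than two, so unpacking it in the pipeline fails.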

from typing import Tuple
from zenml import pipeline, step

def a() -> Tuple[int, int]:
  return 42, 42

@step
def b() -> Tuple[int, int]:
  return a()

@pipeline
def c():
  x, y = b()

c()

As a workaround, I can rewrite each of my step functions to unpack the values and return them explicitly, which makes them a bit longer:

@step
def b() -> Tuple[int, int]:
  x, y = a()
  return x, y

Reproduction steps

  1. Run the above code

Relevant log output

╭─────────────────────────────── Traceback (most recent call last) ────────────────────────────────╮
│ /caa-debug/run__.py:16 in <module>                                                               │
│                                                                                                  │
│   13   x, y = b()                                                                                │
│   14                                                                                             │
│   15                                                                                             │
│ ❱ 16 c()                                                                                         │
│                                                                                                  │
│ /usr/local/lib/python3.10/site-packages/zenml/new/pipelines/pipeline.py:1224 in __call__         │
│                                                                                                  │
│   1221 │   │   │   # pipeline. Is this what we want?                                             │
│   1222 │   │   │   return self.entrypoint(*args, **kwargs)                                       │
│   1223 │   │                                                                                     │
│ ❱ 1224 │   │   self.prepare(*args, **kwargs)                                                     │
│   1225 │   │   return self._run(**self._run_args)                                                │
│   1226 │                                                                                         │
│   1227 │   def _call_entrypoint(self, *args: Any, **kwargs: Any) -> None:                        │
│                                                                                                  │
│ /usr/local/lib/python3.10/site-packages/zenml/new/pipelines/pipeline.py:384 in prepare           │
│                                                                                                  │
│    381 │   │   │   # Enter the context manager, so we become the active pipeline. This           │
│    382 │   │   │   # means that all steps that get called while the entrypoint function          │
│    383 │   │   │   # is executed will be added as invocation to this pipeline instance.          │
│ ❱  384 │   │   │   self._call_entrypoint(*args, **kwargs)                                        │
│    385 │                                                                                         │
│    386 │   def register(self) -> "PipelineResponseModel":                                        │
│    387 │   │   """Register the pipeline in the server.                                           │
│                                                                                                  │
│ /usr/local/lib/python3.10/site-packages/zenml/new/pipelines/pipeline.py:1252 in _call_entrypoint │
│                                                                                                  │
│   1249 │   │   │   ) from e                                                                      │
│   1250 │   │                                                                                     │
│   1251 │   │   self._parameters = validated_args                                                 │
│ ❱ 1252 │   │   self.entrypoint(**validated_args)                                                 │
│   1253 │                                                                                         │
│   1254 │   def _prepare_if_possible(self) -> None:                                               │
│   1255 │   │   """Prepares the pipeline if possible.                                             │
│                                                                                                  │
│ /caa-debug/run__.py:13 in c                                                                      │
│                                                                                                  │
│   10                                                                                             │
│   11 @pipeline                                                                                   │
│   12 def c():                                                                                    │
│ ❱ 13   x, y = b()                                                                                │
│   14                                                                                             │
│   15                                                                                             │
│   16 c()                                                                                         │
╰──────────────────────────────────────────────────────────────────────────────────────────────────╯
TypeError: cannot unpack non-iterable StepArtifact object


schustmi commented 1 year ago

This is related to https://docs.zenml.io/user-guide/advanced-guide/pipelining-features/configure-steps-pipelines#type-annotations, in particular the section where we explain that it's impossible to detect whether you want to return a single artifact of type Tuple or multiple artifacts. Maybe we could add a separate type that lets you explicitly define multi-artifact outputs. Or what would be your preferred solution here?
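To illustrate why the distinction is hard to make, here is a minimal sketch of a source-based heuristic built on the standard ast module. It is not ZenML's actual implementation, and returns_tuple_literal is a hypothetical helper written only for this example. A literal "return x, y" shows up as a tuple node in the syntax tree, whereas "return a()" is just a call whose result shape cannot be known without running it.

```python
import ast
import textwrap


def returns_tuple_literal(source: str) -> bool:
    """Return True if every value-returning `return` in `source` is a tuple literal.

    Hypothetical helper for illustration only; not part of ZenML's API.
    """
    tree = ast.parse(textwrap.dedent(source))
    # Collect all `return <value>` statements (bare `return` is ignored).
    returns = [
        node
        for node in ast.walk(tree)
        if isinstance(node, ast.Return) and node.value is not None
    ]
    return bool(returns) and all(isinstance(r.value, ast.Tuple) for r in returns)


# The step from the issue: the return value is a call, not a tuple literal.
CALL_SOURCE = """
def b():
    return a()
"""

# The workaround from the issue: the return value is a tuple literal.
UNPACKED_SOURCE = """
def b():
    x, y = a()
    return x, y
"""

print(returns_tuple_literal(CALL_SOURCE))      # False
print(returns_tuple_literal(UNPACKED_SOURCE))  # True
```

Under this kind of heuristic, the Tuple[int, int] annotation alone cannot settle the question, which matches the behavior reported above: the call-returning step yields one artifact, and the unpack-and-return version yields two.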

northanapon commented 1 year ago

I see. I missed the documentation during the migration from 0.39. I think the current approach works just fine. I just mistook it for a bug.

I currently use Annotated in my code. This might be treated as an explicit artifact declaration, e.g., Tuple[Annotated[...], Annotated[...]].
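As a rough sketch of that idea, the snippet below shows how a tuple of Annotated elements could be recognized using only the standard typing module. The helper output_names and the output names "x" and "y" are hypothetical, chosen for illustration; nothing in this thread confirms that ZenML interprets annotations this way.

```python
from typing import Annotated, Tuple, get_args, get_origin


def output_names(annotation) -> list:
    """Collect the first string metadata of each Annotated element in a Tuple.

    Hypothetical helper: returns [] unless `annotation` is a Tuple whose
    elements are all Annotated[...], i.e. an "explicit" multi-output form.
    """
    if get_origin(annotation) is not tuple:
        return []
    names = []
    for element in get_args(annotation):
        if get_origin(element) is not Annotated:
            return []
        # get_args(Annotated[T, m1, ...]) yields (T, m1, ...).
        _, *metadata = get_args(element)
        names.append(next((m for m in metadata if isinstance(m, str)), None))
    return names


explicit = Tuple[Annotated[int, "x"], Annotated[int, "y"]]
ambiguous = Tuple[int, int]

print(output_names(explicit))   # ['x', 'y']
print(output_names(ambiguous))  # []
```

With a rule like this, a plain Tuple[int, int] would stay ambiguous, while a tuple whose elements are all Annotated would unambiguously signal multiple named outputs.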