zenml-io / zenml

ZenML 🙏: The bridge between ML and Ops. https://zenml.io.
Apache License 2.0

[BUG]: {{date}} and {{time}} placeholders not being replaced #2573

Closed ruvilonix closed 7 months ago

ruvilonix commented 7 months ago

System Information

ZENML_LOCAL_VERSION: 0.56.2
ZENML_SERVER_VERSION: 0.56.2
ZENML_SERVER_DATABASE: sqlite
ZENML_SERVER_DEPLOYMENT_TYPE: local
ZENML_CONFIG_DIR: /home/user/.config/zenml
ZENML_LOCAL_STORE_DIR: /home/user/.config/zenml/local_stores
ZENML_SERVER_URL: http://127.0.0.1:8237
ZENML_ACTIVE_REPOSITORY_ROOT: /home/user/Projects/zengame
PYTHON_VERSION: 3.11.8
ENVIRONMENT: native
SYSTEM_INFO: {'os': 'linux', 'linux_distro': 'ubuntu', 'linux_distro_like': 'debian', 'linux_distro_version': '23.10'}
ACTIVE_WORKSPACE: default
ACTIVE_STACK: default
ACTIVE_USER: default
TELEMETRY_STATUS: enabled
ANALYTICS_CLIENT_ID: e069c3a6-d28b-4f71-9656-ec12c6ab972e
ANALYTICS_USER_ID: e68fec70-56c0-4812-bfe5-4be7632c9850
ANALYTICS_SERVER_ID: e069c3a6-d28b-4f71-9656-ec12c6ab972e
INTEGRATIONS: ['bitbucket', 'kaniko', 'pillow', 'scipy', 'sklearn']
PACKAGES: {'babel': '2.11.0', 'brotli': '1.0.9', 'gitpython': '3.1.42', 'jinja2': '3.1.3', 'mako': '1.3.2', 'markupsafe': '2.1.3', 'pyjwt': '2.7.0', 'pymysql': '1.0.3', 'pyqt5': '5.15.10', 'pyqt5-sip': '12.13.0', 'pysocks': '1.7.1', 'pyyaml': '6.0.1', 'pygments': '2.15.1', 'qtpy': '2.4.1', 'sqlalchemy': '1.4.41', 'sqlalchemy-utils': '0.38.3', 'send2trash': '1.8.2', 'aiohttp': '3.9.3', 'aiosignal': '1.3.1', 'alembic': '1.8.1', 'anyio': '4.2.0', 'argon2-cffi': '21.3.0', 'argon2-cffi-bindings': '21.2.0', 'asttokens': '2.0.5', 'async-lru': '2.0.4', 'attrs': '23.1.0', 'azure-common': '1.1.28', 'azure-core': '1.30.1', 'azure-mgmt-core': '1.4.0', 'azure-mgmt-resource': '23.0.1', 'bcrypt': '4.0.1', 'beautifulsoup4': '4.12.2', 'bleach': '4.1.0', 'cachetools': '5.3.3', 'certifi': '2024.2.2', 'cffi': '1.16.0', 'chardet': '5.2.0', 'charset-normalizer': '2.0.4', 'click': '8.1.3', 'click-params': '0.3.0', 'cloudpickle': '2.2.1', 'comm': '0.2.1', 'contourpy': '1.2.0', 'cryptography': '42.0.5', 'cycler': '0.12.1', 'debugpy': '1.6.7', 'decorator': '5.1.1', 'defusedxml': '0.7.1', 'distro': '1.9.0', 'docker': '6.1.3', 'executing': '0.8.3', 'fastapi': '0.99.1', 'fastapi-utils': '0.2.1', 'fastjsonschema': '2.16.2', 'fonttools': '4.50.0', 'frozenlist': '1.4.1', 'gitdb': '4.0.11', 'greenlet': '3.0.3', 'h11': '0.14.0', 'httplib2': '0.19.1', 'httptools': '0.6.1', 'idna': '3.4', 'ipinfo': '5.0.1', 'ipykernel': '6.28.0', 'ipython': '8.20.0', 'ipywidgets': '8.1.2', 'isodate': '0.6.1', 'jedi': '0.18.1', 'joblib': '1.3.2', 'json5': '0.9.6', 'jsonschema': '4.19.2', 'jsonschema-specifications': '2023.7.1', 'jupyter': '1.0.0', 'jupyter-client': '8.6.0', 'jupyter-console': '6.6.3', 'jupyter-core': '5.5.0', 'jupyter-events': '0.8.0', 'jupyter-lsp': '2.2.0', 'jupyter-server': '2.10.0', 'jupyter-server-terminals': '0.4.4', 'jupyterlab': '4.0.11', 'jupyterlab-pygments': '0.1.2', 'jupyterlab-server': '2.25.1', 'jupyterlab-widgets': '3.0.10', 'kiwisolver': '1.4.5', 'markdown-it-py': '3.0.0', 'matplotlib': '3.8.3', 'matplotlib-inline': '0.1.6', 'mdurl': '0.1.2', 'mistune': '2.0.4', 'multidict': '6.0.5', 'nbclient': '0.8.0', 'nbconvert': '7.10.0', 'nbformat': '5.9.2', 'nest-asyncio': '1.6.0', 'notebook': '7.0.8', 'notebook-shim': '0.2.3', 'numpy': '1.26.4', 'orjson': '3.8.14', 'overrides': '7.4.0', 'packaging': '23.2', 'pandas': '2.2.1', 'pandocfilters': '1.5.0', 'parso': '0.8.3', 'passlib': '1.7.4', 'pexpect': '4.8.0', 'pillow': '10.2.0', 'pip': '23.3.1', 'platformdirs': '3.10.0', 'ply': '3.11', 'prometheus-client': '0.14.1', 'prompt-toolkit': '3.0.43', 'psutil': '5.9.0', 'ptyprocess': '0.7.0', 'pure-eval': '0.2.2', 'pycparser': '2.21', 'pydantic': '1.10.14', 'pyparsing': '2.4.7', 'python-dateutil': '2.8.2', 'python-dotenv': '1.0.1', 'python-json-logger': '2.0.7', 'python-multipart': '0.0.9', 'pytz': '2023.3.post1', 'pyzmq': '25.1.2', 'qtconsole': '5.5.1', 'referencing': '0.30.2', 'requests': '2.31.0', 'rfc3339-validator': '0.1.4', 'rfc3986-validator': '0.1.1', 'rich': '13.7.1', 'rpds-py': '0.10.6', 'scikit-learn': '1.4.1.post1', 'scipy': '1.12.0', 'setuptools': '68.2.2', 'sip': '6.7.12', 'six': '1.16.0', 'smmap': '5.0.1', 'sniffio': '1.3.0', 'soupsieve': '2.5', 'sqlalchemy2-stubs': '0.0.2a38', 'sqlmodel': '0.0.8', 'stack-data': '0.2.0', 'starlette': '0.27.0', 'terminado': '0.17.1', 'threadpoolctl': '3.4.0', 'tinycss2': '1.2.1', 'tornado': '6.3.3', 'traitlets': '5.14.2', 'typing-extensions': '4.10.0', 'tzdata': '2024.1', 'urllib3': '2.1.0', 'uvicorn': '0.29.0', 'uvloop': '0.19.0', 'validators': '0.18.2', 'watchfiles': '0.21.0', 'wcwidth': '0.2.13', 'webencodings': '0.5.1', 'websocket-client': '0.58.0', 'websockets': '12.0', 'wheel': '0.41.2', 'widgetsnbextension': '4.0.10', 'yarl': '1.9.4', 'zenml': '0.56.2'}

CURRENT STACK

Name: default ID: c74cf215-1952-485e-a0b7-355c019db547 Workspace: default / 6be62cdc-7679-49e0-9875-9b01853c76e1

ORCHESTRATOR: default

Name: default ID: 51daa559-66b7-4f3f-93fc-02c6a1351194 Type: orchestrator Flavor: local Configuration: {} Workspace: default / 6be62cdc-7679-49e0-9875-9b01853c76e1

ARTIFACT_STORE: default

Name: default ID: eda693cf-ee20-4546-8e01-f4ecab219e90 Type: artifact_store Flavor: local Configuration: {'path': ''}

What happened?

I'm getting started with the starter guide. I added the date and time placeholders to run_name in pipeline.with_options, but it didn't replace the text.

Reproduction steps

  1. Install zenml then zenml[server] in a mamba environment.
  2. Install jupyter
  3. Install chardet because Jupyter said it was missing.
  4. Run zenml init. (I don't know what this is for yet. The starter guide didn't say to use it, but I originally tried to run the code in Jupyter Lab, and it said I had to run that first.)
  5. Create the following python file:
from typing_extensions import Annotated  # or `from typing import Annotated` on Python 3.9+
from typing import Tuple
import pandas as pd
from sklearn.datasets import load_iris
from sklearn.model_selection import train_test_split
from sklearn.base import ClassifierMixin
from sklearn.svm import SVC

from zenml import pipeline, step

import logging

@step
def training_data_loader() -> Tuple[
    # Notice we use a Tuple and Annotated to return 
    # multiple named outputs
    Annotated[pd.DataFrame, "X_train"],
    Annotated[pd.DataFrame, "X_test"],
    Annotated[pd.Series, "y_train"],
    Annotated[pd.Series, "y_test"],
]:
    """Load the iris dataset as a tuple of Pandas DataFrame / Series."""
    logging.info("Loading iris...")
    iris = load_iris(as_frame=True)
    logging.info("Splitting train and test...")
    X_train, X_test, y_train, y_test = train_test_split(
        iris.data, iris.target, test_size=0.2, shuffle=True, random_state=42
    )
    return X_train, X_test, y_train, y_test

@step
def svc_trainer(
    X_train: pd.DataFrame,
    y_train: pd.Series,
    gamma: float = 0.001,
) -> Tuple[
    Annotated[ClassifierMixin, "trained_model"],
    Annotated[float, "training_acc"],
]:
    """Train a sklearn SVC classifier."""

    model = SVC(gamma=gamma)
    model.fit(X_train.to_numpy(), y_train.to_numpy())

    train_acc = model.score(X_train.to_numpy(), y_train.to_numpy())
    print(f"Train accuracy: {train_acc}")

    return model, train_acc

@pipeline
def training_pipeline(gamma: float = 0.002):
    X_train, X_test, y_train, y_test = training_data_loader()
    svc_trainer(gamma=gamma, X_train=X_train, y_train=y_train)

if __name__ == "__main__":
    # Configure the pipeline
    training_pipeline = training_pipeline.with_options(
        config_path='./config.yaml',
        run_name="iris-{{date}}_{{time}}"
    )
    # Run the pipeline
    training_pipeline()
  6. Run the file from command line.
  7. Run zenml show.
  8. Run name shows iris-{date}_{time}.

Relevant log output

Initiating a new run for the pipeline: training_pipeline.
Registered new version: (version 2).
Executing a new run.
Using user: default
Using stack: default
  orchestrator: default
  artifact_store: default
Dashboard URL: http://127.0.0.1:8237/workspaces/default/pipelines/976a5dd4-54ba-4c19-a5e0-7ae156e6406c/runs/07a94860-62aa-43a4-ac86-6738ce9214a2/dag
Using cached version of training_data_loader.
Step training_data_loader has started.
Step svc_trainer has started.
By default, the PandasMaterializer stores data as a .csv file. If you want to store data more efficiently, you can install pyarrow by running 'pip install pyarrow'. This will allow PandasMaterializer to automatically store the data as a .parquet file instead.
/home/user/miniconda3/envs/zengame/lib/python3.11/site-packages/zenml/materializers/pandas_materializer.py:95: UserWarning: Could not infer format, so each element will be parsed individually, falling back to `dateutil`. To ensure parsing is consistent and as-expected, please specify a format.
  df = pd.read_csv(f, index_col=0, parse_dates=True)
By default, the PandasMaterializer stores data as a .csv file. If you want to store data more efficiently, you can install pyarrow by running 'pip install pyarrow'. This will allow PandasMaterializer to automatically store the data as a .parquet file instead.
/home/user/miniconda3/envs/zengame/lib/python3.11/site-packages/zenml/materializers/pandas_materializer.py:95: UserWarning: Could not infer format, so each element will be parsed individually, falling back to `dateutil`. To ensure parsing is consistent and as-expected, please specify a format.
  df = pd.read_csv(f, index_col=0, parse_dates=True)
Train accuracy: 0.925
Step svc_trainer has finished in 0.630s.
Pipeline run has finished in 0.764s.

strickvl commented 7 months ago

Thanks for submitting this issue. I'll be taking a look at it now.

strickvl commented 7 months ago

The fix is trivial. I apologise that our docs didn't show the right way:

run_name=f"iris-{{date}}_{{time}}"

You need to make sure that the f is there to denote that it's a formatted / template string whose variables can be updated. Then it'll work.
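
To illustrate why the prefix changes the result, here is a minimal sketch in plain Python. It assumes the run name is filled in with something like str.format, which matches the iris-{date}_{time} output reported above. The fill helper and the sample date and time values are hypothetical stand-ins, not ZenML's API.

```python
# Plain string: the doubled braces reach the substitution step unchanged.
plain = "iris-{{date}}_{{time}}"

# f-string: Python collapses each {{ and }} pair to a single brace immediately.
fstr = f"iris-{{date}}_{{time}}"


def fill(template: str) -> str:
    """Hypothetical stand-in for the run-name substitution (assumed str.format-like)."""
    return template.format(date="2024_03_28", time="12_00_00")


print(fill(plain))  # iris-{date}_{time}
print(fill(fstr))   # iris-2024_03_28_12_00_00
```

Under the same assumption, a plain string with single braces, run_name="iris-{date}_{time}", would give the same result as the f-string form.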

I'll make an update to the docs for this. Thanks for drawing it to our attention!