zenml-io / zenml

ZenML 🙏: Build portable, production-ready MLOps pipelines. https://zenml.io.
Apache License 2.0

[BUG]: S3 artifact store registration with single quotes produces SUPPORTED_SCHEMES error #1385

Closed christianversloot closed 1 year ago

christianversloot commented 1 year ago

Contact Details [Optional]

c.versloot@infoplaza.nl

System Information

ZENML_LOCAL_VERSION: 0.34.0
ZENML_SERVER_VERSION: 0.34.0
ZENML_SERVER_DATABASE: mysql
ZENML_SERVER_DEPLOYMENT_TYPE: other
ZENML_CONFIG_DIR: C:\Users\Christian Versloot\AppData\Roaming\zenml
ZENML_LOCAL_STORE_DIR: C:\Users\Christian Versloot\AppData\Roaming\zenml\local_stores
ZENML_SERVER_URL: removed
ZENML_ACTIVE_REPOSITORY_ROOT: None
PYTHON_VERSION: 3.9.9
ENVIRONMENT: native
SYSTEM_INFO: {'os': 'windows', 'windows_version_release': '10', 'windows_version': '10.0.19044', 'windows_version_service_pack': 'SP0', 'windows_version_os_type': 'Multiprocessor Free'}
ACTIVE_WORKSPACE: default
ACTIVE_STACK: default
ACTIVE_USER: removed
TELEMETRY_STATUS: enabled
ANALYTICS_CLIENT_ID: 07bfbb21-15fa-4f7c-a6ea-27dee28902d9
ANALYTICS_USER_ID: e96e7521-73ed-4ff1-8aca-72d5b224a55e
ANALYTICS_SERVER_ID: 9ed78563-2c57-4b86-9a6e-467ee05cc1c5
INTEGRATIONS: ['aws', 'github', 'kaniko', 'mlflow', 's3', 'scipy', 'sklearn']
PACKAGES: {'fsspec': '2022.11.0', 's3fs': '2022.11.0', 'certifi': '2021.10.8', 'pytz': '2021.3', 'pywin32': '305', 'setuptools': '60.1.0', 'cryptography': '36.0.1', 'kubernetes': '26.1.0', 'pyzmq': '25.0.0', 'sanic': '22.12.0', 'sanic-ext': '22.12.0', 'sanic-routing': '22.8.0', 'cattrs': '22.2.0', 'pycountry': '22.1.10', 'attrs': '22.1.0', 'gevent': '21.12.0', 'contextlib2': '21.6.0', 'argon2-cffi': '21.3.0', 'packaging': '21.3', 'azure-mgmt-resource': '21.2.1', 'argon2-cffi-bindings': '21.2.0', 'pyopenssl': '21.0.0', 'virtualenv': '20.19.0', 'clickclick': '20.10.2', 'pip': '20.3.4', 'azure-mgmt-storage': '20.1.0', 'gunicorn': '20.1.0', 'azure-storage-blob': '12.11.0', 'rich': '12.6.0', 'azure-mgmt-containerregistry': '10.1.0', 'websockets': '10.1', 'humanfriendly': '10.0', 'ipython': '8.11.0', 'pillow': '8.3.2', 'pyee': '8.2.2', 'click': '8.0.3', 'jupyter-client': '8.0.3', 'pyarrow': '8.0.0', 'ipywidgets': '7.7.3', 'nbconvert': '7.2.9', 'ipykernel': '6.21.2', 'notebook': '6.5.2', 'tornado': '6.2', 'multidict': '6.0.2', 'docker': '6.0.1', 'bleach': '6.0.0', 'pbr': 
'5.11.0', 'psutil': '5.9.0', 'traitlets': '5.9.0', 'nbformat': '5.7.3', 'pyyaml': '5.4.1', 'zope.interface': '5.4.0', 'cachetools': '5.3.0', 'jupyter-core': '5.2.0', 'decorator': '5.1.1', 'configobj': '5.0.6', 'azure-mgmt-redis': '5.0.0', 'smmap': '5.0.0', 'tqdm': '4.62.3', 'importlib-metadata': '4.11.3', 'conda': '4.11.0', 'beautifulsoup4': '4.10.0', 'rsa': '4.9', 'lxml': '4.8.0', 'opencv-python': '4.7.0.68', 'azure-mgmt-compute': '4.6.2', 'zope.event': '4.5.0', 'redis': '4.4.2', 'azure-mgmt-containerservice': '4.4.0', 'typing-extensions': '4.4.0', 'pyodbc': '4.0.32', 'gitdb': '4.0.10', 'mock': '4.0.3', 'async-timeout': '4.0.2', 'chardet': '4.0.0', 'progressbar2': '4.0.0', 'atlassian-python-api': '3.33.0', 'protobuf': '3.20.3', 'marshmallow': '3.19.0', 'ply': '3.11', 'filelock': '3.9.0', 'orjson': '3.8.5', 'aiohttp': '3.8.1', 'zipp': '3.8.0', 'aiofile': '3.7.4', 'widgetsnbextension': '3.6.2', 'tables': '3.6.1', 'gremlinpython': '3.6.1', 'constructs': '3.4.246', 'asyncio': '3.4.3', 'matplotlib': '3.4.3', 'asgiref': '3.4.1', 'markdown': '3.4.1', 'pyreadline3': '3.4.1', 'anyio': '3.4.0', 'h5py': '3.4.0', 'python-utils': '3.3.3', 'secretstorage': '3.3.3', 'aws-parallelcluster': '3.3.0', 'bcrypt': '3.2.2', 'pyproj': '3.2.1', 'jsonschema': '3.2.0', 'oauthlib': '3.2.0', 'wget': '3.2', 'gitpython': '3.1.31', 'aenum': '3.1.11', 'graphql-core': '3.1.7', 'idna': '3.1', 'threadpoolctl': '3.1.0', 'prompt-toolkit': '3.0.38', 'flask-cors': '3.0.10', 'openpyxl': '3.0.9', 'jinja2': '3.0.3', 'azure-mgmt-nspkg': '3.0.2', 'azure-nspkg': '3.0.2', 'azure-mgmt-datalake-nspkg': '3.0.1', 'azure-mgmt-authorization': '3.0.0', 'azure-mgmt-logic': '3.0.0', 'geojson': '3.0.0', 'platformdirs': '3.0.0', 'sagemaker': '2.117.0', 'tritonclient': '2.30.0', 'requests': '2.26.0', 'imageio': '2.24.0', 'pycparser': '2.21', 'awswrangler': '2.17.0', 'fastjsonschema': '2.16.3', 'google-auth': '2.16.1', 'pygments': '2.14.0', 'typeguard': '2.13.3', 'connexion': '2.13.1', 'google-api-core': '2.11.0', 
'paramiko': '2.11.0', 'psycopg2': '2.9.3', 'networkx': '2.8.8', 'python-dateutil': '2.8.2', 'numexpr': '2.8.1', 'azure-mgmt-network': '2.7.0', 'google-cloud-storage': '2.7.0', 'portalocker': '2.7.0', 'azure-mgmt-eventhub': '2.6.0', 'locust': '2.5.1', 'pyparsing': '2.4.7', 'aiobotocore': '2.4.2', 'google-resumable-media': '2.4.1', 'google-cloud-core': '2.3.2', 'soupsieve': '2.3.2', 'pyshp': '2.3.1', 'jsonpointer': '2.3', 'jupyter-server': '2.3.0', 'pyjwt': '2.3.0', 'asttokens': '2.2.1', 'cloudpickle': '2.2.1', 'azure-mgmt-devtestlabs': '2.2.0', 'geopy': '2.2.0', 'jsonpickle': '2.2.0', 'waitress': '2.1.2', 'mlflow': '2.1.1', 'azure-mgmt-dns': '2.1.0', 'azure-mgmt-notificationhubs': '2.1.0', 'azure-mgmt-search': '2.1.0', 'redshift-connector': '2.0.909', 'pywinpty': '2.0.10', 'charset-normalizer': '2.0.9', 'python-json-logger': '2.0.7', 'mistune': '2.0.5', 'flask': '2.0.2', 'kafka-python': '2.0.2', 'werkzeug': '2.0.2', 'itsdangerous': '2.0.1', 'markupsafe': '2.0.1', 'shapely': '2.0.1', 'argcomplete': '2.0.0', 'azure-mgmt-consumption': '2.0.0', 'azure-mgmt-powerbiembedded': '2.0.0', 'azure-mgmt-scheduler': '2.0.0', 'opensearch-py': '2.0.0', 'aws-cdk.assets': '1.192.0', 'aws-cdk.aws-acmpca': '1.192.0', 'aws-cdk.aws-apigateway': '1.192.0', 'aws-cdk.aws-applicationautoscaling': '1.192.0', 'aws-cdk.aws-autoscaling': '1.192.0', 'aws-cdk.aws-autoscaling-common': '1.192.0', 'aws-cdk.aws-autoscaling-hooktargets': '1.192.0', 'aws-cdk.aws-batch': '1.192.0', 'aws-cdk.aws-certificatemanager': '1.192.0', 'aws-cdk.aws-cloudformation': '1.192.0', 'aws-cdk.aws-cloudfront': '1.192.0', 'aws-cdk.aws-cloudwatch': '1.192.0', 'aws-cdk.aws-codebuild': '1.192.0', 'aws-cdk.aws-codecommit': '1.192.0', 'aws-cdk.aws-codeguruprofiler': '1.192.0', 'aws-cdk.aws-codestarnotifications': '1.192.0', 'aws-cdk.aws-cognito': '1.192.0', 'aws-cdk.aws-dynamodb': '1.192.0', 'aws-cdk.aws-ec2': '1.192.0', 'aws-cdk.aws-ecr': '1.192.0', 'aws-cdk.aws-ecr-assets': '1.192.0', 'aws-cdk.aws-ecs': '1.192.0', 
'aws-cdk.aws-efs': '1.192.0', 'aws-cdk.aws-elasticloadbalancing': '1.192.0', 'aws-cdk.aws-elasticloadbalancingv2': '1.192.0', 'aws-cdk.aws-events': '1.192.0', 'aws-cdk.aws-fsx': '1.192.0', 'aws-cdk.aws-globalaccelerator': '1.192.0', 'aws-cdk.aws-iam': '1.192.0', 'aws-cdk.aws-imagebuilder': '1.192.0', 'aws-cdk.aws-kinesis': '1.192.0', 'aws-cdk.aws-kms': '1.192.0', 'aws-cdk.aws-lambda': '1.192.0', 'aws-cdk.aws-logs': '1.192.0', 'aws-cdk.aws-route53': '1.192.0', 'aws-cdk.aws-route53-targets': '1.192.0', 'aws-cdk.aws-s3': '1.192.0', 'aws-cdk.aws-s3-assets': '1.192.0', 'aws-cdk.aws-sam': '1.192.0', 'aws-cdk.aws-secretsmanager': '1.192.0', 'aws-cdk.aws-servicediscovery': '1.192.0', 'aws-cdk.aws-signer': '1.192.0', 'aws-cdk.aws-sns': '1.192.0', 'aws-cdk.aws-sns-subscriptions': '1.192.0', 'aws-cdk.aws-sqs': '1.192.0', 'aws-cdk.aws-ssm': '1.192.0', 'aws-cdk.aws-stepfunctions': '1.192.0', 'aws-cdk.cloud-assembly-schema': '1.192.0', 'aws-cdk.core': '1.192.0', 'aws-cdk.custom-resources': '1.192.0', 'aws-cdk.cx-api': '1.192.0', 'aws-cdk.region-info': '1.192.0', 'jsii': '1.75.0', 'googleapis-common-protos': '1.58.0', 'aws-sam-translator': '1.55.0', 'geographiclib': '1.52', 'grpcio': '1.51.1', 'azureml-core': '1.49.0', 'jsonpatch': '1.32', 'pg8000': '1.29.3', 'botocore': '1.27.59', 'boto3': '1.26.7', 'urllib3': '1.26.7', 'azure-core': '1.26.3', 'pyresample': '1.26.1', 'georaster': '1.25', 'numpy': '1.22.4', 'w3lib': '1.22.0', 'msal': '1.21.0', 'parse': '1.19.0', 'six': '1.16.0', 'cffi': '1.15.0', 'wrapt': '1.13.3', 'pydantic': '1.10.4', 'backoff': '1.10.0', 'pkginfo': '1.9.6', 'azure-mgmt-rdbms': '1.9.0', 'junit-xml': '1.9', 'python-rapidjson': '1.9', 'alembic': '1.8.1', 'yarl': '1.8.1', 'distro': '1.8.0', 'send2trash': '1.8.0', 'ppft': '1.7.6.6', 'passlib': '1.7.4', 'conda-package-handling': '1.7.3', 'scipy': '1.7.2', 'pysocks': '1.7.1', 'debugpy': '1.6.6', 'cftime': '1.6.2', 'netcdf4': '1.6.2', 'monotonic': '1.6', 'nest-asyncio': '1.5.6', 'configargparse': '1.5.3', 
'geventhttpclient': '1.5.3', 'jsonpath-ng': '1.5.3', 'pandas': '1.5.2', 'asn1crypto': '1.5.1', 'azure-mgmt-containerinstance': '1.5.0', 'google-crc32c': '1.5.0', 'pandocfilters': '1.5.0', 'pynacl': '1.5.0', 'sqlalchemy': '1.4.40', 'menuinst': '1.4.18', 'appdirs': '1.4.4', 'scramp': '1.4.4', 'pyquery': '1.4.3', 'azure-storage-common': '1.4.2', 'analytics-python': '1.4.0', 'basemap': '1.3.4', 'pykdtree': '1.3.4', 'frozenlist': '1.3.3', 'azure-mgmt-core': '1.3.2', 'basemap-data': '1.3.2', 'basemap-data-hires': '1.3.2', 'kiwisolver': '1.3.2', 'aiosignal': '1.3.1', 'requests-oauthlib': '1.3.1', 'mlserver': '1.3.0.dev2', 'mlserver-mlflow': '1.3.0.dev2', 'htmlgenerator': '1.2.27', 'deprecated': '1.2.13', 'adal': '1.2.7', 'mako': '1.2.4', 'querystring-parser': '1.2.4', 'jschema-to-python': '1.2.3', 'websocket-client': '1.2.2', 'tinycss2': '1.2.1', 'pandas-stubs': '1.2.0.57', 'executing': '1.2.0', 'joblib': '1.2.0', 'scikit-learn': '1.2.0', 'sniffio': '1.2.0', 'azure-common': '1.1.28', 'greenlet': '1.1.2', 'jupyterlab-widgets': '1.1.2', 'requests-aws4auth': '1.1.2', 'azure-mgmt-keyvault': '1.1.0', 'cssselect': '1.1.0', 'et-xmlfile': '1.1.0', 'exceptiongroup': '1.1.0', 'openapi': '1.1.0', 'win-inet-pton': '1.1.0', 'brotli': '1.0.9', 'sarif-om': '1.0.4', 'msgpack': '1.0.3', 'pymysql': '1.0.2', 'pyppeteer': '1.0.2', 'smdebug-rulesconfig': '1.0.1', 'backports.weakref': '1.0.post1', 'azure-mgmt-datamigration': '1.0.0', 'azure-mgmt-eventgrid': '1.0.0', 'azure-mgmt-media': '1.0.0', 'backports.tempfile': '1.0', 'msal-extensions': '1.0.0', 'cfn-lint': '0.72.6', 'multiprocess': '0.70.14', 'fastapi': '0.70.0', 'azure-graphrbac': '0.61.1', 'numba': '0.56.4', 'azure-mgmt-trafficmanager': '0.50.0', 'shap': '0.41.0', 'llvmlite': '0.39.1', 'sqlalchemy-utils': '0.38.3', 'wheel': '0.37.0', 'azure-mgmt-web': '0.35.0', 'zenml': '0.34.0', 'cython': '0.29.33', 'prometheus-flask-exporter': '0.22.0', 'python-dotenv': '0.21.1', 'httplib2': '0.19.1', 'jedi': '0.18.2', 'validators': '0.18.2', 
'pyrsistent': '0.18.1', 'uvicorn': '0.17.6', 'databricks-cli': '0.17.4', 'terminado': '0.17.1', 'openai': '0.16.0', 'prometheus-client': '0.16.0', 'starlette': '0.16.0', 'ruamel-yaml-conda': '0.15.80', 'starlette-exporter': '0.15.1', 'ariadne': '0.14.0', 'h11': '0.12.0', 'aioitertools': '0.11.0', 'pathspec': '0.11.0', 'knack': '0.10.1', 'python-terraform': '0.10.1', 'cycler': '0.10.0', 'jmespath': '0.10.0', 'requests-html': '0.10.0', 'caio': '0.9.3', 'azure-mgmt-sql': '0.9.1', 'commonmark': '0.9.1', 'tabulate': '0.8.10', 'parso': '0.8.3', 'aiofiles': '0.8.0', 'aiokafka': '0.8.0', 'jeepney': '0.8.0', 'pickleshare': '0.7.5', 'schema': '0.7.5', 'nbclient': '0.7.2', 'defusedxml': '0.7.1', 'msrest': '0.7.1', 'brotlipy': '0.7.0', 'py-grpc-prometheus': '0.7.0', 'requests-auth-aws-sigv4': '0.7', 'watchgod': '0.7', 'msrestazure': '0.6.4', 'jupyter-events': '0.6.3', 'pycosat': '0.6.3', 'stack-data': '0.6.2', 'isodate': '0.6.1', 'azure-mgmt-datafactory': '0.6.0', 'azure-mgmt-datalake-analytics': '0.6.0', 's3transfer': '0.6.0', 'azure-mgmt-servicebus': '0.5.3', 'azure-mgmt-monitor': '0.5.2', 'nbclassic': '0.5.2', 'inflection': '0.5.1', 'ndg-httpsclient': '0.5.1', 'webencodings': '0.5.1', 'azure-mgmt-datalake-store': '0.5.0', 'azure-mgmt-iothub': '0.5.0', 'pyasn1': '0.4.8', 'colorama': '0.4.4', 'jupyter-server-terminals': '0.4.4', 'python-graphql-client': '0.4.3', 'sqlparse': '0.4.3', 'azure-mgmt-cosmosdb': '0.4.1', 'azure-mgmt-machinelearningcompute': '0.4.1', 'entrypoints': '0.4', 'parallel-pandas': '0.4.0', 'dill': '0.3.6', 'distlib': '0.3.6', 'apng': '0.3.4', 'pox': '0.3.2', 'azure-mgmt-recoveryservices': '0.3.0', 'azure-mgmt-recoveryservicesbackup': '0.3.0', 'click-params': '0.3.0', 'httptools': '0.3.0', 'pathos': '0.3.0', 'pysftp': '0.2.9', 'pyasn1-modules': '0.2.8', 'wcwidth': '0.2.6', 'jupyterlab-pygments': '0.2.2', 'notebook-shim': '0.2.2', 'pure-eval': '0.2.2', 'azure-mgmt-reservations': '0.2.1', 'wincertstore': '0.2', 'azure-mgmt-iothubprovisioningservices': '0.2.0', 
'azure-mgmt-loganalytics': '0.2.0', 'azure-mgmt-msi': '0.2.0', 'azure-mgmt-servicefabric': '0.2.0', 'azure-mgmt-subscription': '0.2.0', 'backcall': '0.2.0', 'flask-basicauth': '0.2.0', 'google-pasta': '0.2.0', 'ipython-genutils': '0.2.0', 'fake-useragent': '0.1.11', 'bcpy': '0.1.8', 'matplotlib-inline': '0.1.6', 'protobuf3-to-dict': '0.1.5', 'rfc3339-validator': '0.1.4', 'comm': '0.1.2', 'azure-mgmt-hanaonazure': '0.1.1', 'azure-mgmt-managementpartner': '0.1.1', 'azure-mgmt-signalr': '0.1.1', 'rfc3986-validator': '0.1.1', 'azure-mgmt-devspaces': '0.1.0', 'azure-mgmt-iotcentral': '0.1.0', 'azure-mgmt-managementgroups': '0.1.0', 'azure-mgmt-maps': '0.1.0', 'azure-mgmt-marketplaceordering': '0.1.0', 'azure-mgmt-policyinsights': '0.1.0', 'azure-mgmt-relay': '0.1.0', 'sqlmodel': '0.0.8', 'slicer': '0.0.7', 'publication': '0.0.3', 'roundrobin': '0.0.2', 'sqlalchemy2-stubs': '0.0.2a32', 'bs4': '0.0.1', 'expertmodels': '0.0.4', 'weathercomputing': '0.0.6'}

What happened?

When adding an artifact store in the following way...

zenml artifact-store register infoplaza-ml-artifacts -f s3 --path='s3://bucketname' --authentication_secret=aws_s3_artifact_store

...the following error is produced:

                                                                                                  │
│ c:\users\public\miniconda\lib\site-packages\zenml\artifact_stores\base_artifact_store.py:157 in  │
│ _ensure_artifact_store                                                                           │
│                                                                                                  │
│   154 │   │   if not any(                                                                        │
│   155 │   │   │   values["path"].startswith(i) for i in cls.SUPPORTED_SCHEMES                    │
│   156 │   │   ):                                                                                 │
│ > 157 │   │   │   raise ArtifactStoreInterfaceError(                                             │
│   158 │   │   │   │   f"The path: '{values['path']}' you defined for your "                      │
│   159 │   │   │   │   f"artifact store is not supported by the implementation of "               │
│   160 │   │   │   │   f"{cls.schema()['title']}, because it does not start with "                │
└──────────────────────────────────────────────────────────────────────────────────────────────────┘
ArtifactStoreInterfaceError: The path: ''s3://bucketname'' you defined for your artifact store is not supported by the implementation of S3ArtifactStoreConfig, because it does not start with one of its supported schemes: {'s3://'}.

Removing the single quotes allows me to successfully register the artifact store.

zenml artifact-store register infoplaza-ml-artifacts -f s3 --path=s3://bucketname --authentication_secret=aws_s3_artifact_store
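
For reference, the difference between the two commands can be reproduced in isolation. This is a minimal sketch, assuming the single quotes reach ZenML as literal characters (as the doubled quotes in the error message suggest; a shell that does not strip single quotes, such as the Windows command prompt, would behave this way). SUPPORTED_SCHEMES mirrors the value printed in the error, and path_is_supported is a hypothetical helper written for illustration, not a ZenML function.

```python
# Stand-alone reproduction of the startswith check shown in the traceback.
# SUPPORTED_SCHEMES mirrors the set printed in the error message;
# path_is_supported is a hypothetical helper, not part of ZenML.
SUPPORTED_SCHEMES = {"s3://"}


def path_is_supported(path: str) -> bool:
    """Return True if the path starts with one of the supported schemes."""
    return any(path.startswith(scheme) for scheme in SUPPORTED_SCHEMES)


# Value received when the quotes are passed through literally.
quoted = "'s3://bucketname'"
# Value received when the quotes are omitted (or stripped by the shell).
unquoted = "s3://bucketname"

print(path_is_supported(quoted))    # False: the string starts with "'"
print(path_is_supported(unquoted))  # True
```

The quoted value fails only because its first character is the quote itself, so it never matches the 's3://' prefix.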

The docs contain the first method, i.e. with single quotes, which may confuse users: https://docs.zenml.io/component-gallery/artifact-stores/s3

A solution would be to either:

  1. Add the single quotes to SUPPORTED_SCHEMES or ignore them when matching, or
  2. Adapt the docs to the working command.

In my view, (1) is the preferred option because it gives users the most flexibility.
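
To illustrate the "ignore the quotes when matching" variant of option (1), here is a rough sketch. It is only an assumption about how such a check could look, not how ZenML implements it: strip_matching_quotes and normalize_and_check are hypothetical names, and a plain ValueError stands in for ArtifactStoreInterfaceError so the snippet runs without ZenML installed.

```python
# Sketch of option (1): drop one pair of surrounding quotes before the
# scheme check. All names here are hypothetical, not ZenML APIs.
SUPPORTED_SCHEMES = {"s3://"}


def strip_matching_quotes(value: str) -> str:
    """Remove one pair of matching surrounding quotes, if present."""
    if len(value) >= 2 and value[0] == value[-1] and value[0] in ("'", '"'):
        return value[1:-1]
    return value


def normalize_and_check(path: str) -> str:
    """Return the cleaned path, or raise if its scheme is unsupported."""
    cleaned = strip_matching_quotes(path.strip())
    if not any(cleaned.startswith(scheme) for scheme in SUPPORTED_SCHEMES):
        # ValueError stands in for ArtifactStoreInterfaceError.
        raise ValueError(
            f"The path {cleaned!r} does not start with one of the "
            f"supported schemes: {SUPPORTED_SCHEMES}."
        )
    return cleaned


print(normalize_and_check("'s3://bucketname'"))  # s3://bucketname
print(normalize_and_check("s3://bucketname"))    # s3://bucketname
```

Stripping only a matching pair leaves paths that legitimately contain a quote character elsewhere untouched, and unsupported schemes are still rejected after cleaning.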

Reproduction steps

  1. zenml artifact-store register infoplaza-ml-artifacts -f s3 --path='s3://bucketname' --authentication_secret=aws_s3_artifact_store

Relevant log output

No response


schustmi commented 1 year ago

Thanks for reporting this bug, @christianversloot. I definitely agree that your option 1, specifically ignoring the quotes, is the way this should be implemented.

schustmi commented 1 year ago

Fixed in 0.36.0