zenml-io / zenml

ZenML 🙏: Build portable, production-ready MLOps pipelines. https://zenml.io.
Apache License 2.0

[BUG]: GCS based Artifact Store connected through GCP Service Connector could not provide proper GCP service credential information to Label Studio annotator. #1853

Closed UltraRabbit closed 7 months ago

UltraRabbit commented 10 months ago

Contact Details [Optional]

jackhe1975@gmail.com

System Information

ZENML_LOCAL_VERSION: 0.44.3
ZENML_SERVER_VERSION: 0.44.3
ZENML_SERVER_DATABASE: sqlite
ZENML_SERVER_DEPLOYMENT_TYPE: local
ZENML_CONFIG_DIR: /home/jack/.config/zenml
ZENML_LOCAL_STORE_DIR: /home/jack/.config/zenml/local_stores
ZENML_SERVER_URL: http://127.0.0.1:8237
ZENML_ACTIVE_REPOSITORY_ROOT: /home/jack/workspace/zenml
PYTHON_VERSION: 3.10.12
ENVIRONMENT: wsl
SYSTEM_INFO: {'os': 'linux', 'linux_distro': 'ubuntu', 'linux_distro_like': 'debian', 'linux_distro_version': '22.04'}
ACTIVE_WORKSPACE: default
ACTIVE_STACK: YoloStack
ACTIVE_USER: default
TELEMETRY_STATUS: enabled
ANALYTICS_CLIENT_ID: 61048dc0-ee28-4c12-8e05-6c1ef8d592d2
ANALYTICS_USER_ID: dfcbcebb-e076-4b35-b318-cff2a634172d
ANALYTICS_SERVER_ID: 61048dc0-ee28-4c12-8e05-6c1ef8d592d2
INTEGRATIONS: ['gcp', 'kaniko', 'kubeflow', 'kubernetes', 'label_studio', 'mlflow', 'pillow', 'pytorch', 'scipy', 'seldon', 'sklearn']
PACKAGES: {'fsspec': '2023.9.2', 'gcsfs': '2023.9.2', 'regex': '2023.8.8', 'certifi': '2023.7.22', 'tzdata': '2023.3', 'pytz': '2019.3', 'setuptools': '59.6.0', 'cryptography': '41.0.4', 'kubernetes': '25.3.0', 'gevent': '23.9.1', 'pip': '23.2.1', 'aiofiles': '23.2.1', 'packaging': '23.2', 'attrs': '23.1.0', 'azure-mgmt-resource': '23.0.1', 'gunicorn': '21.2.0', 'lit': '17.0.1', 'rich': '13.6.0', 'azure-storage-blob': '12.18.2', 'cuda-python': '12.2.0', 'pyarrow': '12.0.1', 'nvidia-cublas-cu11': '11.10.3.66', 'nvidia-cuda-cupti-cu11': '11.7.101', 'nvidia-cuda-nvrtc-cu11': '11.7.99', 'nvidia-cuda-runtime-cu11': '11.7.99', 'nvidia-nvtx-cu11': '11.7.91', 'nvidia-cusparse-cu11': '11.7.4.91', 'nvidia-cusolver-cu11': '11.4.0.1', 'websockets': '11.0.3', 'nvidia-cufft-cu11': '10.9.0.58', 'nvidia-curand-cu11': '10.2.10.91', 'pillow': '9.3.0', 'ipython': '8.16.0', 'nvidia-cudnn-cu11': '8.5.0.96', 'click': '8.1.3', 'ipywidgets': '8.1.1', 'launchdarkly-server-sdk': '7.5.0', 'importlib-metadata': '6.8.0', 'docker': '6.1.3', 'importlib-resources': '6.1.0', 'multidict': '6.0.4', 'pyyaml': '6.0.1',
'zope.interface': '6.0', 'traitlets': '5.10.1', 'psutil': '5.9.5', 'ujson': '5.8.0', 'cachetools': '5.3.1', 'decorator': '5.1.1', 'smmap': '5.0.1', 'redis': '5.0.1', 'bleach': '5.0.1', 'zope.event': '5.0', 'tqdm': '4.66.1', 'fonttools': '4.43.0', 'lxml': '4.9.3', 'rsa': '4.9', 'pexpect': '4.8.0', 'typing-extensions': '4.8.0', 'django-model-utils': '4.1.1', 'gitdb': '4.0.10', 'widgetsnbextension': '4.0.9', 'async-timeout': '4.0.3', 'ordered-set': '4.0.2', 'bcrypt': '4.0.1', 'anyio': '4.0.0', 'cmake': '3.27.6', 'protobuf': '3.20.3', 'google-cloud-build': '3.20.0', 'simplejson': '3.19.1', 'zipp': '3.17.0', 'djangorestframework': '3.13.1', 'filelock': '3.12.4', 'google-cloud-bigquery': '3.11.4', 'orjson': '3.8.14', 'aiohttp': '3.8.5', 'matplotlib': '3.8.0', 'asgiref': '3.7.2', 'nltk': '3.6.7', 'django-cors-headers': '3.6.0', 'markdown': '3.4.4', 'idna': '3.4', 'python-jose': '3.3.0', 'django': '3.2.16', 'oauthlib': '3.2.2', 'django-debug-toolbar': '3.2.1', 'threadpoolctl': '3.2.0', 'jsonschema': '3.2.0', 'gitpython': '3.1.37', 'jinja2': '3.1.2', 'networkx': '3.1', 'django-extensions': '3.1.0', 'prompt-toolkit': '3.0.39', 'jupyterlab-widgets': '3.0.9', 'cython': '3.0.2', 'uritemplate': '3.0.1', 'werkzeug': '3.0.0', 'markdown-it-py': '3.0.0', 'boto': '2.49.0', 'tritonclient': '2.37.0.9383150', 'google-cloud-container': '2.31.0', 'requests': '2.28.0', 'google-auth': '2.23.2', 'pycparser': '2.21', 'google-cloud-secret-manager': '2.16.4', 'pygments': '2.16.1', 'nvidia-nccl-cu11': '2.14.3', 'semver': '2.13.0', 'google-api-core': '2.12.0', 'google-cloud-scheduler': '2.11.1', 'google-cloud-storage': '2.11.0', 'psycopg2-binary': '2.9.6', 'python-dateutil': '2.8.1', 'pyjwt': '2.8.0', 'google-cloud-logging': '2.7.2', 'google-resumable-media': '2.6.0', 'mlflow': '2.6.0', 'django-rq': '2.5.1', 'pyparsing': '2.4.7', 'django-filter': '2.4.0', 'asttokens': '2.4.0', 'flask': '2.3.3', 'coreapi': '2.3.3', 'google-cloud-core': '2.3.2', 'termcolor': '2.3.0', 'cloudpickle': '2.2.1', 
'rules': '2.2', 'django-rest-swagger': '2.2.0', 'user-agents': '2.2.0', 'markupsafe': '2.1.3', 'itsdangerous': '2.1.2', 'charset-normalizer': '2.0.12', 'python-json-logger': '2.0.4', 'greenlet': '2.0.2', 'geventhttpclient': '2.0.2', 'kafka-python': '2.0.2', 'executing': '2.0.0', 'triton': '2.0.0', 'grpcio': '1.59.0', 'googleapis-common-protos': '1.56.4', 'grpcio-status': '1.48.2', 'google-cloud-aiplatform': '1.33.1', 'sentry-sdk': '1.31.0', 'azure-core': '1.29.4', 'urllib3': '1.26.16', 'proto-plus': '1.22.3', 'numpy': '1.21.6', 'drf-yasg': '1.20.0', 'botocore': '1.19.63', 'boto3': '1.16.63', 'cffi': '1.16.0', 'six': '1.16.0', 'wrapt': '1.15.0', 'google-cloud-functions': '1.13.3', 'google-api-python-client': '1.12.11', 'django-storages': '1.12.3', 'torch': '1.12.1', 'sympy': '1.12', 'scipy': '1.11.3', 'python-rapidjson': '1.11', 'pydantic': '1.10.13', 'google-cloud-resource-manager': '1.10.4', 'rq': '1.10.1', 'yarl': '1.9.2', 'kfp': '1.8.22', 'shapely': '1.8.5.post1', 'kfp-server-api': '1.8.5', 'alembic': '1.8.1', 'distro': '1.8.0', 'passlib': '1.7.4', 'label-studio': '1.7.3', 'websocket-client': '1.6.3', 'blinker': '1.6.2', 'google-crc32c': '1.5.0', 'sqlalchemy': '1.4.41', 'kiwisolver': '1.4.5', 'appdirs': '1.4.4', 'absl-py': '1.4.0', 'frozenlist': '1.4.0', 'azure-mgmt-core': '1.4.0', 'mlserver': '1.3.5', 'mlserver-mlflow': '1.3.5', 'pandas': '1.3.5', 'openapi-codec': '1.3.2', 'joblib': '1.3.2', 'scikit-learn': '1.3.1', 'requests-oauthlib': '1.3.1', 'aiosignal': '1.3.1', 'mpmath': '1.3.0', 'sniffio': '1.3.0', 'deprecated': '1.2.14', 'mako': '1.2.4', 'querystring-parser': '1.2.4', 'expiringdict': '1.2.2', 'itypes': '1.2.0', 'azure-common': '1.1.28', 'exceptiongroup': '1.1.3', 'contourpy': '1.1.1', 'brotli': '1.1.0', 'google-cloud-appengine-logging': '1.1.0', 'google-auth-oauthlib': '1.1.0', 'pyrfc3339': '1.1', 'pymysql': '1.0.3', 'python-dotenv': '1.0.0', 'fastapi': '0.89.1', 'zenml': '0.44.3', 'wheel': '0.41.2', 'sqlalchemy-utils': '0.38.3', 'uvicorn': '0.23.2', 
'starlette': '0.22.0', 'watchfiles': '0.20.0', 'pyrsistent': '0.19.3', 'httplib2': '0.19.1', 'jedi': '0.19.0', 'validators': '0.18.2', 'ecdsa': '0.18.0', 'ua-parser': '0.18.0', 'ruamel.yaml': '0.17.33', 'databricks-cli': '0.17.8', 'prometheus-client': '0.17.1', 'uvloop': '0.17.0', 'starlette-exporter': '0.16.0', 'docstring-parser': '0.15', 'h11': '0.14.0', 'torchvision': '0.13.1', 'grpc-google-iam-v1': '0.12.6', 'lockfile': '0.12.2', 'cycler': '0.12.0', 'django-annoying': '0.10.6', 'python-terraform': '0.10.1', 'requests-toolbelt': '0.10.1', 'jmespath': '0.10.0', 'drf-flex-fields': '0.9.5', 'tabulate': '0.9.0', 'typer': '0.9.0', 'parso': '0.8.3', 'aiokafka': '0.8.1', 'pickleshare': '0.7.5', 'defusedxml': '0.7.1', 'py-grpc-prometheus': '0.7.0', 'ptyprocess': '0.7.0', 'stack-data': '0.6.3', 'isodate': '0.6.1', 'httptools': '0.6.0', 'inflection': '0.5.1', 'webencodings': '0.5.1', 'fire': '0.5.0', 'pyasn1': '0.5.0', 'colorama': '0.4.6', 'sqlparse': '0.4.4', 'django-user-agents': '0.4.0', 'entrypoints': '0.4', 's3transfer': '0.3.7', 'attr': '0.3.1', 'drf-generators': '0.3.0', 'drf-dynamic-fields': '0.3.0', 'click-params': '0.3.0', 'pyasn1-modules': '0.3.0', 'wcwidth': '0.2.8', 'ruamel.yaml.clib': '0.2.7', 'pure-eval': '0.2.2', 'fastapi-utils': '0.2.1', 'backcall': '0.2.0', 'google-cloud-audit-log': '0.2.0', 'xmljson': '0.2.0', 'kfp-pipeline-spec': '0.1.16', 'htmlmin': '0.1.12', 'strip-hints': '0.1.10', 'matplotlib-inline': '0.1.6', 'comm': '0.1.4', 'boxing': '0.1.4', 'django-ranged-fileresponse': '0.1.2', 'mdurl': '0.1.2', 'google-auth-httplib2': '0.1.1', 'label-studio-converter': '0.0.51', 'label-studio-sdk': '0.0.24', 'sqlmodel': '0.0.8', 'python-multipart': '0.0.6', 'coreschema': '0.0.4', 'label-studio-tools': '0.0.3', 'sqlalchemy2-stubs': '0.0.2a35'}

The @step decorator that you used to define your get_or_create_dataset step is deprecated. Check out the 0.40.0 migration guide for more information on how to migrate your steps to the new syntax: https://docs.zenml.io/reference/migration-guide/migration-zero-forty
The @step decorator that you used to define your get_labeled_data step is deprecated. Check out the 0.40.0 migration guide for more information on how to migrate your steps to the new syntax: https://docs.zenml.io/reference/migration-guide/migration-zero-forty
The @step decorator that you used to define your sync_new_data_to_label_studio step is deprecated. Check out the 0.40.0 migration guide for more information on how to migrate your steps to the new syntax: https://docs.zenml.io/reference/migration-guide/migration-zero-forty

CURRENT STACK

Name: YoloStack
ID: 4cdfda21-3558-4fba-b5b6-a04b8539b9d8
Shared: Yes
User: default / dfcbcebb-e076-4b35-b318-cff2a634172d
Workspace: default / 689f2915-873d-413c-982b-4185be3bb05c

ORCHESTRATOR: default

Name: default
ID: 456e5840-baf9-433d-b9f3-2e97420597b6
Type: orchestrator
Flavor: local
Configuration: {}
Shared: No
User: default / dfcbcebb-e076-4b35-b318-cff2a634172d
Workspace: default / 689f2915-873d-413c-982b-4185be3bb05c

ARTIFACT_STORE: gcs_store

Name: gcs_store
ID: 6242eccf-1c1e-4e4d-851f-f84ebef83038
Type: artifact_store
Flavor: gcp
Configuration: {'authentication_secret': 'gcp_secret', 'path': 'gs://yolo-wildfire'}
Shared: No
User: default / dfcbcebb-e076-4b35-b318-cff2a634172d
Workspace: default / 689f2915-873d-413c-982b-4185be3bb05c

MODEL_DEPLOYER: mlflow_deployer

Name: mlflow_deployer
ID: fc6b0b76-fe44-4578-a07b-78f4cd58e741
Type: model_deployer
Flavor: mlflow
Configuration: {'service_path': ''}
Shared: No
User: default / dfcbcebb-e076-4b35-b318-cff2a634172d
Workspace: default / 689f2915-873d-413c-982b-4185be3bb05c

EXPERIMENT_TRACKER: mlflow_experiment_tracker

Name: mlflow_experiment_tracker
ID: 2908300f-56f7-4a44-8ea1-b4258ad82f5e
Type: experiment_tracker
Flavor: mlflow
Configuration: {'experiment_name': 'wildfire', 'nested': False, 'tags': {}, 'tracking_uri': 'http://127.0.0.1:5000', 'tracking_username': '****', 'tracking_password': '****', 'tracking_token': '****', 'tracking_insecure_tls': False, 'databricks_host': ''}
Shared: No
User: default / dfcbcebb-e076-4b35-b318-cff2a634172d
Workspace: default / 689f2915-873d-413c-982b-4185be3bb05c

ANNOTATOR: label_studio

Name: label_studio
ID: 9869f307-2b5f-48e8-a5c0-212909142f1b
Type: annotator
Flavor: label_studio
Configuration: {'authentication_secret': 'label_studio_key', 'instance_url': 'http://localhost', 'port': 8093}
Shared: No
User: default / dfcbcebb-e076-4b35-b318-cff2a634172d
Workspace: default / 689f2915-873d-413c-982b-4185be3bb05c

MODEL_REGISTRY: mlflow_model_registry

Name: mlflow_model_registry
ID: 47cbf75b-ece9-446e-94ee-16125d5a60e8
Type: model_registry
Flavor: mlflow
Configuration: {}
Shared: No
User: default / dfcbcebb-e076-4b35-b318-cff2a634172d
Workspace: default / 689f2915-873d-413c-982b-4185be3bb05c

What happened?

I was trying to run examples/label_studio_annotation in a local environment. I registered an artifact store and connected it through a GCP service connector. After running the training pipeline successfully, I tried to run the inference pipeline as instructed by the example's guide. The data_syncer step failed because it could not write the credentials to a JSON file. After I deleted the artifact store and registered a new one whose authentication secret contains the raw credential JSON content, the inference pipeline finished successfully.

I would expect the credentials exposed by the GCP-based artifact store to have the same form whether it authenticates with a registered secret or with a GCP service connector.
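One way to picture the consistency requested here is a single normalization step applied before the credentials are written to disk. The sketch below is only an illustration under stated assumptions: `credentials_to_json` is a hypothetical helper that does not exist in ZenML, and nothing here is claimed to match how the issue was eventually resolved. It accepts a raw JSON string or a dict and fails with an explicit message for any other type, instead of surfacing a bare encoder error.

```python
import json
from typing import Any


def credentials_to_json(credentials: Any) -> str:
    """Hypothetical helper: return credentials as a JSON string or fail clearly."""
    if isinstance(credentials, str):
        # Validate the text; json.loads raises ValueError if it is not JSON.
        json.loads(credentials)
        return credentials
    if isinstance(credentials, dict):
        return json.dumps(credentials)
    # Anything else (for example a credentials object) is rejected explicitly.
    raise TypeError(
        f"Unsupported credentials type {type(credentials).__name__}; "
        "expected a JSON string or a dict."
    )


# Illustrative values only; these are not real credentials.
from_dict = credentials_to_json({"type": "service_account"})
from_text = credentials_to_json('{"type": "service_account"}')
print(from_dict == from_text)
```

With a guard of this kind, both authentication paths would either produce the same JSON text or stop early with a message naming the unexpected type.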

Reproduction steps

  1. Register a GCP artifact store, with or without the authentication secret attribute specified.
  2. Register a GCP service connector, specifying the path to the GCP credential JSON file.
  3. Connect the artifact store to the service connector and verify that the connection succeeds.
  4. Run the examples/label_studio_annotation training pipeline; it finishes without error.
  5. Run the examples/label_studio_annotation inference pipeline; it fails with the error shown in the log output below.

Relevant log output

Step data_syncer has started.
Failed to run step data_syncer.
Object of type Credentials is not JSON serializable
Traceback (most recent call last):
  File "/home/jack/workspace/zenml/.venv/lib/python3.10/site-packages/zenml/orchestrators/step_launcher.py", line 229, in launch
    self._run_step(
  File "/home/jack/workspace/zenml/.venv/lib/python3.10/site-packages/zenml/orchestrators/step_launcher.py", line 421, in _run_step
    self._run_step_without_step_operator(
  File "/home/jack/workspace/zenml/.venv/lib/python3.10/site-packages/zenml/orchestrators/step_launcher.py", line 497, in _run_step_without_step_operator
    runner.run(
  File "/home/jack/workspace/zenml/.venv/lib/python3.10/site-packages/zenml/orchestrators/step_runner.py", line 184, in run
    return_values = step_instance.call_entrypoint(
  File "/home/jack/workspace/zenml/.venv/lib/python3.10/site-packages/zenml/steps/base_step.py", line 588, in call_entrypoint
    return self.entrypoint(**validated_args)
  File "/home/jack/workspace/zenml/.venv/lib/python3.10/site-packages/zenml/integrations/label_studio/steps/label_studio_standard_steps.py", line 221, in sync_new_data_to_label_studio
    annotator.populate_artifact_store_parameters(
  File "/home/jack/workspace/zenml/.venv/lib/python3.10/site-packages/zenml/integrations/label_studio/annotators/label_studio_annotator.py", line 553, in populate_artifact_store_parameters
    f.write(json.dumps(gcp_credentials))
  File "/usr/lib/python3.10/json/__init__.py", line 231, in dumps
    return _default_encoder.encode(obj)
  File "/usr/lib/python3.10/json/encoder.py", line 199, in encode
    chunks = self.iterencode(o, _one_shot=True)
  File "/usr/lib/python3.10/json/encoder.py", line 257, in iterencode
    return _iterencode(o, 0)
  File "/usr/lib/python3.10/json/encoder.py", line 179, in default
    raise TypeError(f'Object of type {o.__class__.__name__} '
TypeError: Object of type Credentials is not JSON serializable
Pipeline run inference_pipeline-2023_10_02-12_44_27_462977 failed.
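
The final frames of the traceback can be reproduced in isolation with the standard library alone. The following is a minimal sketch, not ZenML code: the `Credentials` class is a stand-in defined here for illustration (it is not the real google-auth class), `dump_credentials` is a hypothetical wrapper around the `json.dumps(gcp_credentials)` call seen in the traceback, and the assumption that the secret-based path yields a plain dict is inferred from the report rather than confirmed.

```python
import json


class Credentials:
    """Stand-in object for illustration; NOT the real google-auth class."""

    def __init__(self, token: str) -> None:
        self.token = token


def dump_credentials(gcp_credentials: object) -> str:
    # Same call shape as the failing line in the traceback.
    return json.dumps(gcp_credentials)


# Assumed secret-based path: a plain dict serializes without trouble.
secret_style = {"type": "service_account", "project_id": "example-project"}
serialized = dump_credentials(secret_style)

# Assumed connector-based path: an arbitrary object is rejected by the encoder.
try:
    dump_credentials(Credentials(token="dummy"))
    error_message = None
except TypeError as exc:
    error_message = str(exc)

print(serialized)
print(error_message)
```

The default JSON encoder only handles built-in types such as dict, list, str, and numbers, so any custom object passed to `json.dumps` without a `default` hook raises this `TypeError`.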


strickvl commented 7 months ago

This issue was fixed / addressed in https://github.com/zenml-io/zenml/pull/2010. Thanks for reporting it!