zenml-io / zenml

ZenML 🙏: The bridge between ML and Ops. https://zenml.io.
Apache License 2.0
3.92k stars 427 forks

[BUG]: GCP service account connector idles if account is already authenticated #2388

Closed mcschmitz closed 7 months ago

mcschmitz commented 7 months ago

Contact Details [Optional]

No response

System Information

ZENML_LOCAL_VERSION: 0.54.1
ZENML_SERVER_VERSION: 0.54.1
ZENML_SERVER_DATABASE: sqlite
ZENML_SERVER_DEPLOYMENT_TYPE: other
ZENML_CONFIG_DIR: /home/manuel/.config/zenml
ZENML_LOCAL_STORE_DIR: /home/manuel/.config/zenml/local_stores
ZENML_SERVER_URL: sqlite:////home/manuel/.config/zenml/local_stores/default_zen_store/zenml.db
ZENML_ACTIVE_REPOSITORY_ROOT: /home/manuel/code/fine-tune-mistral
PYTHON_VERSION: 3.10.13
ENVIRONMENT: native
SYSTEM_INFO: {'os': 'linux', 'linux_distro': 'ubuntu', 'linux_distro_like': 'debian', 'linux_distro_version': '20.04'}
ACTIVE_WORKSPACE: default
ACTIVE_STACK: gcp
ACTIVE_USER: default
TELEMETRY_STATUS: enabled
ANALYTICS_CLIENT_ID: d7214a26-e1ee-4c8c-833f-225608d2bb0c
ANALYTICS_USER_ID: 061bdf54-9b2d-471c-aac3-f18f185bb930
ANALYTICS_SERVER_ID: d7214a26-e1ee-4c8c-833f-225608d2bb0c
INTEGRATIONS: ['gcp', 'github', 'kaniko', 'kubeflow', 'kubernetes', 'seldon', 'skypilot_gcp', 'skypilot_aws', 'skypilot_azure']
PACKAGES: {'deprecated': '1.2.14', 'gitpython': '3.1.41', 'jinja2': '3.1.3', 'mako': '1.3.2', 'markupsafe': '2.1.4', 'pulp': '2.8.0', 'pygithub': '1.59.1', 'pyjwt': '2.7.0', 'pymysql': '1.0.3', 'pynacl': '1.5.0', 'pysocks': '1.7.1', 'pyyaml': '6.0.1', 'sqlalchemy': 
'1.4.41', 'sqlalchemy-utils': '0.38.3', 'secretstorage': '3.3.3', 'absl-py': '1.4.0', 'adal': '1.2.7', 'aiohttp': '3.9.3', 'aiohttp-cors': '0.7.0', 'aiosignal': '1.3.1', 'alembic': '1.8.1', 'antlr4-python3-runtime': '4.13.1', 'anyio': '4.2.0', 'appdirs': '1.4.4', 
'applicationinsights': '0.11.10', 'argcomplete': '3.1.6', 'asttokens': '2.4.1', 'async-timeout': '4.0.3', 'attrs': '23.2.0', 'awscli': '1.32.32', 'azure-appconfiguration': '1.1.1', 'azure-batch': '14.0.0', 'azure-cli': '2.56.0', 'azure-cli-core': '2.56.0', 
'azure-cli-telemetry': '1.1.0', 'azure-common': '1.1.28', 'azure-core': '1.29.7', 'azure-cosmos': '3.2.0', 'azure-data-tables': '12.4.0', 'azure-datalake-store': '0.0.53', 'azure-graphrbac': '0.60.0', 'azure-identity': '1.14.1', 'azure-keyvault-administration': '4.4.0b2', 
'azure-keyvault-certificates': '4.7.0', 'azure-keyvault-keys': '4.9.0b3', 'azure-keyvault-secrets': '4.7.0', 'azure-loganalytics': '0.1.1', 'azure-mgmt-advisor': '9.0.0', 'azure-mgmt-apimanagement': '4.0.0', 'azure-mgmt-appconfiguration': '3.0.0', 
'azure-mgmt-appcontainers': '2.0.0', 'azure-mgmt-applicationinsights': '1.0.0', 'azure-mgmt-authorization': '4.0.0', 'azure-mgmt-batch': '17.0.0', 'azure-mgmt-batchai': '7.0.0b1', 'azure-mgmt-billing': '6.0.0', 'azure-mgmt-botservice': '2.0.0', 'azure-mgmt-cdn': '12.0.0', 
'azure-mgmt-cognitiveservices': '13.5.0', 'azure-mgmt-compute': '30.4.0', 'azure-mgmt-containerinstance': '10.1.0', 'azure-mgmt-containerregistry': '10.1.0', 'azure-mgmt-containerservice': '28.0.0', 'azure-mgmt-core': '1.4.0', 'azure-mgmt-cosmosdb': '9.4.0', 
'azure-mgmt-databoxedge': '1.0.0', 'azure-mgmt-datalake-nspkg': '3.0.1', 'azure-mgmt-datalake-store': '0.5.0', 'azure-mgmt-datamigration': '10.0.0', 'azure-mgmt-devtestlabs': '4.0.0', 'azure-mgmt-dns': '8.0.0', 'azure-mgmt-eventgrid': '10.2.0b2', 'azure-mgmt-eventhub': 
'10.1.0', 'azure-mgmt-extendedlocation': '1.0.0b2', 'azure-mgmt-hdinsight': '9.0.0', 'azure-mgmt-imagebuilder': '1.2.0', 'azure-mgmt-iotcentral': '10.0.0b2', 'azure-mgmt-iothub': '3.0.0', 'azure-mgmt-iothubprovisioningservices': '1.1.0', 'azure-mgmt-keyvault': '10.3.0', 
'azure-mgmt-kusto': '0.3.0', 'azure-mgmt-loganalytics': '13.0.0b4', 'azure-mgmt-managedservices': '1.0.0', 'azure-mgmt-managementgroups': '1.0.0', 'azure-mgmt-maps': '2.0.0', 'azure-mgmt-marketplaceordering': '1.1.0', 'azure-mgmt-media': '9.0.0', 'azure-mgmt-monitor': 
'5.0.1', 'azure-mgmt-msi': '7.0.0', 'azure-mgmt-netapp': '10.1.0', 'azure-mgmt-network': '25.2.0', 'azure-mgmt-nspkg': '3.0.2', 'azure-mgmt-policyinsights': '1.1.0b4', 'azure-mgmt-privatedns': '1.0.0', 'azure-mgmt-rdbms': '10.2.0b14', 'azure-mgmt-recoveryservices': '2.5.0',
'azure-mgmt-recoveryservicesbackup': '7.0.0', 'azure-mgmt-redhatopenshift': '1.4.0', 'azure-mgmt-redis': '14.2.0', 'azure-mgmt-resource': '23.0.1', 'azure-mgmt-search': '9.1.0', 'azure-mgmt-security': '5.0.0', 'azure-mgmt-servicebus': '8.2.0', 'azure-mgmt-servicefabric': 
'1.0.0', 'azure-mgmt-servicefabricmanagedclusters': '1.0.0', 'azure-mgmt-servicelinker': '1.2.0b1', 'azure-mgmt-signalr': '2.0.0b1', 'azure-mgmt-sql': '4.0.0b13', 'azure-mgmt-sqlvirtualmachine': '1.0.0b5', 'azure-mgmt-storage': '21.1.0', 'azure-mgmt-synapse': '2.1.0b5', 
'azure-mgmt-trafficmanager': '1.0.0', 'azure-mgmt-web': '7.2.0', 'azure-multiapi-storage': '1.2.0', 'azure-nspkg': '3.0.2', 'azure-storage-common': '1.4.2', 'azure-synapse-accesscontrol': '0.5.0', 'azure-synapse-artifacts': '0.18.0', 'azure-synapse-managedprivateendpoints':
'0.4.0', 'azure-synapse-spark': '0.2.0', 'bcrypt': '4.0.1', 'black': '23.12.1', 'blessed': '1.20.0', 'boto3': '1.34.32', 'botocore': '1.34.32', 'build': '1.0.3', 'cachecontrol': '0.13.1', 'cachetools': '5.3.2', 'certifi': '2023.11.17', 'cffi': '1.16.0', 'cfgv': '3.4.0', 
'chardet': '5.2.0', 'charset-normalizer': '3.3.2', 'cleo': '2.1.0', 'click': '8.1.3', 'click-params': '0.3.0', 'cloudpickle': '2.2.1', 'colorama': '0.4.4', 'colorful': '0.5.6', 'comm': '0.2.1', 'coverage': '7.4.1', 'crashtest': '0.4.1', 'cryptography': '42.0.2', 
'decorator': '5.1.1', 'distlib': '0.3.8', 'distro': '1.9.0', 'docker': '6.1.3', 'docker-pycreds': '0.4.0', 'docstring-parser': '0.15', 'docutils': '0.16', 'dulwich': '0.21.7', 'exceptiongroup': '1.2.0', 'executing': '2.0.1', 'fabric': '2.7.1', 'fastapi': '0.99.1', 
'fastapi-utils': '0.2.1', 'fastjsonschema': '2.19.1', 'filelock': '3.13.1', 'fine-tune-mistral': '0.1.0', 'fire': '0.5.0', 'frozenlist': '1.4.1', 'fsspec': '2023.12.2', 'gcsfs': '2023.12.2.post1', 'gitdb': '4.0.11', 'google-api-core': '2.16.1', 'google-api-python-client': 
'2.116.0', 'google-auth': '2.27.0', 'google-auth-httplib2': '0.2.0', 'google-auth-oauthlib': '1.2.0', 'google-cloud-aiplatform': '1.40.0', 'google-cloud-bigquery': '3.17.1', 'google-cloud-build': '3.22.0', 'google-cloud-container': '2.38.0', 'google-cloud-core': '2.4.1', 
'google-cloud-functions': '1.15.0', 'google-cloud-resource-manager': '1.11.0', 'google-cloud-scheduler': '2.12.0', 'google-cloud-secret-manager': '2.17.0', 'google-cloud-storage': '2.14.0', 'google-crc32c': '1.5.0', 'google-resumable-media': '2.7.0', 
'googleapis-common-protos': '1.62.0', 'gpustat': '1.1.1', 'greenlet': '3.0.3', 'grpc-google-iam-v1': '0.13.0', 'grpcio': '1.60.0', 'grpcio-status': '1.60.0', 'h11': '0.14.0', 'httplib2': '0.19.1', 'httptools': '0.6.1', 'humanfriendly': '10.0', 'identify': '2.5.33', 'idna': 
'3.6', 'importlib-metadata': '7.0.1', 'iniconfig': '2.0.0', 'installer': '0.7.0', 'invoke': '1.7.3', 'ipinfo': '5.0.1', 'ipython': '8.18.1', 'ipywidgets': '8.1.1', 'isodate': '0.6.1', 'isort': '5.13.2', 'jaraco.classes': '3.3.0', 'javaproperties': '0.5.2', 'jedi': '0.19.1',
'jeepney': '0.8.0', 'jmespath': '1.0.1', 'jsondiff': '2.0.0', 'jsonschema': '4.21.1', 'jsonschema-specifications': '2023.12.1', 'jupyterlab-widgets': '3.0.9', 'keyring': '24.3.0', 'kfp': '1.8.22', 'kfp-pipeline-spec': '0.1.16', 'kfp-server-api': '1.8.5', 'knack': '0.11.0', 
'kubernetes': '29.0.0', 'markdown-it-py': '3.0.0', 'matplotlib-inline': '0.1.6', 'mdurl': '0.1.2', 'more-itertools': '10.2.0', 'msal': '1.24.0b2', 'msal-extensions': '1.0.0', 'msgpack': '1.0.7', 'msrest': '0.7.1', 'msrestazure': '0.6.4', 'multidict': '6.0.4', 
'mypy-extensions': '1.0.0', 'networkx': '3.2.1', 'nodeenv': '1.8.0', 'numpy': '1.26.3', 'nvidia-ml-py': '12.535.133', 'oauthlib': '3.2.2', 'opencensus': '0.11.4', 'opencensus-context': '0.1.3', 'orjson': '3.8.14', 'packaging': '23.2', 'pandas': '2.2.0', 'paramiko': '3.4.0',
'parso': '0.8.3', 'passlib': '1.7.4', 'pathlib2': '2.3.7.post1', 'pathspec': '0.12.1', 'pendulum': '3.0.0', 'pexpect': '4.9.0', 'pip': '23.3.1', 'pkginfo': '1.9.6', 'platformdirs': '4.2.0', 'pluggy': '1.4.0', 'poetry': '1.7.1', 'poetry-core': '1.8.1', 
'poetry-plugin-export': '1.6.0', 'portalocker': '2.8.2', 'pre-commit': '3.6.0', 'prettytable': '3.9.0', 'prometheus-client': '0.19.0', 'prompt-toolkit': '3.0.43', 'proto-plus': '1.23.0', 'protobuf': '4.25.2', 'psutil': '5.9.8', 'ptyprocess': '0.7.0', 'pure-eval': '0.2.2', 
'pyopenssl': '24.0.0', 'py-spy': '0.3.14', 'pyasn1': '0.5.1', 'pyasn1-modules': '0.3.0', 'pycomposefile': '0.0.30', 'pycparser': '2.21', 'pydantic': '1.10.14', 'pygments': '2.17.2', 'pyparsing': '2.4.7', 'pyproject-hooks': '1.0.0', 'pytest': '7.4.4', 'python-dateutil': 
'2.8.2', 'python-dotenv': '1.0.1', 'python-multipart': '0.0.6', 'pytz': '2023.4', 'rapidfuzz': '3.6.1', 'ray': '2.3.1', 'referencing': '0.33.0', 'requests': '2.31.0', 'requests-oauthlib': '1.3.1', 'requests-toolbelt': '0.10.1', 'rich': '13.7.0', 'rpds-py': '0.17.1', 'rsa': 
'4.7.2', 'ruff': '0.1.15', 's3transfer': '0.10.0', 'scp': '0.13.6', 'semver': '2.13.0', 'sentry-sdk': '1.40.0', 'setproctitle': '1.3.3', 'setuptools': '69.0.3', 'shapely': '2.0.2', 'shellingham': '1.5.4', 'six': '1.16.0', 'skypilot': '0.4.1', 'smart-open': '6.4.0', 'smmap':
'5.0.1', 'sniffio': '1.3.0', 'sqlalchemy2-stubs': '0.0.2a38', 'sqlmodel': '0.0.8', 'sshtunnel': '0.1.5', 'stack-data': '0.6.3', 'starlette': '0.27.0', 'strip-hints': '0.1.10', 'tabulate': '0.9.0', 'termcolor': '2.4.0', 'time-machine': '2.13.0', 'tomli': '2.0.1', 'tomlkit': 
'0.12.3', 'traitlets': '5.14.1', 'trove-classifiers': '2024.1.8', 'typer': '0.9.0', 'typing-extensions': '4.9.0', 'tzdata': '2023.4', 'uritemplate': '4.1.1', 'urllib3': '1.26.18', 'uvicorn': '0.27.0.post1', 'uvloop': '0.19.0', 'validators': '0.18.2', 'virtualenv': 
'20.25.0', 'wandb': '0.16.2', 'watchfiles': '0.21.0', 'wcwidth': '0.2.13', 'websocket-client': '1.7.0', 'websockets': '12.0', 'wheel': '0.42.0', 'widgetsnbextension': '4.0.9', 'wrapt': '1.16.0', 'xmltodict': '0.13.0', 'yarl': '1.9.4', 'zenml': '0.54.1', 'zipp': '3.17.0'}

CURRENT STACK

Name: gcp
ID: b649c3ff-0e33-477e-ad3c-1328e1811ff5
User: default / 061bdf54-9b2d-471c-aac3-f18f185bb930
Workspace: default / e7bea687-41cd-4893-9889-12f4974bba07

ORCHESTRATOR: skypilot_gcp

Name: skypilot_gcp
ID: 6ee52de6-6149-4c9a-a404-de681290e1f6
Type: orchestrator
Flavor: vm_gcp
Configuration: {'instance_type': None, 'cpus': None, 'memory': None, 'accelerators': None, 'accelerator_args': None, 'use_spot': None, 'spot_recovery': None, 'region': None, 'zone': None, 'image_id': None, 'disk_size': None, 'disk_tier': None, 'cluster_name': None, 
'retry_until_up': False, 'idle_minutes_to_autostop': 30, 'down': True, 'stream_logs': True, 'docker_run_args': [], 'project': None, 'service_account_path': None, 'disable_step_based_settings': False}
User: default / 061bdf54-9b2d-471c-aac3-f18f185bb930
Workspace: default / e7bea687-41cd-4893-9889-12f4974bba07

ARTIFACT_STORE: gcs_store

Name: gcs_store
ID: 7336829e-181d-44f2-ad26-94269fc776f3
Type: artifact_store
Flavor: gcp
Configuration: {'authentication_secret': None, 'path': 'gs://zenml-gcs-artifact-store/fine-tune-mistral'}
User: default / 061bdf54-9b2d-471c-aac3-f18f185bb930
Workspace: default / e7bea687-41cd-4893-9889-12f4974bba07

CONTAINER_REGISTRY: gcp

Name: gcp
ID: 09d8b2a7-b7fd-4414-8c48-2e6df2f9870a
Type: container_registry
Flavor: gcp
Configuration: {'authentication_secret': None, 'uri': 'gcr.io/firm-moonlight-411008'}
User: default / 061bdf54-9b2d-471c-aac3-f18f185bb930
Workspace: default / e7bea687-41cd-4893-9889-12f4974bba07

What happened?

When running the pipeline using the SkyPilot orchestrator on GCP, it idles after some time if the authentication with the service account is already done. I could pinpoint it to this line. If I set a breakpoint here and run the authentication with the CLI, I get the following prompt:

You are already authenticated with 'private-service-account@firm-moonlight-411008.iam.gserviceaccount.com'. Do you wish to proceed and overwrite existing credentials? Do you want to continue (Y/n)?

Now it's idling because it expects input. If I revoke the authentication before running the pipeline, this works.

I can file a PR that checks whether authentication is needed before executing this line, if you're interested.
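The check proposed above could look roughly like the following. This is only a sketch, not ZenML's actual code: the function names `active_gcloud_account` and `needs_activation` are hypothetical, and it assumes `gcloud` is on the `PATH` (otherwise `subprocess.run` raises `FileNotFoundError`). It relies on `gcloud auth list` with its `--filter` and `--format` options to read the active account.

```python
import subprocess
from typing import Any, Callable, Optional


def active_gcloud_account(
    run: Callable[..., Any] = subprocess.run,
) -> Optional[str]:
    """Return the account gcloud currently marks as active, or None.

    `run` is injectable so the logic can be exercised without gcloud installed.
    """
    result = run(
        [
            "gcloud",
            "auth",
            "list",
            "--filter=status:ACTIVE",
            "--format=value(account)",
        ],
        capture_output=True,
        text=True,
        check=False,
    )
    account = result.stdout.strip()
    return account or None


def needs_activation(
    service_account_email: str,
    run: Callable[..., Any] = subprocess.run,
) -> bool:
    """True if the given service account is not already the active account."""
    return active_gcloud_account(run) != service_account_email
```

A caller would then skip the `gcloud auth activate-service-account` call whenever `needs_activation(...)` returns `False`, so the confirmation prompt never appears.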

As additional information about the system, gcloud --version prints the following:

Google Cloud SDK 459.0.0
alpha 2024.01.06
beta 2024.01.06
bq 2.0.101
bundled-python3-unix 3.11.6
core 2024.01.06
gcloud-crc32c 1.0.0
gsutil 5.27
kubectl 1.27.9

Reproduction steps

No response

Relevant log output

No response

stefannica commented 7 months ago

Thanks for reporting this, @mcschmitz! I wasn't aware that gcloud requires a confirmation on re-authentication.

Adding --quiet to the invoked gcloud CLI command took care of this problem.
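For readers hitting the same hang in their own scripts, here is a minimal sketch of that kind of fix. It is not the actual ZenML change: `build_activate_command` and `activate_service_account` are hypothetical helper names, and the key file path is whatever the caller supplies. `--quiet` is gcloud's global flag for disabling interactive prompts. Passing `stdin=subprocess.DEVNULL` is an extra precaution added here, not something stated in the thread, so that an unexpected prompt cannot block waiting for input.

```python
import subprocess
from typing import List


def build_activate_command(key_file: str, quiet: bool = True) -> List[str]:
    """Build the gcloud command that activates a service account key."""
    cmd = [
        "gcloud",
        "auth",
        "activate-service-account",
        f"--key-file={key_file}",
    ]
    if quiet:
        # Disable interactive prompts, including the
        # "already authenticated ... overwrite?" confirmation.
        cmd.append("--quiet")
    return cmd


def activate_service_account(key_file: str) -> None:
    """Run the activation non-interactively; raises CalledProcessError on failure."""
    subprocess.run(
        build_activate_command(key_file),
        check=True,
        stdin=subprocess.DEVNULL,  # assumption: never wait on terminal input
    )
```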