[BUG]: Orchestrator is unable to access ML Pipelines API

RoyerRamirez commented 2 years ago

Contact Details [Optional]

System Information

ZenML version: 0.9.0
Install path: /home/jovyan/factory-data-algorithms/projects/common/process-health/.venv/lib/python3.9/site-packages/zenml
Python version: 3.9.9
Platform information: {'os': 'linux', 'linux_distro': 'ubuntu', 'linux_distro_like': 'debian', 'linux_distro_version': '20.04'}
Environment: docker
Integrations: ['kubeflow', 'scipy', 'seldon']

What happened?

Hi, we are running ZenML from a Jupyter Notebook in the same cluster as ML Pipelines. Currently, the orchestrator can only reach the ML Pipelines API through kube-proxy or port-forwarding.

Port-forwarding or going through kube-proxy should not be necessary when ZenML runs in the same cluster as ML Pipelines; the orchestrator should be able to hit the endpoint directly. ML Pipelines and every other Kubeflow service can be reached through their cluster-internal DNS records, of the form <service>.<namespace>.svc.cluster.local:<port>, as demonstrated below:

curl -XGET ml-pipeline.kubeflow.svc.cluster.local:8888/apis/v1beta1/healthz
{"commit_sha":"c8a18bde299f2fdf5f72144f15887915b8d11520", "tag_name":"1.8.1", "multi_user":true}
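
From inside the cluster, the same health check can also be done programmatically. The following is a minimal standard-library sketch, assuming plain HTTP on port 8888 as in the curl call; the helper names in_cluster_url, parse_healthz and check_health are hypothetical and not part of ZenML or KFP.

```python
import json
import urllib.request


def in_cluster_url(service: str, namespace: str, port: int, path: str = "") -> str:
    """Build a URL from the standard Kubernetes service DNS name
    <service>.<namespace>.svc.cluster.local (resolvable only inside the cluster)."""
    return f"http://{service}.{namespace}.svc.cluster.local:{port}{path}"


def parse_healthz(body: bytes) -> dict:
    """Decode the JSON body returned by the ML Pipelines healthz endpoint."""
    return json.loads(body.decode("utf-8"))


def check_health(url: str, timeout: float = 5.0) -> dict:
    """GET the endpoint directly, with no kube-proxy or port-forwarding involved."""
    with urllib.request.urlopen(url, timeout=timeout) as resp:
        return parse_healthz(resp.read())
```

Run from the notebook, check_health(in_cluster_url("ml-pipeline", "kubeflow", 8888, "/apis/v1beta1/healthz")) should return the same JSON as the curl call; outside the cluster the DNS name will not resolve.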

Furthermore, ML Pipelines itself runs a step at startup to determine whether it is running inside the cluster or externally, by looking for a kubeconfig file. If it cannot find one, it assumes it "owns" the ML Pipelines deployment and does not use kube-proxy or port-forwarding to access other services.

Reproduction steps

Create an orchestrator for Kubeflow Pipelines
Deploy a Jupyter Notebook or Deployment in the same cluster as ML Pipelines
Attempt to create a pipeline. You will receive a connection error and/or a kube-context error. ...

Relevant log output

Below is an example of how ML Pipelines determines whether it "owns" its deployment. Maybe ZenML could do something similar to resolve this issue? I recommend looking for a kubeconfig file: if one does not exist, assume ZenML is running in the cloud (in-cluster) and can access the ML Pipelines endpoint directly.

ML Pipeline Startup Logs:
I0623 23:40:09.590195       7 client_manager.go:160] Initializing client manager
I0623 23:40:09.590498       7 config.go:57] Config DBConfig.ExtraParams not specified, skipping
I0623 23:40:13.018429       7 client_manager.go:415] We already own mlpipeline
I0623 23:40:13.019436       7 swf.go:64] (Expected when in cluster) Failed to create scheduled workflow client by out of cluster kubeconfig. Error: stat /root/.kube/config: no such file or directory
I0623 23:40:13.019462       7 swf.go:66] Starting to create scheduled workflow client by in cluster config.
I0623 23:40:13.026601       7 client_manager.go:204] Client manager initialized successfully
I0623 23:40:13.030082       7 main.go:183] Samples already loaded in the past. Skip loading.
I0623 23:40:13.036776       7 resource_manager.go:1021] Default experiment already exists! ID: ba6bb05d-565d-40fa-b589-6af95995cc34
I0623 23:40:13.036830       7 main.go:120] Starting Http Proxy
I0623 23:40:13.040199       7 main.go:90] Starting RPC server
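
The fallback visible in these logs (try an out-of-cluster kubeconfig first, then fall back to in-cluster config) could be mirrored on the ZenML side. This is a minimal sketch, not ZenML's implementation: the function names are hypothetical, and it relies on the standard Kubernetes in-cluster markers, namely the KUBERNETES_SERVICE_HOST environment variable and the mounted service-account token.

```python
import os
from pathlib import Path
from typing import Mapping, Optional

# Kubernetes mounts this token into pods (unless automounting is disabled).
SERVICE_ACCOUNT_TOKEN = Path("/var/run/secrets/kubernetes.io/serviceaccount/token")


def kubeconfig_path(env: Optional[Mapping[str, str]] = None) -> Path:
    """Return the first entry of $KUBECONFIG, or ~/.kube/config if it is unset."""
    env = os.environ if env is None else env
    value = env.get("KUBECONFIG")
    if value:
        return Path(value.split(os.pathsep)[0]).expanduser()
    return Path.home() / ".kube" / "config"


def is_running_in_cluster(
    env: Optional[Mapping[str, str]] = None,
    token_path: Path = SERVICE_ACCOUNT_TOKEN,
) -> bool:
    """True when both standard in-cluster markers are present."""
    env = os.environ if env is None else env
    return "KUBERNETES_SERVICE_HOST" in env and token_path.exists()


def choose_access_mode(
    env: Optional[Mapping[str, str]] = None,
    token_path: Path = SERVICE_ACCOUNT_TOKEN,
) -> str:
    """Mirror the KFP startup order: prefer a kubeconfig, else use in-cluster access."""
    if kubeconfig_path(env).exists():
        return "kubeconfig"  # out of cluster: proxy / port-forward as today
    if is_running_in_cluster(env, token_path):
        return "in-cluster"  # call <service>.<namespace>.svc.cluster.local directly
    return "unknown"
```

Checking for a kubeconfig first keeps today's behaviour for local users, while a notebook pod without a kubeconfig would take the direct path.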

Code of Conduct

[X] I agree to follow this project's Code of Conduct

stefannica commented 2 years ago

@RoyerRamirez thank you for opening this issue! The Kubeflow orchestrator provides some advanced configuration attributes that might allow running pipelines from within the same cluster as Kubeflow, although we have never tried this use case:

kubeflow_hostname: The hostname to use to talk to the Kubeflow Pipelines API. If not set, the hostname will be derived from the Kubernetes API proxy.
skip_cluster_provisioning: If True, the k3d cluster provisioning will be skipped.
skip_ui_daemon_provisioning: If True, provisioning the KFP UI daemon will be skipped (i.e. port-forwarding will not be done nor required for running pipelines).
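
Putting the three attributes together, an in-cluster setup might look like the following. This is only a sketch: KubeflowOrchestratorSettings and plan are hypothetical stand-ins that mirror the attribute names above, not ZenML's classes; whether kubeflow_hostname expects a scheme such as http:// is an assumption, and this combination has not been tested.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class KubeflowOrchestratorSettings:
    """Hypothetical mirror of the three orchestrator attributes (not ZenML's class)."""
    kubeflow_hostname: Optional[str] = None
    skip_cluster_provisioning: bool = False
    skip_ui_daemon_provisioning: bool = False


def plan(settings: KubeflowOrchestratorSettings) -> dict:
    """Summarize what an orchestrator configured this way would need to do."""
    return {
        "api_host": settings.kubeflow_hostname or "derived from the Kubernetes API proxy",
        "provision_k3d_cluster": not settings.skip_cluster_provisioning,
        "port_forward_ui": not settings.skip_ui_daemon_provisioning,
    }


# Values one might try when ZenML runs in the same cluster as Kubeflow (untested).
IN_CLUSTER = KubeflowOrchestratorSettings(
    kubeflow_hostname="http://ml-pipeline.kubeflow.svc.cluster.local:8888",
    skip_cluster_provisioning=True,
    skip_ui_daemon_provisioning=True,
)
```

With these values nothing is provisioned or port-forwarded, and every API call goes to the cluster-internal DNS name.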

RoyerRamirez commented 2 years ago

Hi @stefannica, thank you. I'm going to close this bug.

stefannica commented 2 years ago

@RoyerRamirez just for the sake of completeness, in case we run into this situation again in the future: were you able to implement your use case with those extra orchestrator parameters?

zenml-io / zenml