tensorflow / tfx

TFX is an end-to-end platform for deploying production ML pipelines
https://tensorflow.github.io/tfx/
Apache License 2.0

Need to run TFX on kubeflow on Azure Cloud #1144

Closed jackhawa closed 4 years ago

jackhawa commented 4 years ago

Currently the TFX Kubeflow DAG runner example only works on GCP (since GCP APIs are used). Is it possible to run TFX on Kubeflow on Azure? If not, is it planned, and if so, when?

numerology commented 4 years ago

Currently the example with the fewest GCP dependencies is the local example, which only requires Kubernetes (@ucdmkt can correct me if I'm wrong).

That being said, running it on AWS might take some effort and a few tweaks. First, you need to deploy Kubeflow Pipelines on AWS following this guide, or try to follow the standalone deployment guide here.

jackhawa commented 4 years ago

Thanks for the reply. I am only interested in Azure for the moment, not AWS. So if I understand correctly, there is no way to publish a TFX pipeline on Azure. Can you confirm this?

If so, when do you think it will be possible?

numerology commented 4 years ago

Sorry, actually I misread your question. Please see here for deployment on Azure.

jackhawa commented 4 years ago

Thanks @numerology. The link you have posted is only about working with generic pipelines on Kubeflow on Azure and not TFX specifically. In my understanding it needs a special TFX DSL in order for that to work.

Have you guys had the chance to build a TFX DSL that works with Azure?

numerology commented 4 years ago

> In my understanding it needs a special TFX DSL in order for that to work.

The current workflow is:

TFX DSL ----KubeflowDagRunner---> Kubeflow Pipelines workflow spec

Then one can submit the Kubeflow Pipelines workflow spec to a KFP deployment on GCP, AWS or Azure. The TFX pipeline/components DSL won't change.

> Have you guys had the chance to build a TFX DSL that works with Azure?

Indeed, some changes to KubeflowDagRunnerConfig might be necessary, especially regarding how the pipeline talks to MLMD. Currently we don't have a 'ready-to-go' config in this repo that makes it work on Azure, but that should not be very hard IMHO.
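
For illustration, the compile step in the workflow above might look roughly like the sketch below. It is not a tested Azure setup: it assumes a TFX release that ships `tfx.orchestration.kubeflow.kubeflow_dag_runner`, it uses the default metadata config unchanged, and the names `package_filename` and `compile_for_kubeflow` are hypothetical helpers made up for this example. The TFX import is deferred so the snippet loads even where TFX is not installed.

```python
from typing import Any, Optional


def package_filename(pipeline_name: str) -> str:
    """Hypothetical helper: name of the workflow package the runner writes."""
    return f"{pipeline_name}.tar.gz"


def compile_for_kubeflow(
    tfx_pipeline: Any,
    pipeline_name: str,
    tfx_image: Optional[str] = None,
) -> str:
    """Compile a TFX pipeline into a Kubeflow Pipelines workflow package.

    Assumption: TFX is installed and exposes kubeflow_dag_runner.
    """
    # Deferred import: only needed when the function is actually called.
    from tfx.orchestration.kubeflow import kubeflow_dag_runner

    runner_config = kubeflow_dag_runner.KubeflowDagRunnerConfig(
        # The default config is the part that may need changing on Azure,
        # since it controls how the pipeline reaches MLMD.
        kubeflow_metadata_config=(
            kubeflow_dag_runner.get_default_kubeflow_metadata_config()
        ),
        # None falls back to the runner's default TFX container image.
        tfx_image=tfx_image,
    )
    output = package_filename(pipeline_name)
    # run() only compiles; it writes the package to the current directory.
    kubeflow_dag_runner.KubeflowDagRunner(
        config=runner_config,
        output_filename=output,
    ).run(tfx_pipeline)
    return output
```

The resulting package is cloud-agnostic in principle; it would then be uploaded to whichever KFP deployment is available, through the Kubeflow Pipelines UI or SDK.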

mattmbk commented 4 years ago

@numerology has this been resolved e.g. via https://github.com/tensorflow/tfx/commit/d396e9fea768956137e339bb43ef1edb8e73127c ?

Thanks!

numerology commented 4 years ago

Hi @dushyanthsc, is the default gRPC config compatible with the current Kubeflow deployment? Thanks

ucdmkt commented 4 years ago

The bottom line for the local pipeline example is that, as long as an MLMD API server is running as part of the Kubeflow Pipelines deployment and the config map pointing to that server exists, it should work. If that config map isn't there, you would need to override it here
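
As a rough sketch of what such an override could look like: the code below assumes a TFX version whose `KubeflowMetadataConfig` proto has a `grpc_config` field with `grpc_service_host` and `grpc_service_port` entries, each accepting either a literal `value` or an `environment_variable` name. The environment variable names are assumptions about what the deployment's config map injects, and `grpc_settings` and `build_metadata_config` are hypothetical helpers; please check all of these against your TFX and KFP versions.

```python
from typing import Dict, Optional, Tuple

# Assumed names of the variables the KFP config map injects into pods.
DEFAULT_HOST_ENV = "METADATA_GRPC_SERVICE_HOST"
DEFAULT_PORT_ENV = "METADATA_GRPC_SERVICE_PORT"


def grpc_settings(
    host: Optional[str] = None,
    port: Optional[int] = None,
) -> Dict[str, Tuple[str, str]]:
    """Map each gRPC config field to (setting kind, value).

    An explicit host or port becomes a literal "value"; otherwise the
    field refers to an environment variable resolved inside the pod.
    """
    return {
        "grpc_service_host": (
            ("value", host)
            if host is not None
            else ("environment_variable", DEFAULT_HOST_ENV)
        ),
        "grpc_service_port": (
            ("value", str(port))
            if port is not None
            else ("environment_variable", DEFAULT_PORT_ENV)
        ),
    }


def build_metadata_config(host: Optional[str] = None, port: Optional[int] = None):
    """Build a KubeflowMetadataConfig; requires TFX to be installed."""
    # Deferred import so the pure helper above works without TFX.
    from tfx.orchestration.kubeflow.proto import kubeflow_pb2

    config = kubeflow_pb2.KubeflowMetadataConfig()
    for field, (kind, value) in grpc_settings(host, port).items():
        # Sets e.g. config.grpc_config.grpc_service_host.value = host
        setattr(getattr(config.grpc_config, field), kind, value)
    return config
```

Calling `build_metadata_config()` with no arguments keeps the environment-variable lookup, while passing an explicit host and port (for a cluster without the config map) hard-codes the MLMD endpoint. The result would be passed as `kubeflow_metadata_config` to `KubeflowDagRunnerConfig`.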

rmothukuru commented 4 years ago

Automatically closing due to lack of recent activity. Please update the issue when new information becomes available, and we will reopen the issue. Thanks!