tensorflow / tfx

TFX is an end-to-end platform for deploying production ML pipelines
https://tensorflow.org/tfx
Apache License 2.0

Issue when using exit_handler with Kubeflow orchestrator #5528

Closed jccarles closed 1 year ago

jccarles commented 1 year ago

Environment:

Python 3.7.7, tfx==1.8.1, kfp==1.8.13, kfp-pipeline-spec==0.1.16, kfp-server-api==1.8.1

Issue description:

I have a TFX pipeline orchestrated by Kubeflow Pipelines. I upgraded TFX to version 1.8.1 to try the exit_handler support introduced in TFX 1.8.0. I managed to build and compile my TFX pipeline into a Kubeflow pipeline, but it looks like when an exit_handler is used, the TFX orchestrator behaves differently, as can be seen in tfx/orchestration/kubeflow/utils.py:

# Key of dag for all TFX components when compiling pipeline with exit handler.
TFX_DAG_NAME = '_tfx_dag'

It seems this results in an invalid Argo workflow: when I use the kfp API to create a run from the compiled pipeline file, I get the following error from the Kubeflow API:

Failed to create a new run.
(...)
"error_details":"Internal error: spec.templates[0].name: '_tfx_dag' is invalid: name must consist of alpha-numeric characters or '-', and must start with an alpha-numeric character (e.g. My-name1-2, 123-NAME)
(...)

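The naming rule in that error message can be checked locally before submitting a run. This is a minimal sketch, assuming the pattern below faithfully captures the rule as worded in the error (alphanumeric characters or '-', starting with an alphanumeric character); Argo's own validator may apply further constraints such as a length limit, and the helper name is hypothetical.

```python
import re

# Pattern reconstructed from the wording of the Kubeflow/Argo error above:
# alphanumeric characters or '-', and the first character must be alphanumeric.
# Approximation only: Argo itself may enforce extra checks (e.g. max length).
_TEMPLATE_NAME_RE = re.compile(r"[A-Za-z0-9][-A-Za-z0-9]*")


def is_valid_template_name(name: str) -> bool:
    """Return True if `name` satisfies the rule quoted in the error message."""
    return _TEMPLATE_NAME_RE.fullmatch(name) is not None


print(is_valid_template_name("_tfx_dag"))  # False: starts with '_'
print(is_valid_template_name("tfxdag"))    # True
print(is_valid_template_name("tfx-dag"))   # True
```

Note that "tfx_dag" would also be rejected, since underscores are not allowed anywhere in the name, not just at the start.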
Any idea whether I am doing something wrong in how I use exit_handler?

Do you think it is safe to change TFX_DAG_NAME to an Argo-compliant name?

Thank you for your time.

singhniraj08 commented 1 year ago

@jccarles,

As mentioned in the official documentation, the exit_handler decorator is used to annotate a component that performs post actions of a pipeline, and it is only supported in Vertex AI.

Since exit_handler currently works only with Vertex AI, it won't work with Kubeflow Pipelines. Hope this answers your query. Thank you!

jccarles commented 1 year ago

Thank you for your response!

Sorry, my issue was not clear. I am talking about the set_exit_handler method of the KubeflowDagRunner (https://www.tensorflow.org/tfx/api_docs/python/tfx/v1/orchestration/experimental/KubeflowDagRunner#set_exit_handler), not the decorator.

I built my own custom component, which I pass to the set_exit_handler method of the KubeflowDagRunner instance; I am not using the exit_handler decorator.

singhniraj08 commented 1 year ago

@jccarles,

Can you please try changing the TFX_DAG_NAME variable to an Argo-compliant name, as described in the error, and see if the pipeline works? Thank you!

jccarles commented 1 year ago

@singhniraj08

Yes, I renamed it from _tfx_dag to tfxdag and everything seems to be working fine! So far it doesn't look like it broke anything else 😅

singhniraj08 commented 1 year ago

@jccarles,

Thank you for the confirmation. Since it's resolved, could you please close this issue? Thank you!

Enzo90910 commented 1 year ago

It seems to me this issue is not with the tfx.v1.orchestration.experimental.exit_handler decorator (which, according to the documentation, supports only Vertex AI) but with the set_exit_handler method of KubeflowDagRunner, which uses a constant defined in tfx/orchestration/kubeflow/utils.py. Both the method and the location of that constant strongly suggest they are meant to work with Kubeflow.

The logical solution would be to update the TFX_DAG_NAME constant to not contain underscores, either to "tfxdag" or to "tfx-dag".
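As an alternative to editing the installed TFX package, the compiled workflow could in principle be patched before it is uploaded. This is a hypothetical sketch, not part of TFX or kfp: it assumes the compiled Argo workflow has already been parsed into plain dicts and lists (for example with PyYAML's yaml.safe_load), and the workflow shown is a simplified illustration rather than the exact structure the KFP compiler emits.

```python
from typing import Any


def rename_template(node: Any, old: str, new: str) -> Any:
    """Return a copy of `node` with every string exactly equal to `old` replaced.

    Hypothetical helper for a compiled Argo workflow already parsed into
    plain dicts/lists. Exact matching rewrites template names, task names,
    task `template` references and `dependencies` entries equal to `old`,
    while leaving longer strings that merely contain `old` untouched.
    Dict keys are never rewritten, and the input is not modified.
    """
    if isinstance(node, dict):
        return {key: rename_template(value, old, new) for key, value in node.items()}
    if isinstance(node, list):
        return [rename_template(item, old, new) for item in node]
    if isinstance(node, str) and node == old:
        return new
    return node


# Simplified, illustrative workflow; real KFP compiler output has more fields.
workflow = {
    "apiVersion": "argoproj.io/v1alpha1",
    "kind": "Workflow",
    "spec": {
        "entrypoint": "my-pipeline",
        "templates": [
            {
                "name": "_tfx_dag",
                "dag": {"tasks": [{"name": "trainer", "template": "trainer"}]},
            },
            {
                "name": "my-pipeline",
                "dag": {"tasks": [{"name": "_tfx_dag", "template": "_tfx_dag"}]},
            },
        ],
    },
}

fixed = rename_template(workflow, "_tfx_dag", "tfx-dag")
print(fixed["spec"]["templates"][0]["name"])  # tfx-dag
```

Exact matching is deliberately conservative: if the compiled workflow also referenced the DAG inside template expressions, those strings would need separate handling. The patched dict could then be written back (for example with yaml.safe_dump) before creating the run.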

github-actions[bot] commented 1 year ago

This issue has been marked stale because it has had no recent activity for 14 days. It will be closed if no further activity occurs. Thank you.

github-actions[bot] commented 1 year ago

This issue was closed due to lack of activity after being marked stale for the past 7 days.
