Closed: alanhdu closed this issue 3 months ago
This issue has been automatically marked as stale because it has not had recent activity. It will be closed if no further activity occurs. Thank you for your contributions.
This issue has been automatically closed because it has not had recent activity. Please comment "/reopen" to reopen it.
Feature Area
/area sdk
/area components
What feature would you like to see?
Right now, Python function-based components can only accept and return a fairly limited set of types (essentially JSON, with some special-case support for "file-like" objects / paths). It'd be great if we could plug in support for a richer set of types, where the serialization/deserialization happens outside of the Python component.
What is the use case or pain point?
We are currently trying to use Kubeflow Pipelines as part of our research workflow. As part of this, we have scientists (who are not well-versed in Argo or Kubernetes) writing Python function-based components. A lot of the research code, however, requires "complicated objects" -- whether those are NumPy arrays, Pandas DataFrames, or even just sophisticated `dataclass`es for "structured" configuration. These are not natively supported by Kubeflow Pipelines, but all of them have straightforward serialization/deserialization methods we can provide to and from strings (e.g. a Pandas DataFrame can be represented by an S3 URI pointing to a CSV that we serialize to / deserialize from). Right now, scientists have to write that boilerplate themselves, which takes a lot of time. Concretely, it'd be nice to turn the kind of hand-written boilerplate shown below into a plain function over the rich types.
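As a rough sketch of the contrast (the DataFrame-typed component is hypothetical and does not work with kfp today; the boilerplate version uses only existing pandas / kfp APIs):

```python
import pandas as pd
from kfp.components import create_component_from_func

# What we'd like scientists to be able to write: the DataFrame is
# (de)serialized outside the component, e.g. via a CSV at an S3 URI.
# (Hypothetical -- kfp does not accept this annotation today.)
def normalize(df: pd.DataFrame) -> pd.DataFrame:
    return (df - df.mean()) / df.std()

# What they have to write today: hand-rolled string round-tripping.
def normalize_today(df_uri: str) -> str:
    import pandas as pd  # imports must live inside function-based components
    df = pd.read_csv(df_uri)
    out = (df - df.mean()) / df.std()
    out_uri = df_uri.replace(".csv", "-normalized.csv")
    out.to_csv(out_uri, index=False)
    return out_uri

normalize_op = create_component_from_func(
    normalize_today, packages_to_install=["pandas", "s3fs"]
)
```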
Is there a workaround currently?
I haven't implemented these yet, but I could see a couple of implementation options:
Hook into the converters at https://github.com/kubeflow/pipelines/blob/a6ab4e4411dcb700e751c458f47d69857a65ee7a/sdk/python/kfp/components/_data_passing.py#L109-L125 to add special serialization + deserialization hooks for the relevant classes.
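For example (a sketch only; how these get wired into the `_data_passing` converter list is the open question, and nothing below is an existing kfp registration API), the serializer/deserializer pair we'd want to register for a Pandas DataFrame could look roughly like:

```python
# Sketch of the serialize/deserialize hooks we would want to register for
# pandas DataFrames. The functions themselves are standard pandas; the
# registration mechanism is what this option would add to kfp.
import io
import pandas as pd

def serialize_dataframe(df: pd.DataFrame) -> str:
    """Serialize a DataFrame to a CSV string for passing between components."""
    buf = io.StringIO()
    df.to_csv(buf, index=False)
    return buf.getvalue()

def deserialize_dataframe(csv_text: str) -> pd.DataFrame:
    """Reconstruct the DataFrame from its CSV string representation."""
    return pd.read_csv(io.StringIO(csv_text))
```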
Implement a special `Json` generic type and a `@to_json` decorator that basically does the following: `Json[T]` would implement the `to_dict` method to register itself as plain JSON from Kubeflow's point of view, and `@to_json` would handle serializing / deserializing the `Json[T]` to and from `T` based on the type annotation.
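A minimal sketch of that second option (all names here are hypothetical; none of them exist in the kfp SDK today, and the `to_dict` registration side is omitted -- this only shows the decorator's conversion between the JSON string and `T`):

```python
# Hypothetical sketch of the Json[T] generic type and @to_json decorator
# described above.
import dataclasses
import functools
import inspect
import json
from typing import Generic, TypeVar, get_args, get_origin, get_type_hints

T = TypeVar("T")

class Json(Generic[T]):
    """Marker type: Kubeflow sees a JSON string, the function body sees T."""

def to_json(func):
    """Deserialize Json[T] arguments into T, and serialize a T result back."""

    @functools.wraps(func)
    def wrapper(*args, **kwargs):
        hints = get_type_hints(func)
        bound = inspect.signature(func).bind(*args, **kwargs)
        bound.apply_defaults()
        for name, value in bound.arguments.items():
            hint = hints.get(name)
            if get_origin(hint) is Json:
                inner = get_args(hint)[0]  # the T in Json[T]
                bound.arguments[name] = inner(**json.loads(value))
        result = func(*bound.args, **bound.kwargs)
        if get_origin(hints.get("return")) is Json:
            return json.dumps(dataclasses.asdict(result))
        return result

    return wrapper

# Usage: the scientist writes a function over the rich dataclass, while
# Kubeflow only ever passes JSON strings across the component boundary.
@dataclasses.dataclass
class Config:
    learning_rate: float
    epochs: int

@to_json
def bump_epochs(config: Json[Config]) -> Json[Config]:
    return dataclasses.replace(config, epochs=config.epochs + 1)

# bump_epochs('{"learning_rate": 0.1, "epochs": 3}')
#   -> '{"learning_rate": 0.1, "epochs": 4}'
```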