Closed drubinstein closed 4 months ago
Hi @drubinstein, sorry for the late reply.
For complex objects such as numpy arrays, you can pass them by file. Here are some docs about passing by file: https://www.kubeflow.org/docs/components/pipelines/sdk/python-function-components/#passing-parameters-by-file or, if you're using Vertex Pipelines, the v2 way: https://www.kubeflow.org/docs/components/pipelines/sdk-v2/v2-component-io/
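As a rough sketch of the pass-by-file pattern in plain Python: the function names and the JSON payload are hypothetical stand-ins, and in a KFP v1 component the two path arguments would be annotated with `OutputPath` and `InputPath` from `kfp.components` so that the backend supplies the paths.

```python
import json
import os
import tempfile


def produce_matrix(output_path: str) -> None:
    """Write a nested list (a stand-in for an array) to the given file path."""
    matrix = [[1.0, 2.0], [3.0, 4.0]]
    with open(output_path, "w") as f:
        json.dump(matrix, f)


def consume_matrix(input_path: str) -> float:
    """Read the data back from the file and return the sum of all entries."""
    with open(input_path) as f:
        matrix = json.load(f)
    return sum(sum(row) for row in matrix)


# Simulate what the pipeline backend would do: choose a path and hand it
# to the producing step and then to the consuming step.
with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "matrix.json")
    produce_matrix(path)
    total = consume_matrix(path)
```

With a real numpy array you could swap the `json` calls for `numpy.save` and `numpy.load` on a file opened in binary mode, so that only the path, not the data, is passed between components.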
Generally speaking, pickling isn't a great idea. Technically it works for small objects such as a small dataframe, but you would be passing an unreadable string between components. If you have a large numpy array, the pickled data could be too large to pass as a string value: components are ultimately containerized apps, and string values are passed into a container on the command line, so they are subject to the command-line length limit.
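To give a feel for the size problem, here is a small sketch that pickles a list of floats, base64-encodes it the way a string-based converter would, and compares the result to an assumed per-argument cap of 128 KiB (a typical Linux value; the actual limit depends on the platform and on how the container is launched).

```python
import base64
import pickle

# A stand-in for a moderately sized array: 100,000 floats.
values = [float(i) for i in range(100_000)]

# Serialize to bytes, then to an ASCII string that could go on a command line.
encoded = base64.b64encode(pickle.dumps(values)).decode("ascii")

# The round trip works, so correctness is not the problem.
decoded = pickle.loads(base64.b64decode(encoded))

# Assumed cap on a single command-line argument (not a KFP constant).
ARG_LIMIT = 128 * 1024
too_large = len(encoded) > ARG_LIMIT
```

Here the encoded string is far longer than the assumed limit, even though the data round-trips correctly.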
This issue has been automatically marked as stale because it has not had recent activity. It will be closed if no further activity occurs. Thank you for your contributions.
Closing this issue. No activity for more than a year.
/close
@rimolive: Closing this issue.
Feature Area
/area backend /area sdk /area samples
What feature would you like to see?
May already be handled, but I notice that in _data_passing.py there's a Base64Pickle converter. Would it be possible to provide some example code in the documentation that shows how to, say, return an object that gets pickled and then unpickled when used in a following step of a pipeline? Alternatively (and preferred), would it be possible to automatically detect complex objects and pickle/unpickle them (though I think you could instead have everything that gets passed between Python components go through pickle)?
What is the use case or pain point?
From what I've read and tried, KFP currently does not support more complex objects such as a numpy array or a datetime as component arguments, parameters, return values, etc. The current best way to handle that is to convert them to a string and then parse it at the beginning of the next component step. If objects were pickled between components and then unpickled before the Python function component was called, I could have more accurate types and less boilerplate code.
Is there a workaround currently?
Currently, I pass strings around and parse them at the beginning of my components. For example instead of having
I'll do something closer to:
at the beginning of all my functions.
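A minimal sketch of this kind of workaround, assuming a datetime value and hypothetical function names (an illustration only, not the original snippets): one step serializes the value to a string, and the next step parses it before doing any work.

```python
from datetime import datetime


def make_deadline() -> str:
    """Upstream step: return the datetime as an ISO 8601 string."""
    deadline = datetime(2021, 1, 1, 12, 0, 0)
    return deadline.isoformat()


def days_until_deadline(deadline_str: str, today_str: str) -> int:
    """Downstream step: parse the string arguments first, then do the work."""
    deadline = datetime.fromisoformat(deadline_str)
    today = datetime.fromisoformat(today_str)
    return (deadline - today).days


remaining = days_until_deadline(make_deadline(), "2020-12-25T12:00:00")
```

The parsing lines at the top of `days_until_deadline` are the boilerplate that automatic pickling would remove.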
Love this idea? Give it a 👍. We prioritize fulfilling features with the most 👍.