andreyvelich opened this issue 1 year ago
+1 for ease of use, although I would avoid mentioning "docker", which is implementation-specific.
Makes sense, any suggestions @terrytangyuan (e.g. `create_job_from_image`)?
What about `create_job(func, img)` that calls the underlying implementation?
Makes sense, so we just provide users one API called `create_job` where they can set a Custom Resource, a function, or an image, and we process the request accordingly, right?
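A minimal sketch of what such a single entry point could look like. The helper names and parameters below are hypothetical, not the actual Training SDK API; it only illustrates dispatching on what the user provides:

```python
# Hypothetical sketch only: the function and helper names are illustrative,
# not the actual Training SDK API. It shows how one public create_job entry
# point could dispatch to the underlying implementations.
from typing import Callable, Optional


def _create_job_from_custom_resource(custom_resource: dict, **kwargs):
    ...  # placeholder: submit the Custom Resource as-is


def _create_job_from_func(func: Callable, base_image: Optional[str] = None, **kwargs):
    ...  # placeholder: package the function and submit a job


def _create_job_from_image(image: str, **kwargs):
    ...  # placeholder: submit a job that runs the given image


def create_job(
    func: Optional[Callable] = None,
    image: Optional[str] = None,
    custom_resource: Optional[dict] = None,
    **kwargs,
):
    """Single public API that dispatches based on what the user provides."""
    if custom_resource is not None:
        return _create_job_from_custom_resource(custom_resource, **kwargs)
    if func is not None:
        return _create_job_from_func(func, base_image=image, **kwargs)
    if image is not None:
        return _create_job_from_image(image, **kwargs)
    raise ValueError("One of custom_resource, func, or image must be provided.")
```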
Yep, exactly. This will avoid exploding the list of public APIs.
It's a good idea. SGTM
In the future, we can introduce `target_image`, `packages_to_install`, etc. parameters which allow the SDK to build a Docker image on the fly using the Docker client. Users need a running Docker daemon to use it.
In future work, it might also be better to add a parameter that defines whether to push the built image to a registry.
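An illustrative sketch of how such an on-the-fly build with an optional push could work. The `build_training_image` function and its `target_image`/`push_image` parameters are hypothetical, not part of the SDK; the docker-py calls are real, and a running Docker daemon is required:

```python
# Illustrative sketch only: build_training_image and its parameters are
# hypothetical, not part of the SDK. A running Docker daemon is required.
import docker


def build_training_image(context_dir: str, target_image: str, push_image: bool = False) -> str:
    client = docker.from_env()  # connects to the local Docker daemon

    # Build the image from the user's build context (Dockerfile + training code).
    client.images.build(path=context_dir, tag=target_image)

    # Optionally push the image so the Training Job pods can pull it from a registry.
    if push_image:
        # docker-py splits "repository:tag" internally, so the full name can be passed.
        client.images.push(target_image)

    return target_image
```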
/cc @gaocegege
/assign @andreyvelich
This issue has been automatically marked as stale because it has not had recent activity. It will be closed if no further activity occurs. Thank you for your contributions.
/lifecycle frozen
Previously, we created the `create_job_from_func` API: https://github.com/kubeflow/training-operator/pull/1659. This API is useful for users who want to quickly convert their training function to a Kubeflow Distributed Training Job, but it is hard to use for large models since all imports/code must be self-contained.

Similar to KFP Containerized Python Components, we can introduce a new API called `create_job_from_docker`, which helps users convert their training code to a Kubeflow Training Job. Initially, we can have the following signature, which simply constructs a Training Job using a base image.
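The original signature snippet is not reproduced above. Purely as an illustration, under the assumption that the job is built from an existing image, it might look roughly like this (all parameter names below are hypothetical, not the actual SDK API):

```python
# Hypothetical signature only -- the actual snippet from the issue is not shown
# here, and these parameter names are illustrative.
from typing import Dict, List, Optional


def create_job_from_docker(
    name: str,
    base_image: str,                             # image that already contains the training code
    namespace: str = "default",
    command: Optional[List[str]] = None,         # entrypoint to run inside the container
    num_workers: int = 1,
    resources_per_worker: Optional[Dict[str, str]] = None,
):
    """Construct a Kubeflow Training Job whose pods run the given base image."""
    ...
```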
In the future, we can introduce `target_image`, `packages_to_install`, etc. parameters which allow the SDK to build a Docker image on the fly using the Docker client. Users need a running Docker daemon to use it.

Related: https://github.com/kubeflow/common/issues/66.
What do you think, @kubeflow/wg-training-leads @tenzen-y @kuizhiqing @yaobaiwei @zw0610 @droctothorpe?