Closed sandipanpanda closed 1 month ago
/cc @andreyvelich @tenzen-y @terrytangyuan
| Totals | |
|---|---|
| Change from base Build 10937611143: | 0.0% |
| Covered Lines: | 66 |
| Relevant Lines: | 66 |
@sandipanpanda - Is this PR ready for review?
Actually, no. @sandipanpanda is still working on parts of the implementation. For what remains, see the weekly sync documentation.
What remains is adding the jaxjob webhook test and examples, and finishing up some work on test_e2e_jaxjob.py. Could you please share your input on whether the implementation so far is heading in the right direction?
@sandipanpanda If this PR is ready for review, please remove the WIP from the PR title.
@sandipanpanda Additionally, could you add the Dockerfile to the build pipeline? https://github.com/kubeflow/training-operator/blob/6ddeb2b90ebe116beaa800c57c344913e78aaf38/.github/workflows/publish-example-images.yaml
I guess that the current integration testing error could be resolved by updating this file: https://github.com/kubeflow/training-operator/blob/6ddeb2b90ebe116beaa800c57c344913e78aaf38/manifests/base/webhook/patch.yaml
@sandipanpanda Could you address this error? We need to pass the appropriate arguments to e2e testing.
DEBUG kubernetes.client.rest:rest.py:235 response body: FATAL Flags parsing error:
  flag --job_name=None: Flag --job_name must have a value other than None.
  flag --sub_domain=None: Flag --sub_domain must have a value other than None.
  flag --coordinator_port=None: Flag --coordinator_port must have a value other than None.
Pass --helpshort or --helpfull to see help on flags.
DEBUG kubernetes.client.rest:rest.py:235 response body: {"kind":"Status","apiVersion":"v1","metadata":{},"status":"Success","details":{"name":"jaxjob-cpu-ci-test","group":"kubeflow.org","kind":"jaxjobs","uid":"f7402a7c-588a-4b9d-903d-969ba0d4c7e2"}}
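For reference, the log above shows the training script rejecting `--job_name`, `--sub_domain`, and `--coordinator_port` because none of them received a value. A minimal sketch of how the e2e test could assemble container arguments follows. The helper `build_container_args` is hypothetical and not part of this PR, and the sub-domain and port values are placeholders; only the job name `jaxjob-cpu-ci-test` comes from the log.

```python
from typing import Dict, List, Optional


def build_container_args(flags: Dict[str, Optional[object]]) -> List[str]:
    """Turn a mapping of flag names to values into ``--name=value`` arguments.

    Raises ValueError if any value is None, so a missing flag fails in the
    test itself instead of inside the training container.
    """
    missing = [name for name, value in flags.items() if value is None]
    if missing:
        raise ValueError(f"flags without a value: {', '.join(missing)}")
    return [f"--{name}={value}" for name, value in flags.items()]


# Hypothetical values for illustration only; the real ones depend on the job spec.
example_args = build_container_args(
    {
        "job_name": "jaxjob-cpu-ci-test",      # job name taken from the log above
        "sub_domain": "example-subdomain",     # placeholder, assumption
        "coordinator_port": 6666,              # placeholder, assumption
    }
)
print(example_args)
```

The resulting list could then be set as the `args` of the worker container in the JAXJob spec used by the test, assuming the example image reads exactly these three flags.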
cc @tenzen-y PTAL
[APPROVALNOTIFIER] This PR is APPROVED
This pull-request has been approved by: tenzen-y
Thank you for your unwavering guidance and support throughout this project!
What this PR does / why we need it: Implement JAX controller
Which issue(s) this PR fixes (optional, in `Fixes #<issue number>, #<issue number>, ...` format, will close the issue(s) when PR gets merged):
Fix: https://github.com/kubeflow/training-operator/issues/1619
Ref: https://github.com/kubeflow/training-operator/issues/2145
/area gsoc