Closed PeterChg closed 2 months ago
[APPROVALNOTIFIER] This PR is NOT APPROVED
This pull-request has been approved by:
Once this PR has been reviewed and has the lgtm label, please assign gaocegege for approval. For more information see the Kubernetes Code Review Process.
Totals | |
---|---|
Change from base Build 10131041132: | 0.07% |
Covered Lines: | 3946 |
Relevant Lines: | 11301 |
Why does the integration test fail? The failure seems unrelated to the code change. How do I re-launch the test? /cc gaocegege
/rerun-all
…us in ApiServer failed
What this PR does / why we need it:
The following error may occur when the UpdateJobStatusInApiServer function is called. It causes the job to be reconciled and retried a large number of times:
ERROR Reconciler error {"controller": "pytorchjob-controller", "object": {"name":"test-0717","namespace":"ns-test"}, "namespace": "ns-test", "name": "tj-test-fusion-0717", "reconcileID": "2fe0485f-1a89-46d2-bf50-81eeadbd979f", "error": "PyTorchJob.kubeflow.org \"tj-test-fusion-0717\" is invalid: status.replicaStatuses: Required value"}
This problem is resolved by changing how the job status is initialized.
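A minimal sketch of why initialization can matter here, assuming the status type serializes `replicaStatuses` without `omitempty`. The `JobStatus` and `ReplicaStatus` types and the `ensureReplicaStatuses` helper below are simplified, hypothetical stand-ins, not the operator's actual code. A nil map is marshaled as `null`, which a validator that requires the field may reject with "Required value", while an initialized empty map is marshaled as `{}`.

```go
package main

import (
	"encoding/json"
	"fmt"
)

// ReplicaStatus is a simplified, hypothetical stand-in for the
// per-replica status kept by the operator.
type ReplicaStatus struct {
	Active    int32 `json:"active,omitempty"`
	Succeeded int32 `json:"succeeded,omitempty"`
	Failed    int32 `json:"failed,omitempty"`
}

// JobStatus is a simplified, hypothetical stand-in for the job status.
// Assumption: the field has no omitempty, so a nil map becomes null.
type JobStatus struct {
	ReplicaStatuses map[string]*ReplicaStatus `json:"replicaStatuses"`
}

// ensureReplicaStatuses (hypothetical helper) makes sure the map is
// non-nil before the status is sent to the API server.
func ensureReplicaStatuses(s *JobStatus) {
	if s.ReplicaStatuses == nil {
		s.ReplicaStatuses = map[string]*ReplicaStatus{}
	}
}

// marshalStatus returns the JSON form of the status.
func marshalStatus(s JobStatus) string {
	b, err := json.Marshal(s)
	if err != nil {
		panic(err)
	}
	return string(b)
}

func main() {
	var s JobStatus
	fmt.Println(marshalStatus(s)) // {"replicaStatuses":null}

	ensureReplicaStatuses(&s)
	fmt.Println(marshalStatus(s)) // {"replicaStatuses":{}}
}
```

Whether the actual change in this PR takes this form is not stated in the description; the sketch only illustrates the nil-versus-empty distinction that the error message points to.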
Which issue(s) this PR fixes (optional, in `Fixes #<issue number>, #<issue number>, ...` format, will close the issue(s) when PR gets merged):
Fixes #
Checklist: