kubeflow / training-operator

Distributed ML Training and Fine-Tuning on Kubernetes
https://www.kubeflow.org/docs/components/training
Apache License 2.0
1.61k stars 698 forks source link

Inconsistent implementation in UpdateJobStatusInApiServer method #1710

Open HeGaoYuan opened 1 year ago

HeGaoYuan commented 1 year ago

In the following codes, they are UpdateJobStatusInApiServer method implementation in different Job. They are very similar but different. It is very confusing when programmers read the codes.

  1. Does trainingoperatorcommon.ClearGeneratedFields() method need?
  2. Does reflect.DeepEqual() method need?
  3. What is the difference between reflect.DeepEqual() and equality.Semantic.DeepEqual()?
  4. Does jc.Log.WithValues() need? It seems did nothing.

Referring to point4 of https://github.com/kubeflow/training-operator/issues/1703

https://github.com/kubeflow/training-operator/blob/e13d262fe4fabd548ded944e5356c99f30a33b00/pkg/controller.v1/mpi/mpijob_controller.go#L646-L678

https://github.com/kubeflow/training-operator/blob/e13d262fe4fabd548ded944e5356c99f30a33b00/pkg/controller.v1/mxnet/mxjob_controller.go#L430-L451

https://github.com/kubeflow/training-operator/blob/e13d262fe4fabd548ded944e5356c99f30a33b00/pkg/controller.v1/pytorch/pytorchjob_controller.go#L483-L511

github-actions[bot] commented 1 year ago

This issue has been automatically marked as stale because it has not had recent activity. It will be closed if no further activity occurs. Thank you for your contributions.

andreyvelich commented 1 year ago

cc @johnugeorge @tenzen-y @kuizhiqing

tenzen-y commented 1 year ago

/remove-lifecycle stale

Yes, we should refactor the UpdateJobStatusInApiServer.

github-actions[bot] commented 11 months ago

This issue has been automatically marked as stale because it has not had recent activity. It will be closed if no further activity occurs. Thank you for your contributions.

andreyvelich commented 11 months ago

/lifecycle frozen