kubeflow / mpi-operator

Kubernetes Operator for MPI-based applications (distributed training, HPC, etc.)
https://www.kubeflow.org/docs/components/training/mpi/
Apache License 2.0
420 stars 211 forks

Use local copy of JobStatus by mpi-operator #514

Closed tenzen-y closed 1 year ago

tenzen-y commented 1 year ago

Signed-off-by: Yuki Iwai <yuki.iwai.tz@gmail.com>

I copied the JobStatus API into this repository.

Currently, we cannot use auto-generated CRDs in our unit and E2E tests because common.JobStatus requires JobStatus.Conditions and JobStatus.ReplicaStatuses by default, as shown in the following:

type JobStatus struct {
    // Conditions is an array of current observed job conditions.
    Conditions []JobCondition `json:"conditions"`

    // ReplicaStatuses is map of ReplicaType and ReplicaStatus,
    // specifies the status of each replica.
    ReplicaStatuses map[ReplicaType]*ReplicaStatus `json:"replicaStatuses"`
...

https://github.com/kubeflow/common/blob/9ec55d141f90faaf52fd6df271e987e5a6781945/pkg/apis/common/v1/types.go#L25-L31
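To illustrate the difference, here is a minimal sketch, not the actual diff of this PR. It assumes the local copy adds `omitempty` (and a `+optional` marker) to both fields, which is the usual way to keep controller-gen from marking them as required. `StrictJobStatus`, `LocalJobStatus`, `mustJSON`, and the trimmed-down `JobCondition` and `ReplicaStatus` types are hypothetical names used only for this example.

```go
package main

import (
	"encoding/json"
	"fmt"
)

// JobCondition and ReplicaStatus are trimmed-down stand-ins for the real
// types; only enough fields are kept to make the example compile.
type JobCondition struct {
	Type   string `json:"type"`
	Status string `json:"status"`
}

type ReplicaStatus struct {
	Active int32 `json:"active,omitempty"`
}

// StrictJobStatus mirrors the JSON tags quoted above: no omitempty, so a
// generated CRD schema would typically list both fields as required.
type StrictJobStatus struct {
	Conditions      []JobCondition            `json:"conditions"`
	ReplicaStatuses map[string]*ReplicaStatus `json:"replicaStatuses"`
}

// LocalJobStatus is a hypothetical local copy (an assumption, not the PR's
// code) in which both fields are optional.
type LocalJobStatus struct {
	// +optional
	Conditions []JobCondition `json:"conditions,omitempty"`

	// +optional
	ReplicaStatuses map[string]*ReplicaStatus `json:"replicaStatuses,omitempty"`
}

// mustJSON marshals v to a JSON string and panics on error.
func mustJSON(v interface{}) string {
	b, err := json.Marshal(v)
	if err != nil {
		panic(err)
	}
	return string(b)
}

func main() {
	// A freshly created job has an empty status.
	fmt.Println(mustJSON(StrictJobStatus{})) // {"conditions":null,"replicaStatuses":null}
	fmt.Println(mustJSON(LocalJobStatus{}))  // {}
}
```

With the strict tags, an empty status serializes both fields as `null`. With the optional variant, the fields are omitted entirely, so a schema that does not require them accepts a job with an empty status.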

So, if we use auto-generated CRDs, we face the following errors in tests:

--- FAIL: TestMPIJobSuccess (2.13s)
    mpi_job_controller_test.go:49: Using namespace test-gbp22
    mpi_job_controller_test.go:94: Failed sending job to apiserver: MPIJob.kubeflow.org "job" is invalid: [status.conditions: Required value, status.replicaStatuses: Required value]

--- FAIL: TestMPIJobFailure (0.01s)
    mpi_job_controller_test.go:175: Using namespace test-9fpm8
    mpi_job_controller_test.go:221: Failed sending job to apiserver: MPIJob.kubeflow.org "job" is invalid: [status.conditions: Required value, status.replicaStatuses: Required value]

https://github.com/kubeflow/mpi-operator/actions/runs/4071386113/jobs/7013139204#step:8:39

Blocking: #510

alculquicondor commented 1 year ago

What is missing in this PR?

tenzen-y commented 1 year ago

> What is missing in this PR?

@alculquicondor Nothing is missing. PTAL.

alculquicondor commented 1 year ago

Let's see which one merges first 😂 #511

/assign @terrytangyuan

tenzen-y commented 1 year ago

> Let's see which one merges first 😂 #511
>
> /assign @terrytangyuan

Haha. I trust @terrytangyuan :)

google-oss-prow[bot] commented 1 year ago

[APPROVALNOTIFIER] This PR is APPROVED

This pull-request has been approved by: alculquicondor, terrytangyuan

The full list of commands accepted by this bot can be found here.

The pull request process is described here

Needs approval from an approver in each of these files:

- ~~[OWNERS](https://github.com/kubeflow/mpi-operator/blob/master/OWNERS)~~ [terrytangyuan]

Approvers can indicate their approval by writing `/approve` in a comment
Approvers can cancel approval by writing `/approve cancel` in a comment