kubernetes-sigs / jobset

JobSet: a k8s native API for distributed ML training and HPC workloads
https://jobset.sigs.k8s.io/
Apache License 2.0

Automated cherry pick of #580: relax validation on replicated jobs #584

Closed danielvegamyhre closed 4 months ago

danielvegamyhre commented 4 months ago

Cherry pick of #580 on release-0.5.

#580: relax validation on replicated jobs

For details on the cherry pick process, see the cherry pick requests page.

danielvegamyhre commented 4 months ago

@omerap12 it seems the release branch presubmit is misconfigured: it fails to start with the error `could not start the process: exec: "runner.sh": executable file not found in $PATH`. Can you please take a look?

kannon92 commented 4 months ago

LGTM, but the test failures seem concerning.

omerap12 commented 4 months ago

> @omerap12 it seems the release branch presubmit is misconfigured: it fails to start with the error `could not start the process: exec: "runner.sh": executable file not found in $PATH`. Can you please take a look?

Yeah sure I'll take a look

alculquicondor commented 4 months ago

It looks like you forgot to change the image for the presubmit https://github.com/kubernetes/test-infra/blob/3c1c806d19d680b83e4f0a128f1d38641607d2ea/config/jobs/kubernetes-sigs/jobset/jobset-presubmit-release-0.5.yaml#L50

omerap12 commented 4 months ago

Hey @danielvegamyhre @alculquicondor, I changed the command to match the golang image. Here: https://github.com/kubernetes/test-infra/pull/32680

danielvegamyhre commented 4 months ago

/retest

alculquicondor commented 4 months ago

The builder for 1.27 is still on go1.21

https://github.com/kubernetes/test-infra/blob/febc82a3ff9c4f63ea1406f4ee1097bb3c1316b6/config/jobs/kubernetes-sigs/jobset/jobset-presubmit-release-0.5.yaml#L82

omerap12 commented 4 months ago

> The builder for 1.27 is still on go1.21
>
> https://github.com/kubernetes/test-infra/blob/febc82a3ff9c4f63ea1406f4ee1097bb3c1316b6/config/jobs/kubernetes-sigs/jobset/jobset-presubmit-release-0.5.yaml#L82

Fixing.

omerap12 commented 4 months ago

@danielvegamyhre @alculquicondor https://github.com/kubernetes/test-infra/pull/32682

danielvegamyhre commented 4 months ago

/retest

kannon92 commented 4 months ago

/lgtm

danielvegamyhre commented 4 months ago

/approve

k8s-ci-robot commented 4 months ago

[APPROVALNOTIFIER] This PR is APPROVED

This pull-request has been approved by: danielvegamyhre

The full list of commands accepted by this bot can be found here.

The pull request process is described here

Needs approval from an approver in each of these files:

- ~~[OWNERS](https://github.com/kubernetes-sigs/jobset/blob/release-0.5/OWNERS)~~ [danielvegamyhre]

Approvers can indicate their approval by writing `/approve` in a comment

Approvers can cancel approval by writing `/approve cancel` in a comment