kubeflow / training-operator

Distributed ML Training and Fine-Tuning on Kubernetes
https://www.kubeflow.org/docs/components/training
Apache License 2.0
1.51k stars, 660 forks

ARM64 supported in PyTorch examples #2116

Closed: danielsuh05 closed this 1 month ago

danielsuh05 commented 1 month ago

What this PR does / why we need it: Adds ARM64 support to the PyTorch examples so that users on ARM64 machines can try the training example.

Which issue(s) this PR fixes: Fixes #2111

danielsuh05 commented 1 month ago

Sorry, I don't know why I had so many issues with rebasing but this one should work.

tenzen-y commented 1 month ago

> Sorry, I don't know why I had so many issues with rebasing but this one should work.

No worries, thank you for this contribution!

coveralls commented 1 month ago

Pull Request Test Coverage Report for Build 9122150928

Details


| Files with Coverage Reduction | New Missed Lines | % |
| :-- | --: | --: |
| pkg/controller.v1/mpi/mpijob_controller.go | 6 | 80.67% |

| Totals | Coverage Status |
| :-- | --: |
| Change from base Build 9033784995: | -0.03% |
| Covered Lines: | 4374 |
| Relevant Lines: | 12362 |

💛 - Coveralls

google-oss-prow[bot] commented 1 month ago

[APPROVALNOTIFIER] This PR is APPROVED

This pull-request has been approved by: tenzen-y

The full list of commands accepted by this bot can be found here.

The pull request process is described here.

Needs approval from an approver in each of these files:

- ~~[OWNERS](https://github.com/kubeflow/training-operator/blob/master/OWNERS)~~ [tenzen-y]

Approvers can indicate their approval by writing `/approve` in a comment.
Approvers can cancel approval by writing `/approve cancel` in a comment.