kubeflow / training-operator

Distributed ML Training and Fine-Tuning on Kubernetes
https://www.kubeflow.org/docs/components/training
Apache License 2.0
1.62k stars, 701 forks

Update tf job examples to tf v2 #2270

Closed YosiElias closed 3 weeks ago

YosiElias commented 2 months ago

What this PR does / why we need it: Updates the TFJob examples to TF v2.

Which issue(s) this PR fixes (optional, in Fixes #<issue number>, #<issue number>, ... format, will close the issue(s) when PR gets merged): Fixes #2015

Checklist:

coveralls commented 2 months ago

Pull Request Test Coverage Report for Build 11662420881

Totals Coverage Status
Change from base Build 11645970070: 0.0%
Covered Lines: 77
Relevant Lines: 77

💛 - Coveralls

YosiElias commented 2 months ago

Hi @alongzhi, I saw that you added the ppc64le images previously. To keep supporting ppc64le, I need a TF v2 base image to replace ibmcom/tensorflow-ppc64le:1.13.1, which is used in examples/tensorflow/dist-mnist/Dockerfile.ppc64le and examples/tensorflow/mnist_with_summaries/Dockerfile.ppc64le.

Can you please provide such an image? Otherwise, just let me know and I'll remove these Dockerfile.ppc64le files, since the source code now supports only TF v2.

@JosepSampe @ckadner Do you know who at IBM can support this image?

@andreyvelich - FYI

andreyvelich commented 1 month ago

@alongzhi Are you planning to continue supporting PowerPC images in the Training Operator? Otherwise, we will remove them. cc @kubeflow/wg-training-leads

YosiElias commented 1 month ago

/assign @kuizhiqing

andreyvelich commented 1 month ago

@YosiElias Can you please rebase your PR to fix the unit tests?

andreyvelich commented 1 month ago

/rerun-all

google-oss-prow[bot] commented 3 weeks ago

[APPROVALNOTIFIER] This PR is APPROVED

This pull-request has been approved by: tenzen-y

Needs approval from an approver in each of these files:
- ~~[OWNERS](https://github.com/kubeflow/training-operator/blob/master/OWNERS)~~ [tenzen-y]

Approvers can indicate their approval by writing `/approve` in a comment.
Approvers can cancel approval by writing `/approve cancel` in a comment.