kubeflow / training-operator

Distributed ML Training and Fine-Tuning on Kubernetes
https://www.kubeflow.org/docs/components/training
Apache License 2.0

Tensorflow example image has a lot of warnings and python 2.7. #1884

Open kannon92 opened 11 months ago

kannon92 commented 11 months ago

I was using the TensorFlow example image to demonstrate JobSet functionality and saw a lot of warnings about deprecated features. I'm not sure whether this image will eventually stop working, but I wanted to bring it to your attention.

[ec2-user@ip-172-31-93-184 jobset-kevin]$ kubectl logs tensorflow-tensorflow-0-0-5l6bb
WARNING:tensorflow:From /var/tf_mnist/mnist_with_summaries.py:39: read_data_sets (from tensorflow.contrib.learn.python.learn.datasets.mnist) is deprecated and will be removed in a future version.
Instructions for updating:
Please use alternatives such as official/mnist/dataset.py from tensorflow/models.
WARNING:tensorflow:From /usr/local/lib/python2.7/dist-packages/tensorflow/contrib/learn/python/learn/datasets/mnist.py:260: maybe_download (from tensorflow.contrib.learn.python.learn.datasets.base) is deprecated and will be removed in a future version.
Instructions for updating:
Please write your own downloading logic.
WARNING:tensorflow:From /usr/local/lib/python2.7/dist-packages/tensorflow/contrib/learn/python/learn/datasets/base.py:252: wrapped_fn (from tensorflow.contrib.learn.python.learn.datasets.base) is deprecated and will be removed in a future version.
Instructions for updating:
Please use urllib or similar directly.
WARNING:tensorflow:From /usr/local/lib/python2.7/dist-packages/tensorflow/contrib/learn/python/learn/datasets/mnist.py:262: extract_images (from tensorflow.contrib.learn.python.learn.datasets.mnist) is deprecated and will be removed in a future version.
Instructions for updating:
Please use tf.data to implement this functionality.
WARNING:tensorflow:From /usr/local/lib/python2.7/dist-packages/tensorflow/contrib/learn/python/learn/datasets/mnist.py:267: extract_labels (from tensorflow.contrib.learn.python.learn.datasets.mnist) is deprecated and will be removed in a future version.
Instructions for updating:
Please use tf.data to implement this functionality.
WARNING:tensorflow:From /usr/local/lib/python2.7/dist-packages/tensorflow/contrib/learn/python/learn/datasets/mnist.py:290: __init__ (from tensorflow.contrib.learn.python.learn.datasets.mnist) is deprecated and will be removed in a future version.
Instructions for updating:
Please use alternatives such as official/mnist/dataset.py from tensorflow/models.
2023-08-10 13:42:32.288488: I tensorflow/core/platform/cpu_feature_guard.cc:141] Your CPU supports instructions that this TensorFlow binary was not compiled to use: AVX2 AVX512F FMA
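
To see at a glance which deprecated symbols the example script depends on, the warnings in a log like the one above can be grouped by module and symbol. The following is a minimal sketch, not part of the training-operator repository: the function name `summarize_deprecations` and the regular expression are assumptions based only on the line format shown in this log.

```python
import re
from collections import Counter

# Assumed line shape, taken from the log in this issue:
# WARNING:tensorflow:From <path>:<line>: <symbol> (from <module>) is deprecated ...
DEPRECATION_RE = re.compile(
    r"^WARNING:tensorflow:From (?P<path>\S+):(?P<line>\d+): "
    r"(?P<symbol>\S+) \(from (?P<module>[\w.]+)\) is deprecated"
)


def summarize_deprecations(log_text):
    """Count deprecation warnings, keyed by (module, symbol)."""
    counts = Counter()
    for line in log_text.splitlines():
        match = DEPRECATION_RE.match(line.strip())
        if match:
            counts[(match.group("module"), match.group("symbol"))] += 1
    return counts


# Two lines copied from the log above, used as sample input.
sample_log = (
    "WARNING:tensorflow:From /var/tf_mnist/mnist_with_summaries.py:39: "
    "read_data_sets (from tensorflow.contrib.learn.python.learn.datasets.mnist) "
    "is deprecated and will be removed in a future version.\n"
    "Instructions for updating:\n"
    "WARNING:tensorflow:From /usr/local/lib/python2.7/dist-packages/tensorflow/"
    "contrib/learn/python/learn/datasets/base.py:252: wrapped_fn "
    "(from tensorflow.contrib.learn.python.learn.datasets.base) "
    "is deprecated and will be removed in a future version.\n"
)

for (module, symbol), count in summarize_deprecations(sample_log).items():
    print("%s.%s: %d" % (module, symbol, count))
```

In practice the input would be the output of `kubectl logs` saved to a file and read with `open(...).read()`.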

Ref: https://github.com/kubernetes-sigs/jobset/pull/253

tenzen-y commented 11 months ago

I think we should use docker.io/kubeflowkatib/tf-mnist-with-summaries:latest in the TFJob example, since the current image is so old.

kannon92 commented 11 months ago

I can open up a PR to fix that.

tenzen-y commented 11 months ago

@johnugeorge What do you think about removing kubeflow/tf-mnist-with-summaries:latest and using docker.io/kubeflowkatib/tf-mnist-with-summaries:latest instead?

johnugeorge commented 11 months ago

We rebuild the kubeflow/tf-mnist-with-summaries:latest image each time.

https://github.com/kubeflow/training-operator/blob/855e0960668b34992ba4e1fd5914a08a3362cfb1/.github/workflows/publish-example-images.yaml#L31-L32

So if we update the Dockerfile here to use the latest TensorFlow, won't it be solved automatically?

https://github.com/kubeflow/training-operator/blob/855e0960668b34992ba4e1fd5914a08a3362cfb1/examples/tensorflow/mnist_with_summaries/Dockerfile#L15

tenzen-y commented 11 months ago

> We rebuild the kubeflow/tf-mnist-with-summaries:latest image each time.
>
> https://github.com/kubeflow/training-operator/blob/855e0960668b34992ba4e1fd5914a08a3362cfb1/.github/workflows/publish-example-images.yaml#L31-L32
>
> So if we update the Dockerfile here to use the latest TensorFlow, won't it be solved automatically?
>
> https://github.com/kubeflow/training-operator/blob/855e0960668b34992ba4e1fd5914a08a3362cfb1/examples/tensorflow/mnist_with_summaries/Dockerfile#L15

I'm not sure the image will work after just bumping the base image version, because we would also have to resolve the issues caused by the Python major version bump (Python 2.7 -> Python 3.x) and the TensorFlow major version bump (TF 1.11 -> TF 2.x).
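
One concrete piece of that migration: the helpers named in the warnings (`maybe_download`, `extract_images`, `extract_labels`) live under `tensorflow.contrib`, which was removed in TF 2.x, so the example script needs a replacement data loader. The warnings suggest urllib for downloading and tf.data for extraction. The sketch below swaps in a standard-library-only parser instead, so it runs on Python 3 without TensorFlow. It assumes the usual MNIST IDX file layout (big-endian header, magic number 2051 for images and 2049 for labels). All function names are hypothetical, and no download URL is hard-coded because the thread does not name a mirror.

```python
import gzip
import struct
import urllib.request

IMAGE_MAGIC = 2051  # IDX magic number for MNIST image files
LABEL_MAGIC = 2049  # IDX magic number for MNIST label files


def download(url, dest_path):
    """Fetch `url` to `dest_path`; the caller chooses the mirror URL."""
    urllib.request.urlretrieve(url, dest_path)
    return dest_path


def read_gzip(path):
    """Return the decompressed bytes of a .gz file."""
    with gzip.open(path, "rb") as handle:
        return handle.read()


def parse_idx_images(raw):
    """Parse IDX image bytes into (list of per-image bytes, rows, cols)."""
    magic, count, rows, cols = struct.unpack(">IIII", raw[:16])
    if magic != IMAGE_MAGIC:
        raise ValueError("unexpected image magic number: %d" % magic)
    size = rows * cols
    if len(raw) != 16 + count * size:
        raise ValueError("image payload length does not match header")
    images = [raw[16 + i * size:16 + (i + 1) * size] for i in range(count)]
    return images, rows, cols


def parse_idx_labels(raw):
    """Parse IDX label bytes into a list of integer labels."""
    magic, count = struct.unpack(">II", raw[:8])
    if magic != LABEL_MAGIC:
        raise ValueError("unexpected label magic number: %d" % magic)
    if len(raw) != 8 + count:
        raise ValueError("label payload length does not match header")
    return list(raw[8:8 + count])
```

A caller would combine these as `parse_idx_images(read_gzip(download(url, path)))` and then hand the result to whatever input pipeline the updated script uses.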

johnugeorge commented 11 months ago

Newer TensorFlow images no longer use Python 2.x. I just checked tensorflow/tensorflow:2.13.0, and it uses Python 3.8.10.
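
A check like this can also be scripted and run inside a candidate image before the example is switched over. This is a rough sketch, assuming TensorFlow is installed under the distribution name `tensorflow` (variants such as `tensorflow-cpu` would need a different name). The helper names `major_version` and `check_runtime` are hypothetical.

```python
import sys


def major_version(version):
    """Return the leading integer of a dotted version string."""
    return int(version.split(".", 1)[0])


def check_runtime(min_python_major=3, min_tf_major=2):
    """Return a list of problems; an empty list means the runtime looks usable."""
    if sys.version_info[0] < min_python_major:
        # Stop early: importlib.metadata below does not exist on Python 2.
        return ["Python %d.%d is older than Python %d"
                % (sys.version_info[0], sys.version_info[1], min_python_major)]

    from importlib import metadata  # standard library since Python 3.8

    problems = []
    try:
        tf_version = metadata.version("tensorflow")
    except metadata.PackageNotFoundError:
        problems.append("no installed distribution named 'tensorflow'")
    else:
        if major_version(tf_version) < min_tf_major:
            problems.append("TensorFlow %s is older than %d.x"
                            % (tf_version, min_tf_major))
    return problems


if __name__ == "__main__":
    found = check_runtime()
    for problem in found:
        print("problem: " + problem)
    sys.exit(1 if found else 0)
```

The script exits non-zero when a problem is found, so it could gate an image build step. It only confirms versions and says nothing about whether the training script itself runs without errors.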

tenzen-y commented 11 months ago

> Newer TensorFlow images no longer use Python 2.x. I just checked tensorflow/tensorflow:2.13.0, and it uses Python 3.8.10.

I see. I'm OK with bumping the TF version if it works without errors.

johnugeorge commented 9 months ago

/good-first-issue

google-oss-prow[bot] commented 9 months ago

@johnugeorge: This request has been marked as suitable for new contributors.

Please ensure the request meets the requirements listed here.

If this request no longer meets these requirements, the label can be removed by commenting with the /remove-good-first-issue command.

In response to [this](https://github.com/kubeflow/training-operator/issues/1884):

>/good-first-issue

Instructions for interacting with me using PR comments are available [here](https://git.k8s.io/community/contributors/guide/pull-requests.md). If you have questions or suggestions related to my behavior, please file an issue against the [kubernetes/test-infra](https://github.com/kubernetes/test-infra/issues/new?title=Prow%20issue:) repository.

github-actions[bot] commented 6 months ago

This issue has been automatically marked as stale because it has not had recent activity. It will be closed if no further activity occurs. Thank you for your contributions.

tenzen-y commented 6 months ago

/lifecycle frozen