kubeflow / pytorch-operator

PyTorch on Kubernetes
Apache License 2.0
306 stars, 143 forks

Added a PyTorch CUDA Docker image, as the image pytorch/pytorch:1.0-cuda10.0-cudnn7-runtime does not have CUDA and so cannot use the GPU #248

Open MATRIX4284 opened 4 years ago

MATRIX4284 commented 4 years ago

Added a PyTorch CUDA Docker image, because the image pytorch/pytorch:1.0-cuda10.0-cudnn7-runtime does not have CUDA, so examples/mnist.py does not use the GPU. The issue is with the PyTorch image. The new Docker image I supplied includes the CUDA devel and runtime environments; I tested it and it works like a breeze on GPU. The original mnist.py, which took 10-12 minutes on my dual Xeon 2670 machine, takes roughly 1 minute to complete on my Titan Xp (Pascal series) GPU.

This fixes issue #245.
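To confirm the symptom described above (PyTorch inside the container not seeing the GPU), one option is to ask PyTorch directly from within the running pod. The following is a minimal sketch, not part of this PR or the repository; the function name `cuda_report` is hypothetical. It uses only standard `torch` calls (`torch.cuda.is_available`, `torch.version.cuda`, `torch.cuda.device_count`, `torch.cuda.get_device_name`) and reports gracefully if `torch` is not installed at all.

```python
"""Hypothetical diagnostic: report whether PyTorch in this container can see a CUDA GPU."""
import importlib.util


def cuda_report() -> dict:
    # If torch is absent (e.g. a plain CPU test environment), report that instead of crashing.
    if importlib.util.find_spec("torch") is None:
        return {"torch_installed": False, "cuda_available": False}

    import torch

    report = {
        "torch_installed": True,
        "torch_version": torch.__version__,
        # CUDA version PyTorch was built against; None for CPU-only builds.
        "built_with_cuda": torch.version.cuda,
        # False if no usable driver/GPU is visible to this process.
        "cuda_available": torch.cuda.is_available(),
    }
    if report["cuda_available"]:
        report["device_count"] = torch.cuda.device_count()
        report["device_name"] = torch.cuda.get_device_name(0)
    return report


if __name__ == "__main__":
    for key, value in cuda_report().items():
        print(f"{key}: {value}")
```

Running this (for example via `kubectl exec` into the worker pod) before and after switching images should show whether `cuda_available` changes from `False` to `True`; if it stays `False`, the cause may lie outside the image, such as the pod not requesting a GPU or the node lacking a GPU driver.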

k8s-ci-robot commented 4 years ago

Hi @MATRIX4284. Thanks for your PR.

I'm waiting for a kubeflow member to verify that this patch is reasonable to test. If it is, they should reply with /ok-to-test on its own line. Until that is done, I will not automatically test new commits in this PR, but the usual testing commands by org members will still work. Regular contributors should join the org to skip this step.

Once the patch is verified, the new status will be reflected by the ok-to-test label.

I understand the commands that are listed here.

Instructions for interacting with me using PR comments are available [here](https://git.k8s.io/community/contributors/guide/pull-requests.md). If you have questions or suggestions related to my behavior, please file an issue against the [kubernetes/test-infra](https://github.com/kubernetes/test-infra/issues/new?title=Prow%20issue:) repository.

k8s-ci-robot commented 4 years ago

[APPROVALNOTIFIER] This PR is NOT APPROVED

This pull-request has been approved by:

To complete the pull request process, please assign richardsliu. You can assign the PR to them by writing /assign @richardsliu in a comment when ready.

The full list of commands accepted by this bot can be found here.

Needs approval from an approver in each of these files:

- **[OWNERS](https://github.com/kubeflow/pytorch-operator/blob/master/OWNERS)**

Approvers can indicate their approval by writing `/approve` in a comment.
Approvers can cancel approval by writing `/approve cancel` in a comment.

coveralls commented 4 years ago

Coverage remained the same at 22.97% when pulling c42551774d8c575f883581c5daf4686a9322b0fc on MATRIX4284:master into 94197a2dc5be7f59b50add137892cc0a21fb599a on kubeflow:master.

johnugeorge commented 4 years ago

Can you move the docker file to https://github.com/kubeflow/pytorch-operator/tree/master/examples/mnist and rename it appropriately with mnist changes?

MATRIX4284 commented 4 years ago

> Can you move the docker file to https://github.com/kubeflow/pytorch-operator/tree/master/examples/mnist and rename it appropriately with mnist changes?

Ideally it should be under examples, not under mnist, as this is a general PyTorch GPU Docker image that will be used by all applications, not one specific to mnist. It would be better to keep it in a separate folder named pytorch docker.

johnugeorge commented 4 years ago

@MATRIX4284 Thanks for your contribution. I got your point. However, I feel it is better not to keep it in the root folder, as it is not related to the PyTorch operator. Hence I felt keeping it in examples looks more appropriate, and users who want to try the GPU version can refer to this example (even if it is a different use case).

MATRIX4284 commented 4 years ago

I will move it under the examples folder, in a folder named pytorch-gpu. Thanks for the guidance.

> On Tue, 7 Jan 2020 at 12:15 PM, Johnu George wrote:
>
> @MATRIX4284 Thanks for your contribution. I got your point. However, I feel that it is better not to keep it in the root folder as it is not related to pytorch operator. Hence I felt, keeping it in examples looks more appropriate. And users who want to try gpu version, can refer this example(even if it is a different use case)

Opened PR #255 with the Dockerfile under the examples folder.