Closed TraceCS closed 2 years ago
Hi @TraceCS, my numbers look like this:
iters=20 bsz=16 avg time compare:83546.40 -- (dynamo)91269.50 ratio:0.9154
iters=50 bsz=16 avg time compare:65139.68 -- (dynamo)61867.60 ratio:1.0529
iters=100 bsz=16 avg time compare:64421.50 -- (dynamo)54427.80 ratio:1.1836
iters=200 bsz=16 avg time compare:64340.89 -- (dynamo)50062.75 ratio:1.2852
I am not sure about what the docker does. This is also on A100 GPUs.
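For anyone comparing runs, the `ratio` column appears to be the baseline average time divided by the dynamo average time, so values above 1 mean dynamo was faster. Here is a minimal sketch that parses the lines above and recomputes it. It assumes `compare` is the non-dynamo baseline; `parse_line` and `speedup` are names made up for this illustration and are not from the original test script.

```python
import re

# Matches one benchmark line in the format printed in this thread.
LINE_RE = re.compile(
    r"iters=(?P<iters>\d+)\s+bsz=(?P<bsz>\d+)\s+"
    r"avg time compare:(?P<baseline>[\d.]+)\s+--\s+"
    r"\(dynamo\)(?P<dynamo>[\d.]+)\s+ratio:(?P<ratio>[\d.]+)"
)


def parse_line(line):
    """Parse one result line into a dict of numbers (hypothetical helper)."""
    match = LINE_RE.search(line)
    if match is None:
        raise ValueError(f"unrecognized benchmark line: {line!r}")
    return {
        "iters": int(match["iters"]),
        "bsz": int(match["bsz"]),
        "baseline": float(match["baseline"]),
        "dynamo": float(match["dynamo"]),
        "ratio": float(match["ratio"]),
    }


def speedup(baseline, dynamo):
    """Baseline time over dynamo time; > 1 means dynamo was faster."""
    return baseline / dynamo


# The four lines reported above, copied verbatim.
REPORTED_LINES = [
    "iters=20 bsz=16 avg time compare:83546.40 -- (dynamo)91269.50 ratio:0.9154",
    "iters=50 bsz=16 avg time compare:65139.68 -- (dynamo)61867.60 ratio:1.0529",
    "iters=100 bsz=16 avg time compare:64421.50 -- (dynamo)54427.80 ratio:1.1836",
    "iters=200 bsz=16 avg time compare:64340.89 -- (dynamo)50062.75 ratio:1.2852",
]

for line in REPORTED_LINES:
    row = parse_line(line)
    recomputed = speedup(row["baseline"], row["dynamo"])
    print(row["iters"], round(recomputed, 4), row["ratio"])
```

The recomputed values agree with the printed `ratio` column, which supports reading it as baseline over dynamo.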
Thanks for the follow-up and reply. This looks like the printout from the code I posted, so maybe the difference is because my GPU is a TITAN Xp?
(In reply to Animesh's message of Fri, Aug 12, 2022 on [pytorch/torchdynamo] Problems of using the acceleration of torchdynamo for Resnet when training (Issue #694).)
The backends are targeted at Volta and later GPUs, so yeah, if you do get improvements on TITAN Xp it would be a pleasant surprise, but we are not specifically targeting it.
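A quick way to check which side of that line a given card falls on is to compare its CUDA compute capability against 7.0, the capability of the first Volta parts. This is a small sketch and not something torchdynamo itself does: `is_volta_or_later` and `current_device_capability` are hypothetical helper names, while `torch.cuda.get_device_capability()` is the standard PyTorch call that returns a `(major, minor)` tuple.

```python
# Compute capability of the first Volta GPUs (e.g. V100).
VOLTA = (7, 0)


def is_volta_or_later(capability):
    """Return True if a (major, minor) compute capability is Volta or newer."""
    return tuple(capability) >= VOLTA


def current_device_capability():
    """Return the current GPU's (major, minor) capability, or None without CUDA."""
    import torch  # imported lazily so the helper above works without PyTorch

    if not torch.cuda.is_available():
        return None
    return torch.cuda.get_device_capability()


# TITAN Xp is a Pascal card with compute capability 6.1; A100 is 8.0.
print("TITAN Xp:", is_volta_or_later((6, 1)))
print("A100:", is_volta_or_later((8, 0)))
```

On a machine with PyTorch and a GPU, `is_volta_or_later(current_device_capability())` gives the answer for the card actually in use.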
@TraceCS so may we close this issue?
Sorry for the delay, and yes, thanks for asking.
We are running into problems using torchdynamo to accelerate training. Our model is very similar to ResNet, so we first ran tests on resnet152. Here is the testing code.
The results are:
iters=20 bsz=16 avg time compare:209189.15 -- (dynamo)240762.80 ratio:0.8689
iters=50 bsz=16 avg time compare:189001.68 -- (dynamo)198850.06 ratio:0.9505
iters=100 bsz=16 avg time compare:189400.90 -- (dynamo)204025.96 ratio:0.9283
iters=200 bsz=16 avg time compare:191534.67 -- (dynamo)195713.67 ratio:0.9786
The torch and related packages we use are:
torch 1.13.0.dev20220801+cu113
torch-struct 0.5
torchaudio 0.12.0
torchdynamo 1.13.0.dev0 /data/dev/torchdynamo
torchfile 0.1.0
torchmetrics 0.9.3
torchrec-nightly 2022.8.1
torchstat 0.0.7
torchtext 0.13.0
torchvision 0.13.0
torchx-nightly 2022.8.1
We ran this test in Docker on an NVIDIA TITAN Xp. Are there any mistakes in our usage? We would like to know whether we are using it incorrectly, and how to use torchdynamo to speed up the training process.
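One thing that may be worth ruling out when reading averages like these is one-time startup cost (for example, compilation on the first calls) being folded into short runs. The sketch below is not the test script from this issue. It is a generic timing helper, with the hypothetical name `avg_step_time`, that runs some untimed warmup calls first and accepts an optional `sync` callable. On a GPU one could pass `torch.cuda.synchronize` as `sync` so that queued kernels finish before the clock is read.

```python
import time


def avg_step_time(step, iters, warmup=10, sync=None):
    """Average seconds per call of step(), excluding the warmup calls.

    step: zero-argument callable that runs one training iteration.
    sync: optional zero-argument callable invoked before reading the clock,
          so that asynchronous work is included in the measurement.
    """
    for _ in range(warmup):
        step()  # untimed: absorbs one-time startup or compilation cost
    if sync is not None:
        sync()
    start = time.perf_counter()
    for _ in range(iters):
        step()
    if sync is not None:
        sync()
    return (time.perf_counter() - start) / iters


# Toy stand-in for a training step, only to show the call pattern.
calls = []


def toy_step():
    calls.append(1)
    sum(range(1000))


avg = avg_step_time(toy_step, iters=20, warmup=5)
print(f"iters=20 avg time: {avg:.6f}s over {len(calls)} total calls")
```

Timing the baseline step and the dynamo-wrapped step with the same helper, at the same `iters` and `warmup`, keeps the two averages comparable.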