pytorch / vision

Datasets, Transforms and Models specific to Computer Vision
https://pytorch.org/vision
BSD 3-Clause "New" or "Revised" License

Runtime differences between environments #8700

Closed v4if closed 3 days ago

v4if commented 4 days ago

šŸ› Describe the bug

from torchvision.transforms import transforms
import torch

x_cpu = torch.randn(3, 1024, 1024)
x_gpu = x_cpu.to("cuda")
resize_trans = transforms.Resize(size=(512, 512), interpolation=transforms.InterpolationMode.BILINEAR, antialias=None)

%timeit resize_trans(x_cpu)
%timeit resize_trans(x_gpu)

CPU time consumption differs by a factor of about 2.65 between the two environments: 11.3 ms vs 30 ms.

venv1: [screenshot]

venv2: [screenshot]

Versions

venv1

[pip3] mypy-extensions==1.0.0
[pip3] numpy==1.26.4
[pip3] nvidia-cublas-cu12==12.4.5.8
[pip3] nvidia-cuda-cupti-cu12==12.4.127
[pip3] nvidia-cuda-nvrtc-cu12==12.4.127
[pip3] nvidia-cuda-runtime-cu12==12.4.127
[pip3] nvidia-cudnn-cu12==9.1.0.70
[pip3] nvidia-cufft-cu12==11.2.1.3
[pip3] nvidia-curand-cu12==10.3.5.147
[pip3] nvidia-cusolver-cu12==11.6.1.9
[pip3] nvidia-cusparse-cu12==12.3.1.170
[pip3] nvidia-nccl-cu12==2.21.5
[pip3] nvidia-nvjitlink-cu12==12.4.127
[pip3] nvidia-nvtx-cu12==12.4.127
[pip3] onnxruntime-gpu==1.19.2
[pip3] torch==2.5.0
[pip3] torchvision==0.20.0
[pip3] triton==3.1.0
[conda] numpy                     1.26.4                   pypi_0    pypi
[conda] nvidia-cublas-cu12        12.4.5.8                 pypi_0    pypi
[conda] nvidia-cuda-cupti-cu12    12.4.127                 pypi_0    pypi
[conda] nvidia-cuda-nvrtc-cu12    12.4.127                 pypi_0    pypi
[conda] nvidia-cuda-runtime-cu12  12.4.127                 pypi_0    pypi
[conda] nvidia-cudnn-cu12         9.1.0.70                 pypi_0    pypi
[conda] nvidia-cufft-cu12         11.2.1.3                 pypi_0    pypi
[conda] nvidia-curand-cu12        10.3.5.147               pypi_0    pypi
[conda] nvidia-cusolver-cu12      11.6.1.9                 pypi_0    pypi
[conda] nvidia-cusparse-cu12      12.3.1.170               pypi_0    pypi
[conda] nvidia-nccl-cu12          2.21.5                   pypi_0    pypi
[conda] nvidia-nvjitlink-cu12     12.4.127                 pypi_0    pypi
[conda] nvidia-nvtx-cu12          12.4.127                 pypi_0    pypi
[conda] torch                     2.5.0                    pypi_0    pypi
[conda] torchvision               0.20.0                   pypi_0    pypi
[conda] triton                    3.1.0                    pypi_0    pypi

venv2

[pip3] mypy-extensions==1.0.0
[pip3] numpy==1.26.4
[pip3] nvidia-cublas-cu12==12.4.5.8
[pip3] nvidia-cuda-cupti-cu12==12.4.127
[pip3] nvidia-cuda-nvrtc-cu12==12.4.127
[pip3] nvidia-cuda-runtime-cu12==12.4.127
[pip3] nvidia-cudnn-cu12==9.1.0.70
[pip3] nvidia-cufft-cu12==11.2.1.3
[pip3] nvidia-curand-cu12==10.3.5.147
[pip3] nvidia-cusolver-cu12==11.6.1.9
[pip3] nvidia-cusparse-cu12==12.3.1.170
[pip3] nvidia-nccl-cu12==2.21.5
[pip3] nvidia-nvjitlink-cu12==12.4.127
[pip3] nvidia-nvtx-cu12==12.4.127
[pip3] onnxruntime-gpu==1.19.2
[pip3] torch==2.5.0
[pip3] torchaudio==2.5.0
[pip3] torchvision==0.20.0
[pip3] triton==3.1.0
[conda] Could not collect
NicolasHug commented 4 days ago

Hi @v4if you should make sure to call https://pytorch.org/docs/stable/generated/torch.cuda.synchronize.html when running gpu benchmarks. I don't think %timeit does that for you.
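
As an illustrative aside (not from this thread; the `time_op` helper and its name are made up for this sketch), a timing helper can take an optional `sync` callable, so that on CUDA you pass `torch.cuda.synchronize` and queued kernels are fully counted:

```python
import time

def time_op(fn, *args, repeats=10, sync=None):
    """Average wall-clock time of fn(*args) over `repeats` runs.

    If `sync` is given (e.g. torch.cuda.synchronize), it is called
    before starting the clock and again before stopping it, so that
    asynchronously queued work is included in the measurement.
    """
    if sync is not None:
        sync()  # drain any pending async work before starting the clock
    start = time.perf_counter()
    for _ in range(repeats):
        fn(*args)
    if sync is not None:
        sync()  # wait for all queued work to finish
    return (time.perf_counter() - start) / repeats

# Hypothetical usage with the snippet above (assumes CUDA is available):
#   time_op(resize_trans, x_gpu, sync=torch.cuda.synchronize)
#   time_op(resize_trans, x_cpu)  # CPU ops are synchronous; no sync needed
```

`torch.utils.benchmark.Timer` is designed to handle this synchronization for you and is usually the safer choice for GPU benchmarks.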

v4if commented 4 days ago

Hi @v4if you should make sure to call https://pytorch.org/docs/stable/generated/torch.cuda.synchronize.html when running gpu benchmarks. I don't think %timeit does that for you.

There is a difference in the GPU timing with and without sync. [screenshot]

But my question is why the difference in CPU time consumption between environments is so big.

abhi-glitchhg commented 4 days ago

This is indeed weird. Forgive me for the silly question: are venv1 and venv2 on the same system? (Asking because the terminal font styles look different.)

v4if commented 3 days ago

This is indeed weird. Forgive me for the silly question: are venv1 and venv2 on the same system? (Asking because the terminal font styles look different.)

They are two different machines, but both have torchvision 0.20.0 installed. I don't know why the CPU timing difference is so big.

abhi-glitchhg commented 3 days ago

They are two different machines, but both have torchvision 0.20.0 installed. I don't know why the CPU timing difference is so big.

If you are running the code on different systems, then this is expected. Depending on the number of cores, the RAM, and the type of CPU, you will see different speeds.

I have an 11th-gen Intel i7 with 16 GB of RAM; here's my benchmark. [screenshot]

v4if commented 3 days ago

They are two different machines, but both have torchvision 0.20.0 installed. I don't know why the CPU timing difference is so big.

If you are running the code on different systems, then this is expected. Depending on the number of cores, the RAM, and the type of CPU, you will see different speeds.

I have an 11th-gen Intel i7 with 16 GB of RAM; here's my benchmark. [screenshot]

Of the number of cores, RAM, and CPU type, which are the main factors that determine execution speed? Because of the GIL, shouldn't Python use only a single core, or does the torchvision implementation use multiple cores? The CPU frequency of both machines is around 3000 MHz, so why is there a several-fold gap?

cat /proc/cpuinfo | grep MHz | uniq

[screenshots]

Moreover, the execution speed on my local Mac is at the microsecond level, several orders of magnitude faster than on the two server machines above.

[screenshot]
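
(On the GIL question: PyTorch's C++ kernels release the GIL while they run and use intra-op thread pools, so core count does matter. A small illustrative sketch, using `time.sleep` as a stand-in for a GIL-releasing native call — none of this is torchvision code:)

```python
import threading
import time

def native_like_work():
    # time.sleep releases the GIL while it waits, just as PyTorch's C++
    # kernels release the GIL while they compute (possibly on many threads).
    time.sleep(0.05)

def run_in_threads(n):
    """Run native_like_work on n threads and return the elapsed wall time."""
    threads = [threading.Thread(target=native_like_work) for _ in range(n)]
    start = time.perf_counter()
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return time.perf_counter() - start

# Four concurrent calls take about 0.05 s rather than about 0.2 s, because
# the GIL is not held inside the native call. Similarly, torchvision's
# resize runs multi-threaded C++ under the hood; torch.get_num_threads()
# shows how many intra-op threads PyTorch will use.
```
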
NicolasHug commented 3 days ago

@v4if It's expected to see different performance on different machines. Some ops (like resize) leverage SIMD instructions. E.g. if one of your machines has AVX2 while the other one doesn't, you'll see massive differences.

I don't think this issue is really in scope for torchvision (especially not at that level of detail), so I'll close it.
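
(One quick way to check the SIMD point above on Linux: CPU features such as avx2 appear in the `flags` line of /proc/cpuinfo. A small sketch — the `cpu_flags` helper is made up for illustration:)

```python
def cpu_flags(cpuinfo_text):
    """Return the set of CPU feature flags from /proc/cpuinfo contents."""
    for line in cpuinfo_text.splitlines():
        if line.startswith("flags"):
            return set(line.split(":", 1)[1].split())
    return set()

# On a Linux machine:
#   with open("/proc/cpuinfo") as f:
#       flags = cpu_flags(f.read())
#   print("avx2" in flags, "avx512f" in flags)

sample = "processor\t: 0\nflags\t\t: fpu sse sse2 avx avx2 fma\n"
print("avx2" in cpu_flags(sample))
```

Recent PyTorch versions also expose `torch.backends.cpu.get_cpu_capability()`, which reports the SIMD level the installed binary will actually dispatch to.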

abhi-glitchhg commented 3 days ago

Thanks, Nicolas, for the clarification.

@v4if This website has PyTorch benchmarks on different hardware:

https://openbenchmarking.org/test/pts/pytorch
