undertherain / benchmarker

modular framework for [not only] deep learning performance benchmarking
http://blackbird.pw/performance
Mozilla Public License 2.0

Torchprof #183

Closed by undertherain 2 years ago

undertherain commented 2 years ago

Use PyTorch's internal profiler.

vatai commented 2 years ago

Tested. See output below!

If a newer Python is installed, it generates:

-------------------------------------------------------  ------------  ------------  ------------  ------------  ------------  ------------  ------------  ------------  ------------  ------------  
                                                   Name    Self CPU %      Self CPU   CPU total %     CPU total  CPU time avg     Self CUDA   Self CUDA %    CUDA total  CUDA time avg    # of Calls  
-------------------------------------------------------  ------------  ------------  ------------  ------------  ------------  ------------  ------------  ------------  ------------  ------------  
                                           aten::conv2d         0.04%       1.088ms        98.57%        2.497s       5.890ms       0.000us         0.00%     142.900ms     337.028us           424  
                                      aten::convolution         0.05%       1.222ms        98.52%        2.496s       5.888ms       0.000us         0.00%     142.900ms     337.028us           424  
                                     aten::_convolution         0.08%       2.051ms        98.48%        2.495s       5.885ms       0.000us         0.00%     142.900ms     337.028us           424  
                                aten::cudnn_convolution         1.83%      46.312ms        98.40%        2.493s       5.880ms     142.900ms        87.49%     142.900ms     337.028us           424  
                                        cudaMemsetAsync        87.10%        2.207s        87.10%        2.207s      17.798ms       0.000us         0.00%       0.000us       0.000us           124  
                                   cudaEventSynchronize         3.96%     100.362ms         3.96%     100.362ms     737.956us       0.000us         0.00%       0.000us       0.000us           136  
                                             cudaMalloc         3.75%      95.117ms         3.75%      95.117ms       3.171ms       0.000us         0.00%       0.000us       0.000us            30  
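For reference, a table in this format can be produced with `torch.autograd.profiler`. What follows is only a minimal sketch, not the benchmarker's actual code: the tiny `Conv2d` model and the random input are hypothetical stand-ins for the real ResNet-50 run, and it profiles on CPU only, so the CUDA columns will be absent or zero.

```python
import torch
import torch.nn as nn
from torch.autograd import profiler

# Hypothetical stand-in model; the real benchmark above profiles ResNet-50.
model = nn.Sequential(
    nn.Conv2d(3, 8, kernel_size=3, padding=1),
    nn.ReLU(),
).eval()
x = torch.randn(1, 3, 32, 32)

# Record every operator executed inside the context manager.
with torch.no_grad(), profiler.profile() as prof:
    model(x)

# Aggregate events by operator name and render them as a text table,
# sorted by total CPU time.
table = prof.key_averages().table(sort_by="cpu_time_total", row_limit=10)
print(table)
```

Sorting by `cpu_time_total` is a guess based on the ordering of the rows above; other sort keys such as `self_cpu_time_total` are also accepted.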

On F (or systems with older PyTorch):

$ head logs/inference/resnet50/unknown_CPU/pytorch_21.09.17_15.45.50.profile
{
    "ClassifierInference": {
        "net": {
            "conv1": {
                "null": {
                    "self_cpu_total": 45617.25099999998,
                    "cpu_total": 273811.58600000007,
                    "cuda_total": 0,
                    "occurrences": 1,
                    "param": "Conv2d(3, 64, kernel_size=(7, 7), stride=(2, 2), padding=(3, 3), bias=False)"
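
The `.profile` file is plain JSON, so it can be summarized with the standard library. The sketch below is an assumption-laden illustration: it assumes the rest of the file follows the same nesting as the `head` output (dicts nested by module name, with a leaf dict holding `cpu_total`), and the helper name `iter_layer_stats` is made up for this example. It uses an inline sample that mirrors the lines shown above rather than reading the real log.

```python
import json


def iter_layer_stats(node, path=()):
    """Yield (dotted_path, stats) for every dict that carries a "cpu_total" key."""
    if isinstance(node, dict):
        if "cpu_total" in node:
            yield ".".join(path), node
        else:
            # Not a leaf yet: descend into each child, extending the path.
            for key, child in node.items():
                yield from iter_layer_stats(child, path + (key,))


# Inline sample mirroring the head output; a real run would use
# json.load(open(<path to the .profile file>)) instead.
sample = json.loads("""
{"ClassifierInference": {"net": {"conv1": {"null": {
    "self_cpu_total": 45617.25099999998,
    "cpu_total": 273811.58600000007,
    "cuda_total": 0,
    "occurrences": 1,
    "param": "Conv2d(3, 64, kernel_size=(7, 7), stride=(2, 2), padding=(3, 3), bias=False)"
}}}}}
""")

# Sort layers by total CPU time, largest first.
rows = sorted(iter_layer_stats(sample), key=lambda r: r[1]["cpu_total"], reverse=True)
for name, stats in rows:
    print(f"{name:40s} cpu_total={stats['cpu_total']:.1f} "
          f"self_cpu_total={stats['self_cpu_total']:.1f}")
```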