intel / pti-gpu

Profiling Tools Interfaces for GPU (PTI for GPU) is a set of Getting Started documentation and a tools library that make it easy to start performance analysis on Intel(R) Processor Graphics.
MIT License

Demangle kernel names #13

Closed: al42and closed this 2 years ago

al42and commented 2 years ago

When tracing SYCL programs that use templates, the mangled kernel names are hard to work with. This patch adds a demangling step to make the output more readable:

Before:

== CL GPU Backend: ==

                                                                              Kernel,       Calls,           Time (ns),    Time (%),        Average (ns),            Min (ns),            Max (ns)
                     _ZTS11NbnxmKernelILb0ELb1ELN5Nbnxm8ElecTypeE0ELNS0_7VdwTypeE1EE,           3,             3940415,       20.01,             1313471,             1247083,             1422812
                     _ZTS11NbnxmKernelILb0ELb0ELN5Nbnxm8ElecTypeE0ELNS0_7VdwTypeE2EE,          75,             2839186,       14.41,               37855,               32239,               84114
                     _ZTS11NbnxmKernelILb0ELb0ELN5Nbnxm8ElecTypeE0ELNS0_7VdwTypeE1EE,           3,             2508957,       12.74,              836319,              812916,              856406
                                                    _ZTS20NbnxmKernelPruneOnlyILb1EE,          10,             2449787,       12.44,              244978,               19791,              771250
                                                                   _ZTS9__usmfillIfE,         810,             1308315,        6.64,                1615,                 625,               15989
                     _ZTS11NbnxmKernelILb0ELb1ELN5Nbnxm8ElecTypeE0ELNS0_7VdwTypeE2EE,          25,             1092074,        5.54,               43682,               37708,               53802
                     _ZTS11NbnxmKernelILb0ELb1ELN5Nbnxm8ElecTypeE4ELNS0_7VdwTypeE1EE,          36,              882950,        4.48,               24526,               21875,               28645
                     _ZTS11NbnxmKernelILb0ELb0ELN5Nbnxm8ElecTypeE1ELNS0_7VdwTypeE1EE,          30,              814103,        4.13,               27136,               23489,               41822
                 _ZTS15PmeGatherKernelILi4ELb1ELb1ELi1ELb1EL14ThreadsPerAtom1ELi16EE,          58,              810960,        4.12,               13982,                8906,              238072
_ZTS24PmeSplineAndSpreadKernelILi4ELb1ELb1ELb1ELb1ELi1ELb1EL14ThreadsPerAtom1ELi16EE,          58,              753306,        3.82,               12988,                9635,               53125
                     _ZTS11NbnxmKernelILb1ELb1ELN5Nbnxm8ElecTypeE0ELNS0_7VdwTypeE2EE,          10,              453746,        2.30,               45374,               41822,               48281
                     _ZTS11NbnxmKernelILb0ELb0ELN5Nbnxm8ElecTypeE4ELNS0_7VdwTypeE2EE,          10,              321349,        1.63,               32134,               29114,               43854
                     _ZTS11NbnxmKernelILb1ELb1ELN5Nbnxm8ElecTypeE1ELNS0_7VdwTypeE1EE,          10,              313121,        1.59,               31312,               27864,               39010
                                                    _ZTS20NbnxmKernelPruneOnlyILb0EE,           6,              244007,        1.24,               40667,                9270,              189322
                     _ZTS11NbnxmKernelILb1ELb1ELN5Nbnxm8ElecTypeE4ELNS0_7VdwTypeE1EE,           6,              181508,        0.92,               30251,               26666,               38229
                     _ZTS11NbnxmKernelILb1ELb1ELN5Nbnxm8ElecTypeE0ELNS0_7VdwTypeE1EE,           6,              179061,        0.91,               29843,               25833,               40677
                     _ZTS11NbnxmKernelILb1ELb1ELN5Nbnxm8ElecTypeE4ELNS0_7VdwTypeE2EE,           4,              175415,        0.89,               43853,               33333,               54479
                     _ZTS11NbnxmKernelILb0ELb1ELN5Nbnxm8ElecTypeE1ELNS0_7VdwTypeE1EE,           6,              170465,        0.87,               28410,               26093,               36562
                     _ZTS11NbnxmKernelILb0ELb1ELN5Nbnxm8ElecTypeE4ELNS0_7VdwTypeE2EE,           4,              144061,        0.73,               36015,               34062,               38229
                           _ZTSN2cl4sycl6detail19__pf_kernel_wrapperI9__usmfillIfEEE,          38,               99618,        0.51,                2621,                1562,                5937
                                                                   _ZTS11DummyKernel,           8,               13694,        0.07,                1711,                1302,                2447

After:

== CL GPU Backend: ==

                                                                             Kernel,       Calls,           Time (ns),    Time (%),        Average (ns),            Min (ns),            Max (ns)
                    NbnxmKernel<false, true, (Nbnxm::ElecType)0, (Nbnxm::VdwType)1>,           3,             3923280,       20.51,             1307760,             1283072,             1338958
                   NbnxmKernel<false, false, (Nbnxm::ElecType)0, (Nbnxm::VdwType)2>,          75,             2799759,       14.64,               37330,               32031,               54427
                   NbnxmKernel<false, false, (Nbnxm::ElecType)0, (Nbnxm::VdwType)1>,           3,             2620259,       13.70,              873419,              827343,              937812
                                                         NbnxmKernelPruneOnly<true>,          10,             1958642,       10.24,              195864,               21666,              602864
                                                                   __usmfill<float>,         810,             1258858,        6.58,                1554,                 625,               16510
                    NbnxmKernel<false, true, (Nbnxm::ElecType)0, (Nbnxm::VdwType)2>,          25,             1206445,        6.31,               48257,               37916,              159895
                    NbnxmKernel<false, true, (Nbnxm::ElecType)4, (Nbnxm::VdwType)1>,          36,              905712,        4.73,               25158,               23437,               30989
                   NbnxmKernel<false, false, (Nbnxm::ElecType)1, (Nbnxm::VdwType)1>,          30,              885248,        4.63,               29508,               24427,               42395
PmeSplineAndSpreadKernel<4, true, true, true, true, 1, true, (ThreadsPerAtom)1, 16>,          58,              721535,        3.77,               12440,               10000,               19947
                     PmeGatherKernel<4, true, true, 1, true, (ThreadsPerAtom)1, 16>,          58,              587678,        3.07,               10132,                8541,               12760
                     NbnxmKernel<true, true, (Nbnxm::ElecType)0, (Nbnxm::VdwType)2>,          10,              468432,        2.45,               46843,               43906,               55625
                     NbnxmKernel<true, true, (Nbnxm::ElecType)1, (Nbnxm::VdwType)1>,          10,              329735,        1.72,               32973,               30260,               42812
                   NbnxmKernel<false, false, (Nbnxm::ElecType)4, (Nbnxm::VdwType)2>,          10,              307393,        1.61,               30739,               29531,               31979
                                                        NbnxmKernelPruneOnly<false>,           6,              249215,        1.30,               41535,                9218,              181354
                    NbnxmKernel<false, true, (Nbnxm::ElecType)1, (Nbnxm::VdwType)1>,           6,              184529,        0.96,               30754,               27968,               33229
                     NbnxmKernel<true, true, (Nbnxm::ElecType)4, (Nbnxm::VdwType)1>,           6,              168487,        0.88,               28081,               24375,               33020
                     NbnxmKernel<true, true, (Nbnxm::ElecType)0, (Nbnxm::VdwType)1>,           6,              155726,        0.81,               25954,               25052,               27760
                     NbnxmKernel<true, true, (Nbnxm::ElecType)4, (Nbnxm::VdwType)2>,           4,              145050,        0.76,               36262,               34739,               37812
                    NbnxmKernel<false, true, (Nbnxm::ElecType)4, (Nbnxm::VdwType)2>,           4,              140831,        0.74,               35207,               32864,               38802
                           cl::sycl::detail::__pf_kernel_wrapper<__usmfill<float> >,          38,               99097,        0.52,                2607,                1458,                6250
                                                                        DummyKernel,           8,               12445,        0.07,                1555,                1354,                1979

anton-v-gorshkov commented 2 years ago

Thanks, Alexey! I believe demangling is really helpful for PTI tools. My only concern is that demangled names are not always better: for unnamed lambdas, for example, the mangled name is much shorter. So let me accept your patch and move this functionality under a new "--demangle" option.