undertherain / benchmarker

modular framework for [not only] deep learning performance benchmarking
http://blackbird.pw/performance
Mozilla Public License 2.0

perf flops do not measure correctly on Ryzen CPU #145

Open undertherain opened 3 years ago

undertherain commented 3 years ago

I have not looked at the details yet. The same command line records correct flops on Xeon.

Not sure whether this is AMD-specific or specific to my system. It does not crash, it just reports a very low number.

undertherain commented 3 years ago

confirmed on Epyc

python3 -m benchmarker --framework=pytorch --problem=bert_custom --problem_size=32,128 --cnt_units=768 --batch_size=32 --cnt_heads=12 --cnt_layers=12 --mode=inference --flops
        "gflop_estimated": 7151.120547840001,
        "gflop_measured": 0.000158806,
        "len_sequence": 128,

shwetasalaria commented 3 years ago

There seems to be no event that records floating-point operations. I checked the hardware events on Epyc; the closest one is RETIRED_MMX_FP_INSTRUCTIONS, but it has no unit mask that selects FP instructions explicitly.

IDX      : 933232654
PMU name : amd64_fam17h_zen2 (AMD64 Fam17h Zen2)
Name     : RETIRED_MMX_FP_INSTRUCTIONS
Equiv    : None
Flags    : None
Desc     : Number of MMX, SSE or x87 instructions retired. The UnitMask allows the selection of the individual classes of instructions as given in the table. Each increment represents one complete instruction. Since this event includes non-numeric instructions, it is not suitable for measuring MFLOPS.
Code     : 0xcb
Umask-00 : 0x04 : PMU : [SSE_INSTR] : None : Number of SSE instructions (SSE, SSE2, SSE3, SSE$, SSE4A, SSE41, SSE42, AVX).
Umask-01 : 0x02 : PMU : [MMX_INSTR] : None : Number of MMX instructions.
Umask-02 : 0x01 : PMU : [X87_INSTR] : None : Number of X87 instructions.
Modif-00 : 0x00 : PMU : [k] : monitor at priv level 0 (boolean)
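
For anyone who wants to look at this counter outside benchmarker, here is a minimal sketch of how the code and unit masks in the listing could be turned into a perf raw event string. It assumes the usual x86 raw layout (unit mask in bits 8-15, event code in bits 0-7) and that the unit mask bits can be ORed together. `raw_event`, `EVENT_CODE` and `UMASKS` are hypothetical names for illustration, not part of benchmarker or libpfm4. As the event description says, this counts retired instructions of each class, so it is not a FLOP count.

```python
# Hypothetical helper, not part of benchmarker or libpfm4.
# Values are copied from the RETIRED_MMX_FP_INSTRUCTIONS listing.
EVENT_CODE = 0xCB
UMASKS = {
    "SSE_INSTR": 0x04,
    "MMX_INSTR": 0x02,
    "X87_INSTR": 0x01,
}


def raw_event(code, umask_names):
    """Build a perf-style raw event string such as 'r04cb'.

    Assumes an 8-bit event code in bits 0-7 and the unit mask in
    bits 8-15, with unit mask bits combined by bitwise OR.
    """
    if not 0 <= code <= 0xFF:
        raise ValueError("event code must fit in 8 bits for this layout")
    umask = 0
    for name in umask_names:
        umask |= UMASKS[name]  # KeyError on an unknown unit mask name
    return f"r{(umask << 8) | code:04x}"


if __name__ == "__main__":
    # SSE/AVX class only, then all three classes together
    print(raw_event(EVENT_CODE, ["SSE_INSTR"]))
    print(raw_event(EVENT_CODE, UMASKS))
```

The resulting string could then be tried with something like `perf stat -e r04cb -- <command>`, assuming perf accepts raw events for this PMU on the machine in question.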