travisdowns / freq-bench

Fine-grained frequency and voltage transition tests

Question about the "MHz" value. #3

Open edisonchan opened 3 years ago

edisonchan commented 3 years ago

I have set the frequency of all cores to min=5100MHz and max=5100MHz in performance mode with the tool cpupower-gui. The frequency of all cores now shows as about 5100 MHz in s-tui. I then ran: export MHZ=5100

./bench is OK. scripts/data.sh is OK (I needed to change the EOL of all files inside the scripts folder to Unix mode). scripts/plots.sh is OK, but the maximum y-axis (frequency) value of the output plots is too low in my case.

[Two attached images, both titled "Untitled"]

how can I fix this problem?

travisdowns commented 3 years ago

I have set the frequency of all cores to min=5100MHz and max=5100MHz in performance mode with the tool cpupower-gui. The frequency of all cores now shows as about 5100 MHz in s-tui.

Thanks for telling me about s-tui, I hadn't seen it before.

(I needed to change the EOL of all files inside the scripts folder to Unix mode).

That is strange: all the files in there already have unix line endings, as far as I can tell (as reported by file and I double checked by setting core.autocrlf to false and cloning the repo again). Are you involving Windows at any point, or are you running this only from Linux?

how can I fix this problem?

Right, I've tweaked the y axis limits to be appropriate for the data I collected on my machine, centered around 2.6 GHz.

You can adjust the y limits for each plot in plots.sh by changing the --ylim argument for that plot. Look for the lines starting with plot near the end of the file, like this one:

plot "$PREFIX-vporymm_vz-{0..2}.csv" "fig-vporvz256" "Frequency (GHz)" "Time (us)" "256-bit VPOR Frequency Transitions" \
   $xcols_arg $ycols_arg --ylim 0 4 --alpha 0.6

There you can see the --ylim 0 4 argument, meaning the y axis is set to 0 and 4 GHz as the min and max points. You can adjust this as you see fit, e.g., --ylim 0 6.5 to go up to 6.5 GHz (floating point values are fine).
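If many plot lines need the same change, the edit can be scripted. The following is a minimal sketch, not something from the repo: set_ylim_max is a hypothetical helper, and it assumes every plot line writes the limits as `--ylim <min> <max>` with plain numbers, as in the line quoted above.

```shell
# Hypothetical helper: rewrite the upper bound of every "--ylim <min> <max>"
# read from stdin. $1 is the new upper limit in GHz; the lower limit is kept.
set_ylim_max() {
    sed -E "s/--ylim ([0-9.]+) [0-9.]+/--ylim \1 $1/"
}

# Example on the tail of the plot line quoted above:
echo '$xcols_arg $ycols_arg --ylim 0 4 --alpha 0.6' | set_ylim_max 6.5
# prints: $xcols_arg $ycols_arg --ylim 0 6.5 --alpha 0.6
```

Running the whole script through the filter (for example `set_ylim_max 6.5 < scripts/plots.sh > /tmp/plots-6.5.sh`) would raise every plot to the same maximum, which may not suit plots that deliberately use a different range, so check the result before using it.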

Let me know if anything is unclear!

edisonchan commented 3 years ago

Thanks for your reply. My results: https://drive.google.com/drive/folders/1XKte5dmRAAciiNhI22xwR8sVST_XeUku?usp=sharing

The clock numbers (Unhalt_GHz) don't seem right here?

travisdowns commented 3 years ago

The clock numbers (Unhalt_GHz) don't seem right here?

Yes, they look wrong. The problem is the MHZ variable. It should actually be the TSC frequency, not the CPU frequency (in my case they are the same, but on new Ice Lake and Rocket Lake CPUs they are quite different). We can fix it, though!

Can you paste the output of (make sure MHZ is not set in the environment):

./bench dummy | head -30

?

We want to see the tsc freq : ??? MHz line.

edisonchan commented 3 years ago

The TSC value is 3504 MHz, from ./bench dummy |& grep 'tsc freq'.

Without setting min5100/max5100 performance mode (running in powersave mode, where min is 800 MHz and max is 5100 MHz):

./bench dummy | head -30

Found 7 normal columns and 0 post-output columns
Adding event event[name=cpu_clk_unhalted.thread,event_string=cpu/event=0x3c,umask=0x0/]
Adding event event[name=inst_retired.any,event_string=cpu/event=0xc0,umask=0x0/]
Adding event event[name=inst_retired.any,event_string=cpu/event=0xc0,umask=0x0/]
Adding event event[name=cpu_clk_unhalted.thread,event_string=cpu/event=0x3c,umask=0x0/]
Adding event event[name=uops_issued.any,event_string=cpu/event=0xe,umask=0x1/]
Adding event event[name=cpu_clk_unhalted.thread,event_string=cpu/event=0x3c,umask=0x0/]
Adding event event[name=cpu_clk_unhalted.thread,event_string=cpu/event=0x3c,umask=0x0/]
Resolved and programmed event 'cpu_clk_unhalted.thread' to cpu/config=0x3c/ R1 UT1 ZT1 index: 0x2 pmc_width=0x30 offset=0x7fffffffffff time_enabled=0x34e32 time_running=0x34e32 rdtsc=0x7b60e3369b
Resolved and programmed event 'inst_retired.any' to cpu/config=0xc0/ R1 UT1 ZT1 index: 0x40000001 pmc_width=0x30 offset=0x7fffffffffff time_enabled=0x1308d time_running=0x1308d rdtsc=0x7b60f051a3
Resolved and programmed event 'uops_issued.any' to cpu/config=0x10e/ R1 UT1 ZT1 index: 0x1 pmc_width=0x30 offset=0x7fffffffffff time_enabled=0x2abe time_running=0x2abe rdtsc=0x7b60fbd0af
EventManager configured 3 events
inner loops : 100
pinned cpu : 0
current cpu : 0
start size : 262144 bytes
stop size : 20971520 bytes
inc size : 262144 bytes
tsc freq : 3504.0 MHz
test period : 28538.813 us
duty period : 2853.881 us
resolution : 2.854 us
payload extra: 0.000 us
warmup stamp : yes
About to run 1 tests with 7 columns (after 6 ms of startup time)
repeat,us,period,sdl,payspin,totspin,paytime,tsc-delta,nanos,Cycles,INSTRU,IPC,UPC,Unhalt_GHz
0,5.711,0,20000,0,478,0,9968,2844.749,13376,6095,0.456,1.247,4.702
0,8.562,0,30000,0,485,0,9992,2851.598,13400,6194,0.462,1.244,4.699
0,11.418,0,40000,0,486,0,10008,2856.164,13428,6198,0.462,1.244,4.701
0,14.269,0,50000,0,485,0,9988,2850.457,13398,6194,0.462,1.244,4.700
0,17.126,0,60000,0,486,0,10010,2856.735,13425,6206,0.462,1.244,4.699
0,19.982,0,70000,0,486,0,10008,2856.164,13425,6206,0.462,1.244,4.700
0,22.833,0,80000,0,485,0,9990,2851.027,13398,6202,0.463,1.244,4.699
0,25.689,0,90000,0,486,0,10008,2856.164,13424,6206,0.462,1.244,4.700
0,28.540,0,100000,0,485,0,9988,2850.457,13398,6194,0.462,1.244,4.700
0,31.397,0,110000,0,486,0,10010,2856.735,13424,6206,0.462,1.244,4.699
0,34.247,0,120000,0,485,0,9988,2850.457,13398,6194,0.462,1.244,4.700
0,37.103,0,130000,0,486,0,10008,2856.164,13424,6206,0.462,1.244,4.700
0,39.960,0,140000,0,486,0,10012,2857.306,13424,6206,0.462,1.244,4.698
0,42.811,0,150000,0,485,0,9988,2850.457,13398,6194,0.462,1.244,4.700
0,45.667,0,160000,0,486,0,10006,2855.594,13424,6206,0.462,1.244,4.701
0,48.517,0,170000,0,485,0,9990,2851.027,13398,6194,0.462,1.244,4.699
0,51.373,0,180000,0,486,0,10006,2855.594,13424,6206,0.462,1.244,4.701
0,54.224,0,190000,0,485,0,9990,2851.027,13398,6194,0.462,1.244,4.699
0,57.080,0,200000,0,486,0,10008,2856.164,13424,6206,0.462,1.244,4.700
0,59.937,0,210000,0,486,0,10008,2856.164,13424,6206,0.462,1.244,4.700
0,62.787,0,220000,0,485,0,9990,2851.027,13398,6194,0.462,1.244,4.699
0,65.643,0,230000,0,486,0,10006,2855.594,13424,6206,0.462,1.244,4.701
0,68.494,0,240000,0,485,0,9990,2851.027,13398,6194,0.462,1.244,4.699
0,71.350,0,250000,0,486,0,10008,2856.164,13424,6206,0.462,1.244,4.700
0,74.207,0,260000,0,486,0,10008,2856.164,13424,6206,0.462,1.244,4.700
0,77.056,0,270000,0,483,0,9984,2849.315,13393,6170,0.461,1.239,4.700
0,79.912,0,280000,0,486,0,10008,2856.164,13424,6206,0.462,1.244,4.700
0,82.763,0,290000,0,485,0,9988,2850.457,13398,6194,0.462,1.244,4.700
0,85.619,0,300000,0,486,0,10008,2856.164,13424,6206,0.462,1.244,4.700

With all cores set to min5100/max5100 in performance mode:

Found 7 normal columns and 0 post-output columns
Adding event event[name=cpu_clk_unhalted.thread,event_string=cpu/event=0x3c,umask=0x0/]
Adding event event[name=inst_retired.any,event_string=cpu/event=0xc0,umask=0x0/]
Adding event event[name=inst_retired.any,event_string=cpu/event=0xc0,umask=0x0/]
Adding event event[name=cpu_clk_unhalted.thread,event_string=cpu/event=0x3c,umask=0x0/]
Adding event event[name=uops_issued.any,event_string=cpu/event=0xe,umask=0x1/]
Adding event event[name=cpu_clk_unhalted.thread,event_string=cpu/event=0x3c,umask=0x0/]
Adding event event[name=cpu_clk_unhalted.thread,event_string=cpu/event=0x3c,umask=0x0/]
Resolved and programmed event 'cpu_clk_unhalted.thread' to cpu/config=0x3c/ R1 UT1 ZT1 index: 0x2 pmc_width=0x30 offset=0x7fffffffffff time_enabled=0x8eeb time_running=0x8eeb rdtsc=0x1115d583193
Resolved and programmed event 'inst_retired.any' to cpu/config=0xc0/ R1 UT1 ZT1 index: 0x40000001 pmc_width=0x30 offset=0x7fffffffffff time_enabled=0x32a1 time_running=0x32a1 rdtsc=0x1115d5a3381
Resolved and programmed event 'uops_issued.any' to cpu/config=0x10e/ R1 UT1 ZT1 index: 0x1 pmc_width=0x30 offset=0x7fffffffffff time_enabled=0x7ad time_running=0x7ad rdtsc=0x1115d5bfea7
EventManager configured 3 events
inner loops : 100
pinned cpu : 0
current cpu : 0
start size : 262144 bytes
stop size : 20971520 bytes
inc size : 262144 bytes
tsc freq : 3504.0 MHz
test period : 28538.813 us
duty period : 2853.881 us
resolution : 2.854 us
payload extra: 0.000 us
warmup stamp : yes
About to run 1 tests with 7 columns (after 1 ms of startup time)
repeat,us,period,sdl,payspin,totspin,paytime,tsc-delta,nanos,Cycles,INSTRU,IPC,UPC,Unhalt_GHz
0,5.711,0,20000,0,521,0,9984,2849.315,14540,6617,0.455,1.243,5.103
0,8.563,0,30000,0,526,0,10000,2853.881,14546,6695,0.460,1.242,5.097
0,11.419,0,40000,0,528,0,10004,2855.023,14564,6710,0.461,1.240,5.101
0,14.269,0,50000,0,527,0,9990,2851.027,14537,6698,0.461,1.240,5.099
0,17.126,0,60000,0,528,0,10006,2855.594,14563,6710,0.461,1.240,5.100
0,19.981,0,70000,0,528,0,10006,2855.594,14564,6710,0.461,1.240,5.100
0,22.832,0,80000,0,527,0,9990,2851.027,14538,6701,0.461,1.240,5.099
0,25.687,0,90000,0,528,0,10002,2854.452,14565,6698,0.460,1.241,5.103
0,28.542,0,100000,0,528,0,10000,2853.881,14555,6710,0.461,1.243,5.100
0,31.396,0,110000,0,528,0,10004,2855.023,14552,6725,0.462,1.243,5.097
0,34.251,0,120000,0,528,0,10004,2855.023,14561,6710,0.461,1.242,5.100
0,37.100,0,130000,0,527,0,9986,2849.886,14535,6698,0.461,1.242,5.100
0,39.956,0,140000,0,528,0,10004,2855.023,14561,6710,0.461,1.242,5.100
0,42.811,0,150000,0,528,0,10006,2855.594,14561,6710,0.461,1.242,5.099
0,45.667,0,160000,0,528,0,10002,2854.452,14561,6710,0.461,1.242,5.101
0,48.517,0,170000,0,527,0,9990,2851.027,14535,6698,0.461,1.242,5.098
0,51.372,0,180000,0,528,0,10004,2855.023,14561,6710,0.461,1.242,5.100
0,54.227,0,190000,0,528,0,10004,2855.023,14561,6710,0.461,1.242,5.100
0,57.083,0,200000,0,528,0,10004,2855.023,14561,6710,0.461,1.242,5.100
0,59.933,0,210000,0,527,0,9990,2851.027,14535,6698,0.461,1.242,5.098
0,62.788,0,220000,0,528,0,10002,2854.452,14561,6710,0.461,1.242,5.101
0,65.642,0,230000,0,528,0,10004,2855.023,14561,6710,0.461,1.242,5.100
0,68.498,0,240000,0,528,0,10002,2854.452,14561,6710,0.461,1.242,5.101
0,71.347,0,250000,0,527,0,9988,2850.457,14535,6698,0.461,1.242,5.099
0,74.203,0,260000,0,528,0,10002,2854.452,14561,6710,0.461,1.242,5.101
0,77.056,0,270000,0,526,0,10002,2854.452,14556,6686,0.459,1.238,5.099
0,79.912,0,280000,0,528,0,10004,2855.023,14561,6710,0.461,1.242,5.100
0,82.767,0,290000,0,528,0,10004,2855.023,14561,6710,0.461,1.242,5.100
0,85.617,0,300000,0,527,0,9988,2850.457,14535,6698,0.461,1.242,5.099

travisdowns commented 3 years ago

@edisonchan - those frequency values (last column) look much better.
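As a sanity check on why the TSC frequency matters for that last column, the posted rows are consistent with nanos = tsc-delta / (TSC MHz) × 1000 and Unhalt_GHz = Cycles / nanos. That relationship is inferred from the numbers in this thread, not taken from the bench source, and unhalt_ghz below is a hypothetical helper written only to illustrate it.

```shell
# Hypothetical helper: recompute the Unhalt_GHz column from one result row.
# Assumed relationship (inferred from the posted output, not from the source):
#   nanos      = tsc-delta / tsc_MHz * 1000
#   Unhalt_GHz = Cycles / nanos
# $1 = tsc-delta, $2 = Cycles, $3 = TSC frequency in MHz
unhalt_ghz() {
    awk -v d="$1" -v c="$2" -v mhz="$3" \
        'BEGIN { nanos = d / mhz * 1000; printf "%.3f\n", c / nanos }'
}

unhalt_ghz 9968 13376 3504   # first powersave row, prints 4.702
unhalt_ghz 9984 14540 3504   # first performance row, prints 5.103
```

Both values match the posted rows. Because the TSC frequency sits in the denominator of the time conversion, feeding in the core clock (5100) instead of 3504 would scale every computed frequency by the same wrong factor.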

Try to collect your results again with MHZ=3504 and you should get a better result.

This whole rigamarole is because the data.sh script itself wants to know the MHZ value to calculate some test parameters (like the period) which have units of "tsc cycles": since the test wants to use real time like "1 us" it needs MHZ to convert. The program itself can determine MHZ just fine, but by then the script has already run. Perhaps the script should just query the program to get the MHZ value if not specified.
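The fallback suggested here could look roughly like the following. It is only a sketch: parse_tsc_mhz is a hypothetical helper that is not part of freq-bench, and it assumes the startup banner keeps the `tsc freq : 3504.0 MHz` format shown earlier in this thread.

```shell
# Hypothetical helper (not part of freq-bench): read the bench startup banner
# on stdin and print the number from the "tsc freq : <N> MHz" line.
parse_tsc_mhz() {
    awk -F: '/tsc freq/ { gsub(/[^0-9.]/, "", $2); print $2; exit }'
}

# Demo on two banner lines copied from the output posted above:
printf 'inc size : 262144 bytes\ntsc freq : 3504.0 MHz\n' | parse_tsc_mhz
# prints: 3504.0
```

In the script, a line such as `: "${MHZ:=$(./bench dummy 2>&1 | parse_tsc_mhz)}"` would then query the program only when MHZ is not already set in the environment. Whether the rest of the script accepts a fractional value like 3504.0 is not confirmed here, so it may need rounding to an integer first.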

I'm interested to see your updated results!

edisonchan commented 3 years ago

https://drive.google.com/drive/folders/1v1f9Cxzuw-xL-ffHkF2sBaAa5A5n95ou?usp=sharing

It seems there are no transitions (cpupower-gui: all cores set to min5100/max5100 in performance mode)?

Update: I also tried min800/max5100 in performance mode, and there is no big difference.

travisdowns commented 3 years ago

That's interesting. Note that these tests assume a transition for "light" AVX-512 instructions, but as I found on Ice Lake, light AVX-512 may not cause a frequency transition at all anymore (there may be a voltage transition, however).

If you want to see the max turbo frequencies across various instruction types and thread counts, I have another benchmark which is more suited:

https://github.com/travisdowns/avx-turbo

With that, you can map out the frequency behavior for all the "license" levels and active core counts (I'd be interested in your results).

edisonchan commented 3 years ago

Yes, here are my results: https://drive.google.com/file/d/1NSHa9QCQdis95hP14vhPNmNnpBdqmw7x/view?usp=sharing

travisdowns commented 3 years ago

Yes, here are my results: https://drive.google.com/file/d/1NSHa9QCQdis95hP14vhPNmNnpBdqmw7x/view?usp=sharing

Thank you!

Based on that, I'd say this RKL chip basically has no "license"-based downclocking: it runs at 5.1 GHz for 1 core regardless of whether AVX or AVX-512 instructions are used, including heavy instructions. Similarly for higher core counts: the frequency is lower (4.8 GHz) when all 8 cores are running, but it doesn't matter what instructions are being used. There are a few cases where one test runs slower than the surrounding ones (e.g., at 3 cores, avx256_fma_t runs at 4.9 GHz, slower than most other tests, including avx512_fma_t, which runs at 5.1 GHz), but these seem to be outliers, perhaps due to power- or thermal-related throttling.

I would like to add a note to the Ice Lake blog article about this behavior: would you allow me to use your results, with credit?

edisonchan commented 3 years ago

of course, you can use the results.

travisdowns commented 3 years ago

of course, you can use the results.

Thanks! I added a section to this blog post based on your results, and credited you. I parsed your name as "Edison Chan" based on your user name, let me know if that is not correct.

One thing I noticed is that there aren't any results > 5.1 GHz, but the 11900K should go to 5.2 GHz or 5.3 GHz if "turbo boost 3.0" or "thermal velocity boost" is available. These are only available on certain "selected cores" on each CPU, so maybe the issue is that the test just doesn't run on those cores.

If you are curious, you could run this:

for cpu in {0..7}; do taskset -c $cpu sudo ./avx-turbo --test=scalar_iadd,avx256_fma_t,avx512_fma_t --max-threads=1 --no-pin; sleep 2s; done

which runs the single-threaded test (just three of the more interesting ones) on each core to see if some have higher frequencies. I'm interested in your result if you try it.

Thanks again Edison.

edisonchan commented 3 years ago

The system is accessed over SSH; I don't have it on hand. It has been updated to the latest BIOS.

It's running Ubuntu 20.04.2 LTS (GNU/Linux 5.11.7-051107-generic x86_64). I don't know how to make it run beyond 5.1 GHz in Linux (the max frequency goes up to 5.1 GHz in cpupower-gui).

5.3 GHz does happen in Windows when only one thread is running.

travisdowns commented 3 years ago

You shouldn't have to do anything special to make it run at 5.2 or 5.3 GHz, but the theory is it doesn't happen because the test pins to core 0 and this isn't a core eligible for TBM3 (turbo boost max 3).

That's why I suggested the recipe above with --no-pin which tries each core. If you try that it should get 5.3 on some cores. Give it a shot if you have a moment!

edisonchan commented 3 years ago

It works now:

https://drive.google.com/file/d/1Sz6EAvhrGMZHY6h_fPtexT94dy5ANCKo/view?usp=sharing

edisonchan commented 3 years ago

Update: I ran avx-turbo with default settings (I just set the CPU power mode to performance on the command line, as in my last reply, and no longer use cpupower-gui, which limits the max clock and may prevent iTVB/iAB from working): https://drive.google.com/file/d/1Mmj6nXbvkn9D7cdm4wUu14uS1TReZPaq/view?usp=sharing

travisdowns commented 3 years ago

The system is accessed over SSH; I don't have it on hand. It has been updated to the latest BIOS.

Hi Edison - do you mean that since the earlier results and the most recent ones (from 2 days ago) the BIOS was updated? Because I do see a difference now: there is some AVX-512 downclocking, e.g., in the most recent results all the 512-bit tests run only at 5.0 GHz, unlike the earlier results where most ran at 5.1 GHz. This could also be due to the changes you made to no longer set the frequency explicitly, but I think BIOS is more likely since it's hard to see how not restricting the frequency would cause such a change.

I think this changes my conclusions and I'll need to update the post.

I'm interested in adding a bit of functionality to avx-turbo to better test the more advanced turbo modes: if I do that, do you think you could run the process again?

Thanks again for all your help.

edisonchan commented 3 years ago

Yes. By the way, is it possible to add RAPL energy measurements? The perf events do not include them for RKL yet. Or is there another tool that can do it?