travisdowns / uarch-bench

A benchmark for low-level CPU micro-architectural features
MIT License

No output when choosing a series of specific tasks #95

Closed moep0 closed 1 year ago

moep0 commented 1 year ago

I want to run a specific series of uarch-bench tests, but the output contains only some system information and no benchmark data, or it says that there are no such tests. Running sudo ./uarch-bench.sh and sudo ./uarch-bench.sh --test-name=memory/bandwidth/store/store-bandwidth-32b-128 both produce normal output.

I must be typing something wrong, but I don't know what exactly. This is what the output looks like when I choose a specific task:

sudo ./uarch-bench.sh --test-name=studies/memory/l2-doubleload/*normal*          
zsh: no matches found: --test-name=studies/memory/l2-doubleload/*normal*

sudo ./uarch-bench.sh --test-name=studies/memory/l2-doubleload         
Driver: intel_pstate, governor: performance
Vendor ID: GenuineIntel
Model name: Intel(R) Core(TM) i7-10700 CPU @ 2.90GHz
intel_pstate/no_turbo reports that turbo is already disabled
Using timer: clock
Welcome to uarch-bench ()
Supported CPU features: SSE3 PCLMULQDQ VMX SMX EST TM2 SSSE3 FMA CX16 SSE4_1 SSE4_2 MOVBE POPCNT AES AVX RDRND TSC_ADJ SGX BMI1 AVX2 BMI2 ERMS MPX RDSEED ADX CLFLUSHOPT INTEL_PT
Pinned to CPU 2
Source pages allocated with transparent hugepages: 100.0
UARCH_BENCH_CLOCK_MHZ not set, running calibration
Median CPU speed: 2.896 GHz
Running benchmarks groups using timer clock
moep0 commented 1 year ago

And what is the use of the maker.template lines before the l2-doubleload bandwidth tests? Are they offering shuffle regions for the bandwidth tests? (maybe I should open another issue?)

travisdowns commented 1 year ago

Looks like a shell quoting issue: zsh (your shell) is interpreting the *, not uarch-bench. Try it like:

sudo ./uarch-bench.sh '--test-name=studies/memory/l2-doubleload/*normal*'
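To illustrate why the quoting matters: once the pattern reaches the program unexpanded, the program can match it against its own test names. The sketch below uses POSIX fnmatch as a stand-in. Whether uarch-bench uses fnmatch internally is an assumption, select_tests is a hypothetical helper, and any names passed to it are made up for illustration.

```cpp
#include <fnmatch.h>  // POSIX shell-style wildcard matching
#include <string>
#include <vector>

// Return the names that match a shell-style pattern such as "*normal*".
// NOTE: fnmatch is a stand-in; uarch-bench's own matcher may differ.
std::vector<std::string> select_tests(const std::string& pattern,
                                      const std::vector<std::string>& names) {
    std::vector<std::string> selected;
    for (const auto& name : names) {
        // With flags == 0, '*' is also allowed to match '/' characters.
        if (fnmatch(pattern.c_str(), name.c_str(), 0) == 0) {
            selected.push_back(name);
        }
    }
    return selected;
}
```

Under this assumption, a pattern without wildcards matches only an identical name, which would be consistent with the second run above printing system information but no results.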
travisdowns commented 1 year ago

And what is the use of the maker.template lines before the l2-doubleload bandwidth tests? Are they offering shuffle regions for the bandwidth tests? (maybe I should open another issue?)

Lines 422 through 434 are part of the l2-doubleload tests. Each line defines an individual test. Take this line for example:

        maker.template make<serial_double_load_oneload> ("single-load-16k", "Just one load 16k region",  1, []{ return &shuffled_region(16 * 1024); });

This defines a test called "single-load-16k" which calls the function serial_double_load_oneload (written in asm) which is passed a "shuffled region" of 16 KiB (see the comments for "shuffled region" for what that is). The way the passed-in argument is used depends on the specific benchmark. This one does:

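; one load only, as a baseline
define_bench serial_double_load_oneload
    mov     rsi, [rsi + region.start]  ; rsi = start of the shuffled region
.top:
    mov     rcx, [rsi]                 ; load the next pointer in the chain
    mov     rsi, rcx                   ; follow it
    dec     rdi                        ; rdi = remaining iterations
    jnz     .top
    ret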

I.e., it "pointer chases" across the region in a loop.
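For readers less familiar with asm, here is a rough C++ sketch of the same idea. Line, link_shuffled and chase are hypothetical names, and the linking scheme is an assumption for illustration, not the project's shuffled_region implementation.

```cpp
#include <algorithm>
#include <cstddef>
#include <numeric>
#include <random>
#include <vector>

// One 64-byte line whose first word points at the next line to visit.
struct Line {
    Line* next;
    char pad[64 - sizeof(Line*)];
};

// Link all lines into a single cycle in random order (hypothetical helper).
void link_shuffled(std::vector<Line>& lines, unsigned seed) {
    std::vector<std::size_t> order(lines.size());
    std::iota(order.begin(), order.end(), std::size_t{0});
    std::mt19937 rng(seed);
    std::shuffle(order.begin(), order.end(), rng);
    for (std::size_t i = 0; i < order.size(); ++i) {
        lines[order[i]].next = &lines[order[(i + 1) % order.size()]];
    }
}

// C++ counterpart of the asm loop: each load's address depends on the
// result of the previous load, so the loads cannot overlap.
Line* chase(Line* p, std::size_t iters) {
    while (iters-- > 0) {
        p = p->next;
    }
    return p;
}
```

With 256 lines (16 KiB), chasing 256 steps from any line walks the whole cycle and ends up back at the starting line.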

Did that answer your question?

moep0 commented 1 year ago

Looks like a shell quoting issue: zsh (your shell) is interpreting the *, not uarch-bench. Try it like:

sudo ./uarch-bench.sh '--test-name=studies/memory/l2-doubleload/*normal*'

This works! Thank you!

Lines 422 through 434 are part of the l2-doubleload tests. Each line defines an individual test.

I got it. Can I ask one more question? I notice that the region is defined both in cpp and asm.

struc region
    .size  : resq 1
    .start : resq 1
endstruc

struct region {
    size_t size;
    void *start;  // actually a CacheLine object
};

I think instructions like mov rsi, [rsi + region.start] are using the asm definition. After executing shuffled_region in cpp and getting an address, how to pass the cpp struct data to the asm struct?

travisdowns commented 1 year ago

I got it. Can I ask one more question?

Yes, of course, though in general I like separate issues for new questions.

I think instructions like mov rsi, [rsi + region.start] are using the asm definition.

That's correct, it's using nasm STRUC support described here.

After executing shuffled_region in cpp and getting an address, how to pass the cpp struct data to the asm struct?

The 4th argument to maker.make takes a function that returns a void *, and this is passed to the asm where it is available in the rsi register. The rdi register contains the number of loops requested by the C++ code. A function is used instead of passing a void * directly in C++ to avoid running all the code for every test when the tests are just being registered: this lets us register the tests while deferring the generation of their argument.
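A minimal sketch of that deferral pattern, assuming a much simpler registry than the real one: Registry, Test and bench_fn are hypothetical names, not uarch-bench's actual classes. The benchmark signature takes the iteration count first and the argument second, which under the SysV x86-64 calling convention places them in rdi and rsi respectively.

```cpp
#include <cstddef>
#include <functional>
#include <string>
#include <utility>
#include <vector>

// Benchmark routine: iteration count first, argument pointer second.
using bench_fn = long (*)(std::size_t iters, void* arg);

struct Test {
    std::string name;
    bench_fn fn;
    std::function<void*()> arg_provider;  // called only when the test runs
};

class Registry {
public:
    // Registering stores the provider without calling it.
    void add(std::string name, bench_fn fn, std::function<void*()> provider) {
        tests_.push_back({std::move(name), fn, std::move(provider)});
    }

    // Running a test is the first point at which its argument is built.
    long run(const std::string& name, std::size_t iters) const {
        for (const auto& t : tests_) {
            if (t.name == name) {
                return t.fn(iters, t.arg_provider());
            }
        }
        return -1;  // no such test
    }

private:
    std::vector<Test> tests_;
};
```

Registering a hundred tests this way costs almost nothing; only the tests that are actually selected pay for building their regions.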

Does it make sense?

moep0 commented 1 year ago

Sorry I'm poor at c++. I am a little bit confused. How does the void * pointer pass its value to the rsi register? And how does the rsi register pass it to region.start?

travisdowns commented 1 year ago

I don't think I can cover all the background required to understand this in a GitHub issue. I've already linked to the nasm STRUC doc, which is part of the puzzle, but you'll have to learn about the x64 ABI and C++ for the rest.

In my experience you can get good answers on StackOverflow.

travisdowns commented 1 year ago

Look up the SysV ABI to see how function parameters (in this case a void pointer) are passed in registers on x86-64. The second parameter goes in rsi by this convention.

moep0 commented 1 year ago

Thank you for all the answers! Maybe I didn't express myself clearly. I do know that the first one goes to rdi and the second one to rsi and so on. I will try to read the nasm STRUC doc to see how the two structs are linked.

travisdowns commented 1 year ago

The way it works is that you define a region structure in both C++ and asm, and the benchmark framework passes a pointer to a region structure to the asm benchmark routine in rsi. Then something like [rsi + region.start] is just a way of saying [rsi + 8] (since the start member is at offset 8 in the structure), which means "access the value at address rsi + 8 bytes".
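The same offsets can be checked from the C++ side. This is a sketch assuming a typical 64-bit target where size_t and pointers are both 8 bytes; load_start is a hypothetical helper that spells out what the asm load does. Given the struc shown earlier in the thread, size sits at offset 0 and start at offset 8.

```cpp
#include <cstddef>
#include <cstring>

struct region {
    std::size_t size;
    void* start;
};

// Layout the asm side relies on (holds on typical 64-bit targets).
static_assert(offsetof(region, size) == 0, "size is the first quadword");
static_assert(offsetof(region, start) == 8, "start is the second quadword");

// What `mov rsi, [rsi + region.start]` does, spelled out in C++:
// read one pointer-sized value at (base address + 8).
void* load_start(const void* r) {
    void* out;
    std::memcpy(&out,
                static_cast<const char*>(r) + offsetof(region, start),
                sizeof out);
    return out;
}
```

Nothing links the two definitions at build time; they agree only because both describe the same byte layout, so a change to one has to be mirrored by hand in the other.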


moep0 commented 1 year ago

OK, totally got it. region.size and region.start are just offsets. Thanks for your patience and sorry for the trouble. Next time I will read the background first and then ask questions.

travisdowns commented 1 year ago

It's no problem at all.

Yes, the NASM STRUC feature is basically a shorthand notation for offsets within a structure.
