Closed moep0 closed 1 year ago
And what is the use of the maker.template lines before the l2-doubleload bandwidth tests? Are they offering shuffle regions for the bandwidth tests? (Maybe I should open another issue?)
Looks like a shell quoting issue: zsh (your shell) is interpreting the *, not uarch-bench. Try it like:
sudo ./uarch-bench.sh '--test-name=studies/memory/l2-doubleload/*normal*'
And what is the use of the maker.template lines before the l2-doubleload bandwidth tests? Are they offering shuffle regions for the bandwidth tests? (maybe I should open another issue?)
Lines 422 through 434 are part of the l2-doubleload tests. Each line defines an individual test. Take this line for example:
maker.template make<serial_double_load_oneload> ("single-load-16k", "Just one load 16k region", 1, []{ return &shuffled_region(16 * 1024); });
This defines a test called "single-load-16k" which calls the function serial_double_load_oneload (written in asm), passing it a "shuffled region" of 16 KiB (see the comments for "shuffled region" for what that is). The way the passed-in argument is used depends on the specific benchmark. This one does:
; one load only, as a baseline
define_bench serial_double_load_oneload
mov rsi, [rsi + region.start]
.top:
mov rcx, [rsi]
mov rsi, rcx
dec rdi
jnz .top
ret
I.e., it "pointer chases" across the region in a loop.
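For readers who find the asm hard to follow, here is a rough C++ sketch of the same pointer-chasing idea. It is not the uarch-bench code: make_cycle and chase are hypothetical names, the region is modelled as an array of indices instead of raw pointers, and the assumption that the links form a single cycle over the whole region is mine, not something stated in this thread.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <numeric>
#include <random>
#include <vector>

// Hypothetical stand-in for a "shuffled region": slot i holds the index of
// the next slot to visit. The links are built so that they form one cycle
// that passes through every slot exactly once (an assumption of this sketch).
std::vector<std::size_t> make_cycle(std::size_t n, unsigned seed) {
    assert(n > 0);
    std::vector<std::size_t> order(n);
    std::iota(order.begin(), order.end(), std::size_t{0});
    std::mt19937 rng(seed);
    std::shuffle(order.begin(), order.end(), rng);

    std::vector<std::size_t> next(n);
    for (std::size_t i = 0; i < n; ++i) {
        next[order[i]] = order[(i + 1) % n];  // link each slot to its successor
    }
    return next;
}

// The chase itself: every load needs the result of the previous load,
// which mirrors `mov rcx, [rsi]` / `mov rsi, rcx` inside the .top loop.
std::size_t chase(const std::vector<std::size_t>& next,
                  std::size_t start, std::size_t iters) {
    std::size_t cur = start;
    while (iters--) {
        cur = next[cur];
    }
    return cur;
}
```

Because each step depends on the one before it, the loads cannot overlap, so a loop like this is limited by load latency and not by bandwidth. In this sketch, chasing for exactly as many steps as there are slots brings you back to the starting slot.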
Did that answer your question?
Looks like a shell quoting issue: zsh (your shell) is interpreting the *, not uarch-bench. Try it like:
sudo ./uarch-bench.sh '--test-name=studies/memory/l2-doubleload/*normal*'
This works! Thank you!
Lines 422 through 434 are part of the l2-doubleload tests. Each line defines an individual test.
I got it. Can I ask one more question? I notice that the region is defined both in cpp and asm.
struc region
.size : resq 1
.start : resq 1
endstruc
struct region {
size_t size;
void *start; // actually a CacheLine object
};
I think instructions like mov rsi, [rsi + region.start] are using the asm definition. After executing shuffled_region in cpp and getting an address, how is the cpp struct data passed to the asm struct?
I got it. Can I ask one more question?
Yes, of course, though in general I like separate issues for new questions.
I think instructions like mov rsi, [rsi + region.start] are using the asm definition.
That's correct, it's using nasm STRUC support described here.
After executing shuffled_region in cpp and getting an address, how to pass the cpp struct data to the asm struct?
The 4th argument to maker.make takes a function that returns a void *, and this is passed to the asm where it is available in the rsi register. The rdi register contains the number of loops requested by the C++ code. A function is used instead of passing a void * directly in C++ to avoid running all the code for every test when the tests are just being registered: this lets us register the tests while deferring the generation of their argument.
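A minimal sketch of that deferral idea, with made-up names (TestEntry, Registry, build_fake_region) that are not the uarch-bench API: registration only stores a callable, and the possibly expensive argument is built when a test actually runs.

```cpp
#include <cassert>
#include <cstddef>
#include <functional>
#include <string>
#include <utility>
#include <vector>

// Hypothetical types for illustration only.
struct TestEntry {
    std::string name;
    std::function<void*()> arg_provider;  // invoked only when the test runs
};

class Registry {
public:
    // Registering a test stores the provider without calling it.
    void add(std::string name, std::function<void*()> provider) {
        assert(provider && "provider must be callable");
        tests_.push_back({std::move(name), std::move(provider)});
    }

    // "Runs" the named test: builds its argument now and returns it.
    // A real harness would pass this pointer on to the benchmark routine.
    // Returns nullptr if no test has that name.
    void* run(const std::string& name) {
        for (auto& t : tests_) {
            if (t.name == name) {
                return t.arg_provider();
            }
        }
        return nullptr;
    }

    std::size_t size() const { return tests_.size(); }

private:
    std::vector<TestEntry> tests_;
};

// Stand-in for an expensive setup step; counts how often it is executed.
static int regions_built = 0;
static char fake_region[16 * 1024];

void* build_fake_region() {
    ++regions_built;
    return fake_region;
}
```

With this shape, registering a hundred tests costs a hundred small callables, and only the tests selected on the command line pay for building their regions.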
Does it make sense?
Sorry I'm poor at c++. I am a little bit confused. How does the void * pointer pass its value to the rsi register? And how does the rsi register pass it to region.start?
I don't think I can cover all the background required to understand this in a GitHub issue. I've already linked to the nasm STRUC doc, which is part of the puzzle, but you'll have to learn about the x64 ABI and C++ for the rest.
In my experience you can get good answers on StackOverflow.
Look up the SysV ABI to see how function parameters (in this case a void pointer) are passed to x86 registers. The second parameter goes in rsi by this convention.
Thank you for all the answers! Maybe I didn't express myself clearly. I do know that the first one goes to rdi and the second one to rsi and so on. I will try to read the nasm STRUC doc to see how the two structs are linked.
The way it works is you define a region structure in both C++ and asm, then the benchmark framework passes a pointer to a region structure to the asm benchmark routine in rsi. Then something like [rsi + region.start] is just a way of saying [rsi + 8] (since the start member is at offset 8 in the structure), which means "access the value at address rsi + 8 bytes".
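To see the offsets concretely, here is a small C++ sketch based on the region struct quoted earlier in the thread. The helper load_start is a hypothetical name; it reads the start field using only a base address plus a byte offset, which is roughly what mov rsi, [rsi + region.start] does. It assumes a typical platform where size_t and void * have the same size (8 bytes each on x86-64 Linux).

```cpp
#include <cassert>
#include <cstddef>
#include <cstring>

// Same layout as the C++ definition quoted in the thread.
struct region {
    std::size_t size;
    void* start;
};

// The asm STRUC reserves one qword per field (resq 1), so .size is at
// offset 0 and .start is at offset 8. The C++ side must agree for the two
// definitions to describe the same memory.
static_assert(offsetof(region, size) == 0, "size is the first member");
static_assert(offsetof(region, start) == sizeof(std::size_t),
              "start directly follows size");

// Hypothetical helper: reads the 'start' field from a raw base address by
// adding the field's byte offset, without naming the field at the load.
void* load_start(const void* base) {
    void* out;
    std::memcpy(&out,
                static_cast<const char*>(base) + offsetof(region, start),
                sizeof out);
    return out;
}
```

Nothing is copied or converted between the two definitions at run time: the asm and the C++ simply agree on where each field lives relative to the pointer in rsi.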
OK, totally got it. region.size and region.start are just offsets. Thanks for your patience and sorry for the trouble. Next time I will read the background first and then ask questions.
It's no problem at all.
Yes, the NASM STRUC feature is basically a shorthand notation for offsets within a structure.
I want to run a series of specific uarch-bench tasks, but it prints only some system information and no data, or it just says that there are no such tasks. I can run sudo ./uarch-bench.sh and sudo ./uarch-bench.sh --test-name=memory/bandwidth/store/store-bandwidth-32b-128, which both produce normal output. I think I must be typing something wrong, but I don't know what exactly it is. This is what the output looks like when choosing a specific task