Open rajrana22 opened 1 year ago
Hi @rajrana22,
Some of the files you listed, like ec_base_aliases.c and ec_base_vsx.c, would only run on a system with no arch-specific optimizations or no modern instruction sets. This is likely why your print statements are never called. Other versions do split up operations to balance calculation against what temporaries can be kept in registers. For example, we may opt to load sources and calculate 6 parities at a time before looping through the sources again for the next 6 parity calculations. Within the inner loop, loads from each source really only "slice" based on the size of the vector registers. In the file gf_6vect_dot_prod_avx512.asm you can see we load 64 bytes at a time. Other slicing or blocking is really up to the user, as they can send in chunks as they see benefit. The term slice is usually used in RAID or EC for how a single source is split into sources, and this is done before passing to ISA-L EC functions.
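To illustrate the point that slicing is left to the caller, here is a minimal sketch of caller-side slicing in C. It does not call ISA-L: xor_encode is a stand-in encoder (plain XOR parity), and encode_in_slices, MAX_SOURCES, and the slice length are hypothetical names and values chosen for this example. A real caller would presumably invoke an ISA-L EC function in place of the stand-in.

```c
#include <assert.h>
#include <stddef.h>

#define MAX_SOURCES 32 /* hypothetical limit for this sketch */

/* Stand-in encoder (assumption): XOR of k sources into one parity buffer.
 * It only plays the role of "some EC function that takes a length and
 * an array of source pointers". */
static void xor_encode(size_t len, int k, unsigned char **data,
                       unsigned char *parity)
{
    for (size_t i = 0; i < len; i++) {
        unsigned char p = 0;
        for (int j = 0; j < k; j++)
            p ^= data[j][i];
        parity[i] = p;
    }
}

/* Caller-side slicing: walk the chunk in pieces of slice_len bytes and
 * hand each piece to the encoder through offset source pointers. */
static void encode_in_slices(size_t chunk_len, size_t slice_len, int k,
                             unsigned char **data, unsigned char *parity)
{
    unsigned char *slice_src[MAX_SOURCES];

    assert(k > 0 && k <= MAX_SOURCES && slice_len > 0);

    for (size_t off = 0; off < chunk_len; off += slice_len) {
        /* The last slice may be shorter than slice_len. */
        size_t remaining = chunk_len - off;
        size_t n = remaining < slice_len ? remaining : slice_len;

        for (int j = 0; j < k; j++)
            slice_src[j] = data[j] + off;

        xor_encode(n, k, slice_src, parity + off);
    }
}
```

Because each output byte depends only on the source bytes at the same offset, encoding slice by slice gives the same parity as encoding the whole chunk in one call. The slice size is then purely a tuning choice made by the caller, for example to keep the working set in cache.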
Hi @gbtucker, Thank you for your reply! That was very helpful for me.
Unfortunately, I am not very experienced with assembly, so I was wondering if you could help answer some other questions that I have, which are particular to the erasure coding assembly files:
For reference, I am looking specifically at the gf_2vect_dot_prod_avx512.asm file.
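- What is the control flow of func(gf_2vect_dot_prod_avx512)? What instruction does it execute after mov dest1, [dest1]? How does it get into the .next_vect loop?
- What exactly does the bulk of the .next_vect loop do? Specifically, I'm confused about what all of the masks and nibbles are and what they do.
- Upon calling the assembly file, apart from the function arguments, what values are loaded into registers? I'm a bit unsure about what values ptr, vec_i, dest2, and pos have, and specifically what they are for.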
- What is the control flow of func(gf_2vect_dot_prod_avx512)? What instruction does it execute after mov dest1, [dest1]? How does it get into the .next_vect loop?
The vpxorq xp1, xp1, xp1 instruction just zeroes out the first accumulator. After that it falls through from the outer loop to the inner loop.
- What exactly does the bulk of the .next_vect loop do? Specifically, I'm confused about what all of the masks and nibbles are and what they do.
The flow is simply two loops. The inner loop, .next_vect, goes through each coefficient and source to multiply and accumulate. The outer loop writes out the parity and resets for the next inner loop.
- Upon calling the assembly file, apart from the function arguments, what values are loaded into registers? I'm a bit unsure about what values ptr, vec_i, dest2, and pos have, and specifically what they are for.
Only function arguments are passed to these functions. The ptr, vec_i, dest2, and pos are temporary variables to help index the proper offsets into the arrays passed to the functions.
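As a rough scalar picture of the two loops and the temporaries described above, here is a C sketch of a two-output GF(2^8) dot product using nibble lookups. It is an illustration under stated assumptions, not a transcription of the assembly: the function names are hypothetical, the reduction polynomial 0x11d and the 32-byte-per-coefficient table layout (16 low-nibble products, then 16 high-nibble products, with all tables for the first output ahead of those for the second) are assumed for this example, and the local names ptr, vec_i, pos, dest1, and dest2 are borrowed only to suggest the roles they might play.

```c
#include <assert.h>
#include <stddef.h>

/* Bitwise GF(2^8) multiply; the polynomial 0x11d is an assumption here. */
static unsigned char gf_mul_slow(unsigned char a, unsigned char b)
{
    unsigned char p = 0;
    while (b) {
        if (b & 1)
            p ^= a;
        unsigned char carry = a & 0x80;
        a = (unsigned char)(a << 1);
        if (carry)
            a ^= 0x1d;
        b >>= 1;
    }
    return p;
}

/* 32-byte table for coefficient c (assumed layout):
 * tbl[0..15]  = c * low-nibble values 0x00..0x0f
 * tbl[16..31] = c * high-nibble values 0x00, 0x10, ..., 0xf0 */
static void build_nibble_table(unsigned char c, unsigned char tbl[32])
{
    for (int n = 0; n < 16; n++) {
        tbl[n] = gf_mul_slow(c, (unsigned char)n);
        tbl[16 + n] = gf_mul_slow(c, (unsigned char)(n << 4));
    }
}

/* Hypothetical scalar analogue of a 2-output dot product.
 * tbls holds k tables for output 1 followed by k tables for output 2. */
static void two_vect_dot_prod_sketch(size_t len, int k,
                                     const unsigned char *tbls,
                                     unsigned char **src,
                                     unsigned char **dest)
{
    assert(k > 0);

    /* dest is an array of output pointers; pull both out up front. */
    unsigned char *dest1 = dest[0];
    unsigned char *dest2 = dest[1];

    for (size_t pos = 0; pos < len; pos++) {      /* outer loop */
        unsigned char acc1 = 0, acc2 = 0;         /* zero the accumulators */

        for (int vec_i = 0; vec_i < k; vec_i++) { /* inner loop */
            const unsigned char *ptr = src[vec_i]; /* current source */
            unsigned char x = ptr[pos];
            unsigned char lo = x & 0x0f;           /* mask: low nibble */
            unsigned char hi = x >> 4;             /* shift: high nibble */
            const unsigned char *t1 = tbls + 32 * vec_i;
            const unsigned char *t2 = tbls + 32 * (k + vec_i);

            /* c*x = c*lo ^ c*(hi<<4), since GF multiply is linear. */
            acc1 ^= t1[lo] ^ t1[16 + hi];
            acc2 ^= t2[lo] ^ t2[16 + hi];
        }

        dest1[pos] = acc1;                         /* write out the parity */
        dest2[pos] = acc2;
    }
}
```

The nibble split works because a byte x equals its low nibble XORed with its high nibble shifted left by four, and multiplication by a constant distributes over XOR, so two 16-entry lookups replace a full 256-entry table. A vectorized version would presumably apply the same mask, shift, and lookup to every byte of a register at once, which would explain why a mask such as 0x0f shows up in that kind of code.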
- Overall, I think a high-level understanding of how the assembly files work will help me out a lot, because I'm trying to make modifications to them for my purposes.
I hope this helps.
I'm trying to go through the call stack of ec_encode_data using printf statements as the function is being called in erasure_code_perf.c.
I believe that the only two places where ec_encode_data is defined are in ec_base_aliases.c and ec_base_vsx.c. I wrote printf statements under both definitions to see where the entry point of the function is, but when I ran make perfs, neither printf statement produced any output. Am I doing something wrong or thinking about the function incorrectly?
What I'm ultimately trying to do is find some sort of "slice" size. The concept of slicing was mentioned in the paper "Exploiting Combined Locality for Wide-Stripe Erasure Coding in Distributed Storage", specifically in section 4.2, which states that "current encoding implementation (e.g., Intel ISA-L [4] and QFS [49]) often splits data chunks of large size (e.g., 64 MiB) into smaller-size data slices and performs slice-based encoding with hardware acceleration (e.g., Intel ISA-L) or parallelism (e.g., QFS)". I'm trying to find where in the ISA-L code this slicing occurs, and more specifically, what the slice size is.
Any help would be greatly appreciated!