intel / isa-l

Intelligent Storage Acceleration Library

How to go through the call stack of ec_encode_data (in order to find slice size)? #228

Open rajrana22 opened 1 year ago

rajrana22 commented 1 year ago

I'm trying to go through the call stack of ec_encode_data using printf statements as the function is being called in erasure_code_perf.c.

I believe that the only two places where ec_encode_data is defined are ec_base_aliases.c and ec_base_vsx.c. I added printf statements under both definitions to find the entry point of the function, but when I ran make perfs, neither printf statement produced any output. Am I doing something wrong, or thinking about the function incorrectly?

What I'm ultimately trying to do is find some sort of "slice" size. The concept of slicing is mentioned in the paper "Exploiting Combined Locality for Wide-Stripe Erasure Coding in Distributed Storage", specifically in section 4.2, which states that "current encoding implementation (e.g., Intel ISA-L [4] and QFS [49]) often splits data chunks of large size (e.g., 64 MiB) into smaller-size data slices and performs slice-based encoding with hardware acceleration (e.g., Intel ISA-L) or parallelism (e.g., QFS)". I'm trying to find where in the ISA-L code this slicing occurs and, more specifically, what the slice size is.

Any help would be greatly appreciated!

gbtucker commented 1 year ago

Hi @rajrana22, some of the files you listed, like ec_base_aliases.c and ec_base_vsx.c, would only run on a system with no arch-specific optimizations or no modern instruction sets. This is likely why your print statements are not called. Other versions do split up operations to balance calculation against what temporaries can be kept in registers. For example, we may opt to load sources and calculate 6 parities at a time before looping through the sources again for the next 6 parity calculations. Within the inner loop, loads from each source really only "slice" based on the size of the vector registers. In the file gf_6vect_dot_prod_avx512.asm you can see we load 64 bytes at a time. Other slicing or blocking is really up to the user, as they can send in chunks as they see benefit. The term slice is usually used in RAID or EC for how a single source is split into sources, and this is done before passing to ISA-L EC functions.
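
The point that blocking above the vector width is left to the caller can be illustrated with a small scalar model. This is only a sketch and not ISA-L code: encode_range and encode_sliced are hypothetical helpers, and gf_mul is a plain bitwise GF(2^8) multiply that assumes the 0x11d reduction polynomial. Each parity byte depends only on the source bytes at the same offset, so encoding in slices of any size should give the same parity as one large call.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Bitwise GF(2^8) multiply; the 0x11d polynomial is an assumption here. */
static uint8_t gf_mul(uint8_t a, uint8_t b)
{
    uint8_t p = 0;
    while (b) {
        if (b & 1)
            p ^= a;
        /* Multiply a by x and reduce if the top bit was set. */
        a = (uint8_t)((a << 1) ^ ((a & 0x80) ? 0x1d : 0));
        b >>= 1;
    }
    return p;
}

/* Hypothetical stand-in for an encode call: one parity over k sources,
 * covering bytes [start, start + len) of every buffer. */
static void encode_range(size_t start, size_t len, int k, const uint8_t *coef,
                         const uint8_t *const *src, uint8_t *dest)
{
    for (size_t pos = start; pos < start + len; pos++) {
        uint8_t acc = 0;
        for (int j = 0; j < k; j++)
            acc ^= gf_mul(coef[j], src[j][pos]);
        dest[pos] = acc;
    }
}

/* Caller-side slicing: the same work issued as several smaller calls. */
static void encode_sliced(size_t len, size_t slice, int k, const uint8_t *coef,
                          const uint8_t *const *src, uint8_t *dest)
{
    assert(slice > 0);
    for (size_t start = 0; start < len; start += slice) {
        size_t n = (len - start < slice) ? len - start : slice;
        encode_range(start, n, k, coef, src, dest);
    }
}
```

In this model the slice size is purely a caller decision (cache footprint, threading granularity); nothing inside the encode routine fixes it.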

rajrana22 commented 1 year ago

Hi @gbtucker, Thank you for your reply! That was very helpful for me.

Unfortunately, I am not very experienced with assembly, so I was wondering if you could help answer some other questions that I have, which are particular to the erasure coding assembly files:

For reference, I am looking specifically at the gf_2vect_dot_prod_avx512.asm file.

  1. What is the control flow of func(gf_2vect_dot_prod_avx512)? What instruction does it execute after mov dest1, [dest1]? How does it get into the .next_vect loop?
  2. What exactly does the bulk of the .next_vect loop do? Specifically, I'm confused about what all of the masks and nibbles are and what they do.
  3. Upon calling the assembly file, apart from the function arguments, what values are loaded into registers? I'm a bit unsure about what values ptr, vec_i, dest2, and pos have, and specifically what they are for.
  4. Overall, I think a high-level understanding of how the assembly files work will help me out a lot, because I'm trying to make modifications to them for my purposes.

gbtucker commented 1 year ago

  1. What is the control flow of func(gf_2vect_dot_prod_avx512)? What instruction does it execute after mov dest1, [dest1]? How does it get into the .next_vect loop?

The vpxorq xp1, xp1, xp1 instruction just zeroes out the first accumulator. After that it falls through from the outer loop to the inner loop.

  2. What exactly does the bulk of the .next_vect loop do? Specifically, I'm confused about what all of the masks and nibbles are and what they do.

The flow is simply two loops. The inner loop .next_vect goes through each coefficient and source to multiply and accumulate. The outer loop writes out the parity and resets for next inner loop.
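
On the masks and nibbles: a scalar model may help. The sketch below assumes that each coefficient is expanded ahead of time into a 32-byte table (16 products for the low nibble, 16 for the high nibble) and that a 0x0f mask splits each source byte into two 4-bit table indices. The function names are hypothetical, gf_mul assumes the 0x11d polynomial, and the real kernels do the same lookups on many bytes at once with vector shuffles rather than one byte at a time.

```c
#include <assert.h>
#include <stdint.h>

/* Reference bitwise GF(2^8) multiply; the 0x11d polynomial is an assumption. */
static uint8_t gf_mul(uint8_t a, uint8_t b)
{
    uint8_t p = 0;
    while (b) {
        if (b & 1)
            p ^= a;
        a = (uint8_t)((a << 1) ^ ((a & 0x80) ? 0x1d : 0));
        b >>= 1;
    }
    return p;
}

/* Expand coefficient c into two 16-entry tables (assumed 32-byte layout). */
static void build_nibble_tables(uint8_t c, uint8_t tbl[32])
{
    for (int i = 0; i < 16; i++) {
        tbl[i] = gf_mul(c, (uint8_t)i);             /* c * (low nibble)       */
        tbl[16 + i] = gf_mul(c, (uint8_t)(i << 4)); /* c * (high nibble << 4) */
    }
}

/* Multiply x by the table's coefficient using two lookups and one XOR. */
static uint8_t gf_mul_nibbles(const uint8_t tbl[32], uint8_t x)
{
    uint8_t lo = x & 0x0f; /* the 0x0f mask keeps the low nibble   */
    uint8_t hi = x >> 4;   /* the shift exposes the high nibble    */
    return (uint8_t)(tbl[lo] ^ tbl[16 + hi]);
}

/* Check the table method against the reference for every (c, x) pair. */
static void verify_nibble_multiply(void)
{
    uint8_t tbl[32];
    for (int c = 0; c < 256; c++) {
        build_nibble_tables((uint8_t)c, tbl);
        for (int x = 0; x < 256; x++)
            assert(gf_mul_nibbles(tbl, (uint8_t)x) == gf_mul((uint8_t)c, (uint8_t)x));
    }
}
```

The split works because GF(2^8) multiplication distributes over XOR: x equals (hi << 4) XOR lo, so c*x equals c*(hi << 4) XOR c*lo, and each half has only 16 possible values.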

  3. Upon calling the assembly file, apart from the function arguments, what values are loaded into registers? I'm a bit unsure about what values ptr, vec_i, dest2, and pos have, and specifically what they are for.

Only function arguments are passed to these functions. The ptr, vec_i, dest2, and pos are temporary variables to help index the proper offsets into the arrays passed to the functions.
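
As a rough picture of how those temporaries fit together, here is a C model of the two-loop structure. It is a sketch under assumptions, not a translation of the assembly: model_2vect_dot_prod is a hypothetical name, the 32-byte-per-coefficient table layout (all tables for the first parity, then all tables for the second) is assumed, xp2 is assumed to be the second accumulator alongside xp1, and the length must be a multiple of 64 because tail handling is left out.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

enum { VEC_BYTES = 64 }; /* bytes processed per outer-loop step */

/* Hypothetical scalar model: two parities over vlen sources of len bytes. */
static void model_2vect_dot_prod(int len, int vlen, const uint8_t *gftbls,
                                 uint8_t *const *src, uint8_t *const *dest)
{
    assert(len % VEC_BYTES == 0); /* simplification: no tail handling */

    uint8_t *dest1 = dest[0]; /* pointers loaded once from the dest array */
    uint8_t *dest2 = dest[1];

    /* Outer loop: pos is the byte offset into every source and dest buffer. */
    for (int pos = 0; pos < len; pos += VEC_BYTES) {
        uint8_t xp1[VEC_BYTES] = { 0 }; /* accumulator for parity 1, zeroed */
        uint8_t xp2[VEC_BYTES] = { 0 }; /* accumulator for parity 2, zeroed */

        /* Inner loop (.next_vect): vec_i selects the source and its tables. */
        for (int vec_i = 0; vec_i < vlen; vec_i++) {
            const uint8_t *ptr = src[vec_i];                  /* current source */
            const uint8_t *t1 = gftbls + 32 * vec_i;          /* parity 1 table */
            const uint8_t *t2 = gftbls + 32 * (vlen + vec_i); /* parity 2 table */

            for (int lane = 0; lane < VEC_BYTES; lane++) {
                uint8_t x = ptr[pos + lane];
                xp1[lane] ^= (uint8_t)(t1[x & 0x0f] ^ t1[16 + (x >> 4)]);
                xp2[lane] ^= (uint8_t)(t2[x & 0x0f] ^ t2[16 + (x >> 4)]);
            }
        }

        /* Write out both parities for this block, then move to the next one. */
        memcpy(dest1 + pos, xp1, VEC_BYTES);
        memcpy(dest2 + pos, xp2, VEC_BYTES);
    }
}
```

In this model the lane loop stands in for a single vector instruction, so only pos and vec_i correspond to real loop counters.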

  4. Overall, I think a high-level understanding of how the assembly files work will help me out a lot, because I'm trying to make modifications to them for my purposes.

I hope this helps.