vnatesh / CAKE_on_CPU

CAKE Library for constant-bandwidth matrix multiplication on CPUs

Question about Evaluation #3

Open FuncJ opened 2 years ago

FuncJ commented 2 years ago

Hi, I think your article is very innovative and interesting. I have a few questions about the experiments.

1. Why is CAKE better than MKL for the matrix sizes in the shaded area? In the bottom-left corner of each graph in particular, CAKE performs twice as well as MKL. Does that mean MKL's performance is less than 50% of the system's peak performance? [image]

2. The AMD Ryzen 9 5950X is a platform with sufficient cache bandwidth. Why is CAKE's memory bandwidth requirement not constant as the number of cores increases? [image]

3. The ARM v8 Cortex A53 is a platform without sufficient cache bandwidth. Why is CAKE's memory bandwidth requirement constant as the number of cores increases? [image]

Thanks a lot.

asabot commented 2 years ago

Hello,

Thank you for your questions and interest in CAKE.

For 1: CAKE performs better on the smaller and skewed matrices because they have a lower arithmetic intensity (operations per memory access). CAKE increases the arithmetic intensity of small matrices through block shaping. When the matrices are larger and less skewed, the arithmetic intensity before block shaping is already higher.
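To make the arithmetic-intensity point concrete, here is a minimal sketch. It uses an idealized model, not CAKE's own analysis: it assumes an M x K by K x N multiplication performs 2*M*K*N floating-point operations and touches each element of A, B, and C exactly once. The function name `arithmetic_intensity` is hypothetical.

```python
def arithmetic_intensity(m, k, n):
    """Idealized operations per element moved for an (m x k) @ (k x n) product.

    Assumption: each element of A, B, and C crosses the memory interface
    exactly once. Real traffic depends on blocking and cache behavior.
    """
    flops = 2 * m * k * n               # one multiply and one add per inner step
    elements = m * k + k * n + m * n    # A, B, and C
    return flops / elements


# A large square problem versus a skewed one with a short inner dimension.
square = arithmetic_intensity(1000, 1000, 1000)
skewed = arithmetic_intensity(1000, 10, 1000)
print(f"square: {square:.1f} ops/element, skewed: {skewed:.1f} ops/element")
```

Under this model the skewed shape has far fewer operations per element moved than the square one, which is the regime where the reply says block shaping helps most.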

For 2: At the time of writing, AMD uProf did not provide a counter for the number of DRAM accesses, so we had to estimate it using "number of L1 cache-line refills from DRAM". The bandwidth should be constant; the estimation method is what makes it appear non-constant.

For 3: The bandwidth is constant because the constant-bandwidth block increases the amount of computation associated with each core.

FuncJ commented 2 years ago

Thanks. Why don't you experiment with a server with a larger number of cores?

asabot commented 2 years ago

We plan on doing that. At the time of writing we didn't have a server with a large number of cores. Stay tuned for future work :)

FuncJ commented 2 years ago

When I tested cake_sgemm on a 64-core ARM server, the function get_cache_size() failed with the error "lscpu: unrecognized option --cache=NAME,ALL_SIZE". How can I modify the code to specify the cache size? Is the following code right? [image]

vnatesh commented 2 years ago

Yup, that looks right. On some Linux installations the lscpu command has limited options, so it may be easier to just hardcode the cache sizes like you have done.
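For anyone hitting the same lscpu limitation, the fallback idea can be sketched as follows. This is not CAKE's get_cache_size() (which lives in the library's own source); it is an illustrative Python sketch that tries the Linux sysfs cache directory first and falls back to hardcoded values. The names `parse_size`, `read_sysfs_caches`, `get_cache_sizes`, and `FALLBACK_CACHE_BYTES` are hypothetical, and the fallback sizes are placeholders that must be replaced with the values for your CPU. The sysfs entries may also be missing on some systems, in which case only the hardcoded values are used.

```python
from pathlib import Path

# Placeholder sizes in bytes (assumption): replace with your CPU's real values.
FALLBACK_CACHE_BYTES = {"L1d": 32 * 1024, "L2": 512 * 1024, "L3": 32 * 1024 * 1024}

SYSFS_CACHE_DIR = "/sys/devices/system/cpu/cpu0/cache"


def parse_size(text):
    """Convert a size string such as '32K' or '8M' to a number of bytes."""
    text = text.strip().upper()
    units = {"K": 1024, "M": 1024 ** 2, "G": 1024 ** 3}
    if text and text[-1] in units:
        return int(text[:-1]) * units[text[-1]]
    return int(text)


def read_sysfs_caches(root=SYSFS_CACHE_DIR):
    """Return {'L1d': bytes, 'L2': bytes, ...} from sysfs, or {} if unavailable."""
    base = Path(root)
    if not base.is_dir():
        return {}
    sizes = {}
    for index in sorted(base.glob("index*")):
        try:
            level = (index / "level").read_text().strip()
            kind = (index / "type").read_text().strip()
            size = parse_size((index / "size").read_text())
        except (OSError, ValueError):
            continue  # skip entries that are missing or unreadable
        suffix = {"Data": "d", "Instruction": "i"}.get(kind, "")
        sizes[f"L{level}{suffix}"] = size
    return sizes


def get_cache_sizes(root=SYSFS_CACHE_DIR, fallback=FALLBACK_CACHE_BYTES):
    """Use detected sizes where available, hardcoded fallbacks otherwise."""
    return {**fallback, **read_sysfs_caches(root)}


print(get_cache_sizes())
```

Detected values override the fallbacks per cache level, so a partially populated sysfs tree still yields a complete set of sizes.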