Open FuncJ opened 2 years ago
Hello,
Thank you for your questions and interest in CAKE.
For 1: CAKE performs better on the smaller and more skewed matrices because they have a lower arithmetic intensity (operations per memory access), and CAKE raises that intensity through block shaping. Larger, less skewed matrices already have a higher arithmetic intensity before block shaping, so there is less for CAKE to gain.
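To make the arithmetic intensity point concrete, here is a rough sketch using a simple idealized model, not CAKE's actual analysis: it assumes each element of A, B, and C moves between DRAM and the chip exactly once. The function name is made up for this illustration.

```c
#include <stdio.h>

/* Idealized arithmetic intensity of C (m x n) = A (m x k) * B (k x n).
 * Assumption for illustration: every element of A, B and C crosses the
 * memory interface exactly once, so this is an upper bound, not a
 * measurement of CAKE or MKL. */
double gemm_arithmetic_intensity(double m, double k, double n)
{
    double flops = 2.0 * m * n * k;                 /* one multiply + one add per (i, j, p) */
    double elements_moved = m * k + k * n + m * n;  /* read A, read B, write C */
    return flops / elements_moved;                  /* operations per element moved */
}

/* Helper that prints the intensity for one shape. */
void print_intensity(double m, double k, double n)
{
    printf("M=%.0f K=%.0f N=%.0f -> %.1f ops/element\n",
           m, k, n, gemm_arithmetic_intensity(m, k, n));
}
```

Under this model a square problem with M = K = N = 1000 comes out at about 667 operations per element, while a skewed one with M = N = 1000 and K = 10 comes out at about 19.6, which is why the skewed shapes are more sensitive to how memory traffic is handled.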
For 2: At the time of writing, AMD uProf did not provide a counter for the number of DRAM accesses, so we had to estimate it using "number of L1 cache-line refills from DRAM". The bandwidth should be constant; the variation you see comes from the estimation method.
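For anyone wanting to reproduce this kind of estimate, a minimal sketch of one way to turn a refill count into a bandwidth figure is below. The conversion (refills times cache-line size, divided by elapsed time) and the function name are assumptions for illustration, not necessarily the exact procedure used for the paper.

```c
/* Estimate DRAM bandwidth in GB/s from a count of cache-line refills.
 * Assumptions for illustration: every counted refill transfers one full
 * cache line from DRAM, and line_bytes is the line size of the target
 * CPU (64 bytes is common on x86, but check your hardware). */
double estimate_dram_bandwidth_gbps(unsigned long long refills,
                                    double seconds,
                                    unsigned line_bytes)
{
    double bytes = (double)refills * (double)line_bytes;
    return bytes / seconds / 1e9;   /* bytes per second -> GB/s */
}
```

Because the counter only approximates real DRAM traffic, an estimate like this can drift with core count even when the true bandwidth is flat.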
For 3: The bandwidth is constant because the constant-bandwidth block increases the amount of computation associated with each core.
Thanks. Why don't you experiment with a server with a larger number of cores?
We plan on doing that. At the time of writing we didn't have a server with a large number of cores. Stay tuned for future work :)
When I tested cake_sgemm on a 64-core ARM server, the function get_cache_size() failed with the error "lscpu: unrecognized option --cache=NAME,ALL_SIZE". How can I modify the code to specify the cache size? Is the following code right?
Yup, that looks right. On some Linux installations the lscpu command supports only a limited set of options, so it may be easier to just hardcode the cache sizes like you have done.
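For other readers hitting the same error, a hardcoded fallback could look roughly like the sketch below. The function name, its signature, and the sizes are placeholders for illustration; they do not reproduce CAKE's real get_cache_size() or the code posted above, so substitute the values for your own machine.

```c
#include <stddef.h>

enum { KIB = 1024, MIB = 1024 * 1024 };

/* Hypothetical replacement for an lscpu-based lookup: return the size in
 * bytes of the cache at the given level. The numbers are placeholders,
 * not the sizes of any particular ARM server. */
size_t hardcoded_cache_size(int level)
{
    switch (level) {
    case 1:  return 32 * KIB;    /* placeholder L1 data cache size */
    case 2:  return 512 * KIB;   /* placeholder L2 size */
    case 3:  return 4 * MIB;     /* placeholder L3 size */
    default: return 0;           /* unknown level */
    }
}
```

Returning 0 for an unknown level lets the caller detect a bad argument instead of silently tuning for the wrong size.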
Hi, I think your article is very innovative and interesting. I have a few questions about the experiments. 1. Why is CAKE better than MKL at the matrix sizes shown in the shaded area? In the bottom left corner of each graph in particular, CAKE performs twice as well as MKL. Does that mean MKL's performance is less than 50% of the peak performance of the system? 2. The AMD Ryzen 9 5950X is a platform with sufficient cache bandwidth. Why is CAKE's memory bandwidth requirement not constant as the number of cores increases? 3. The ARM v8 Cortex A53 is a platform without sufficient cache bandwidth. Why is CAKE's memory bandwidth requirement constant as the number of cores increases? Thanks a lot.