Description of the changes
Previously, the synthesized VM CPU/NUMA/cache topology had a bug: every CPU pointed to the same L1d cache, the same L1i cache, and the same L2 cache. The LibOS layer interpreted this as, e.g., a single L1d cache on the platform shared by all CPUs. This bogus CPU-cache topology confused some programs, in particular the GEMM Rust crate: the crate counts the CPUs sharing a particular cache, uses that count to calculate the "effective" number of bytes in the cache reserved for a single CPU, and then uses that size to optimize matrix multiplication.
This PR creates a correct CPU-cache topology: each CPU has a dedicated L1d cache, L1i cache, and L2 cache. The L3 cache is shared by all CPUs (as before). This satisfies the GEMM Rust crate and allows running, e.g., the Candle ML framework.
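To illustrate why the sharing count matters, here is a minimal sketch of the kind of per-CPU cache-size calculation described above. It is not the GEMM crate's actual code: the function name `effective_cache_bytes` and the example sizes in the note below are assumptions made purely for illustration.

```rust
/// Hypothetical helper (not the GEMM crate's API): the share of a cache
/// that a single CPU can count on when `cpus_sharing` CPUs use that cache.
fn effective_cache_bytes(cache_bytes: usize, cpus_sharing: usize) -> usize {
    // Guard against a reported sharing count of zero.
    cache_bytes / cpus_sharing.max(1)
}
```

Assuming a 32 KiB L1d cache and 16 CPUs, the old topology would yield `effective_cache_bytes(32 * 1024, 16)`, i.e. 2048 bytes per CPU, whereas a dedicated cache yields the full 32768 bytes.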
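The corrected layout can be sketched as follows. This only models the idea and is not the code in this PR: the `CpuCaches` struct, `build_topology`, and `cpus_sharing` are hypothetical names, and the cache indices are arbitrary.

```rust
/// Indices of the caches used by one CPU (hypothetical model, for illustration).
#[derive(Debug, Clone, PartialEq, Eq)]
struct CpuCaches {
    l1d: usize,
    l1i: usize,
    l2: usize,
    l3: usize,
}

/// Build the corrected topology: private L1d/L1i/L2 per CPU, one shared L3.
fn build_topology(num_cpus: usize) -> Vec<CpuCaches> {
    (0..num_cpus)
        .map(|cpu| CpuCaches {
            l1d: cpu, // dedicated: index follows the CPU index
            l1i: cpu,
            l2: cpu,
            l3: 0, // shared: every CPU points to the same L3
        })
        .collect()
}

/// Count the CPUs that use the same cache as `cpu` at the level picked by `level`.
fn cpus_sharing(topology: &[CpuCaches], cpu: usize, level: fn(&CpuCaches) -> usize) -> usize {
    let target = level(&topology[cpu]);
    topology.iter().filter(|c| level(c) == target).count()
}
```

With `build_topology(4)`, `cpus_sharing(&t, 2, |c| c.l1d)` is 1 and `cpus_sharing(&t, 2, |c| c.l3)` is 4, which matches the sharing counts a program would expect from the fixed topology.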
How to test this PR?
Run #31.