sarah-quinones / gemm

MIT License

Don't crash on Linux machines with L4 cache #28

Open dfyz opened 4 months ago

dfyz commented 4 months ago

I recently found out that L4 caches are a thing when I tried to run a candle program (which depends on gemm) on this machine:

# grep 'model name' /proc/cpuinfo
model name      : Intel(R) Core(TM) i7-4770R CPU @ 3.20GHz
...
# cat /sys/devices/system/cpu/cpu*/cache/index4/level
4
...
# lscpu -B -C=type,level,ways,coherency-size,one-size
TYPE        LEVEL WAYS COHERENCY-SIZE  ONE-SIZE
Data            1    8             64     32768
Instruction     1    8             64     32768
Unified         2    8             64    262144
Unified         3   12             64   6291456
Unified         4   16             64 134217728

The Linux-specific code path that probes cache sizes via lscpu and sysfs assumes that level can't be greater than 3, so without this PR anything using gemm crashes like this:

index out of bounds: the len is 3 but the index is 3

This PR fixes the crash by adding a guard identical to the one that already exists in the generic x86 cache size probing code.
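For readers who want to see the shape of the problem, here is a minimal sketch of such a guard. It is not gemm's actual code: `CacheInfo`, `fill_with_guard`, and the row layout are hypothetical names made up for illustration, and the only assumption taken from the description above is that the cache table has exactly three slots.

```rust
/// Hypothetical stand-in for a per-level cache description (not gemm's actual type).
#[derive(Clone, Copy, Debug, Default, PartialEq)]
struct CacheInfo {
    associativity: usize,
    cache_bytes: usize,
    cache_line_bytes: usize,
}

/// Fill a fixed three-slot table (L1, L2, L3) from (level, ways, line size, size) rows.
/// Rows whose level falls outside 1..=3 are skipped instead of being used as an index.
fn fill_with_guard(rows: &[(usize, usize, usize, usize)]) -> [CacheInfo; 3] {
    let mut info = [CacheInfo::default(); 3];
    for &(level, ways, line, size) in rows {
        // The guard: without it, level == 4 would index info[3] and panic with
        // "index out of bounds: the len is 3 but the index is 3".
        if level == 0 || level > 3 {
            continue;
        }
        info[level - 1] = CacheInfo {
            associativity: ways,
            cache_bytes: size,
            cache_line_bytes: line,
        };
    }
    info
}

fn main() {
    // Data and unified rows from the lscpu output above (instruction cache omitted).
    let rows = [
        (1, 8, 64, 32768),
        (2, 8, 64, 262144),
        (3, 12, 64, 6291456),
        (4, 16, 64, 134217728),
    ];
    let info = fill_with_guard(&rows);
    println!("{:?}", info);
}
```

With the guard in place, the L4 row is simply ignored and the three slots keep the L1, L2, and L3 values.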

(an interesting theoretical question is whether it is possible to somehow exploit this gigantic 128 MiB cache instead of ignoring it)

sarah-quinones commented 4 months ago

an alternative approach that would make use of the cache is doing something like let level = Ord::min(level, 3)
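A rough sketch of what clamping the level could look like, under the same assumption of a three-slot table. `slot_for_level` is a hypothetical helper, not part of gemm, and the table here holds only sizes to keep the example short.

```rust
/// Map a reported cache level to a slot in a three-entry table,
/// clamping anything above 3 onto the last slot. Level 0 is rejected.
fn slot_for_level(level: usize) -> Option<usize> {
    if level == 0 {
        return None;
    }
    let level = Ord::min(level, 3);
    Some(level - 1)
}

fn main() {
    let mut sizes = [0usize; 3];
    // (level, size in bytes) pairs from the lscpu output in the PR description.
    for (level, size) in [(1, 32768), (2, 262144), (3, 6291456), (4, 134217728)] {
        if let Some(slot) = slot_for_level(level) {
            // L3 and L4 share slot 2, so whichever row is processed last wins.
            sizes[slot] = size;
        }
    }
    println!("{:?}", sizes);
}
```

In this sketch the L3 and L4 rows land in the same slot, so the last slot ends up describing the 128 MiB cache and the 6 MiB L3 entry is overwritten. Whether that is the desired behavior depends on how the rest of the code uses the table.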

dfyz commented 4 months ago

> an alternative approach that would make use of the cache is doing something like let level = Ord::min(level, 3)

I just tried that, but it appears to be trickier than I thought at first:

Perhaps it makes sense to merge the fix for the crashes first, and then think about exploiting the L4 cache. By the way, I also added another commit that prevents the lscpu code path from crashing (my bad, I completely forgot about it).