Open goldsteinn opened 3 years ago
I wonder what's the best way to detect Skylake and newer CPUs at compile time. GCC seems to set some useful defines:
$ gcc -march=native -dM -E - < /dev/null | grep -i skylake
#define __skylake_avx512__ 1
#define __skylake_avx512 1
#define __tune_skylake_avx512__ 1
But clang doesn't do that. Both compilers set defines for Skylake features like __AVX512VL__ though.
I think that will only cover Skylake server. Compare:
$ gcc -march=skylake -dM -E - < /dev/null | grep -i skylake
#define __skylake__ 1
#define __tune_skylake__ 1
#define __skylake 1
vs.
$ gcc -march=skylake-avx512 -dM -E - < /dev/null | grep -i skylake
#define __skylake_avx512__ 1
#define __skylake_avx512 1
#define __tune_skylake_avx512__ 1
But you could probably hack it together with something along the lines of:
#if (__skylake__ || __AVX512F__) && !__knl__ // Can skip the !__knl__ if Knights Landing also optimizes out prefetch NULL
#define USE_PREFETCH_NULL 1
#endif
AFAIK any non-Knights Landing micro-arch with AVX512 is Skylake or newer, so __AVX512F__ + __skylake__ should work. There may be something I'm missing though.
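For illustration, here is a rough sketch of how a macro like USE_PREFETCH_NULL might be consumed in a linked-list pop. The `struct node` type and `list_pop` function are made up for this example and are not taken from any real code base. The only assumption is the one discussed in this thread, namely that prefetching a NULL pointer is cheap on the targeted CPUs, so the branch can be dropped there.

```c
#include <assert.h>
#include <stddef.h>

/* Fall back to the guarded path when the compile-time check above
 * did not define the macro. */
#ifndef USE_PREFETCH_NULL
#define USE_PREFETCH_NULL 0
#endif

/* Hypothetical singly linked free-list node, for illustration only. */
struct node {
    struct node *next;
};

/* Pop the head of the list and prefetch the element after it. */
static inline struct node *list_pop(struct node **head) {
    assert(head != NULL);
    struct node *n = *head;
    if (n == NULL)
        return NULL;
    struct node *next = n->next;
#if USE_PREFETCH_NULL
    /* next may be NULL here; prefetch is only a hint and does not fault. */
    __builtin_prefetch(next);
#else
    /* Skip the prefetch for NULL on targets where it may be costly. */
    if (next != NULL)
        __builtin_prefetch(next);
#endif
    *head = next;
    return n;
}
```

Both paths return the same result; the macro only decides whether the NULL check in front of the prefetch is kept.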
That said, rather than a GCC hack, the best bet is probably to use CPUID.
See: https://github.com/google/tcmalloc/blob/master/tcmalloc/internal/linked_list.h#L48
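A minimal sketch of what a CPUID-based check could look like, using the `__get_cpuid` helper from the `<cpuid.h>` header that GCC and clang ship. This is not what the linked tcmalloc code does. The names `cpu_sig`, `decode_cpu_sig`, `is_skylake_model` and `read_cpu_sig` are made up for the example. The model list covers only the original Skylake client and server parts, so it is an assumption that would need extending for "Skylake or newer", and the vendor string check is left out for brevity.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

#if defined(__x86_64__) || defined(__i386__)
#include <cpuid.h> /* GCC/clang header providing __get_cpuid */
#endif

/* Display family/model as described for CPUID leaf 1. */
struct cpu_sig {
    unsigned family;
    unsigned model;
};

/* Decode the family and model fields from the EAX value of CPUID leaf 1. */
static struct cpu_sig decode_cpu_sig(uint32_t eax) {
    unsigned base_family = (eax >> 8) & 0xF;
    unsigned base_model = (eax >> 4) & 0xF;
    unsigned ext_model = (eax >> 16) & 0xF;
    unsigned ext_family = (eax >> 20) & 0xFF;
    struct cpu_sig sig;
    /* The extended family is only added when the base family is 0xF. */
    sig.family = (base_family == 0xF) ? base_family + ext_family : base_family;
    /* The extended model only applies to base family 6 or 0xF. */
    sig.model = (base_family == 0x6 || base_family == 0xF)
                    ? ((ext_model << 4) | base_model)
                    : base_model;
    return sig;
}

/* Illustrative, incomplete list: Skylake client (0x4E, 0x5E) and
 * Skylake server (0x55). Newer models would have to be added. */
static bool is_skylake_model(struct cpu_sig sig) {
    if (sig.family != 6)
        return false;
    return sig.model == 0x4E || sig.model == 0x5E || sig.model == 0x55;
}

/* Read the signature of the running CPU; returns false if unavailable.
 * Note: the vendor string (leaf 0) is not checked in this sketch. */
static bool read_cpu_sig(struct cpu_sig *out) {
    assert(out != NULL);
#if defined(__x86_64__) || defined(__i386__)
    unsigned eax, ebx, ecx, edx;
    if (!__get_cpuid(1, &eax, &ebx, &ecx, &edx))
        return false;
    *out = decode_cpu_sig(eax);
    return true;
#else
    (void)out;
    return false;
#endif
}
```

The result of `read_cpu_sig` followed by `is_skylake_model` could be cached once at startup and used to pick the prefetch path at run time instead of at compile time.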
From Intel Manual:
Did a quick test on my machine: not seeing any DTLB_LOAD_MISSES on prefetch NULL (or any address less than 4096 for that matter).