google / highway

Performance-portable, length-agnostic SIMD with runtime dispatch
Apache License 2.0

Enable Multiple Targets For Dynamic Dispatch With MSVC #2220

Closed: ChristopherBaboch closed this issue 4 weeks ago

ChristopherBaboch commented 1 month ago

I'm using Visual Studio 2022 with MSVC and trying to get highway to generate all targets for x86-64 in order to use dynamic dispatch. We don't know what capabilities the user's machine will have, but we want at least SSE2, SSE4, AVX2, and AVX3.

The problem is that I can't get all these targets to be generated. It seems like only 2 are getting picked. Here are the macros that I am defining in the .cpp file that calls dynamic dispatching.

#define HWY_BROKEN_TARGETS 0
#define HWY_BASELINE_TARGETS (HWY_SSE4|HWY_AVX2|HWY_SSE2|HWY_AVX3)
#define HWY_COMPILE_ALL_ATTAINABLE

And I'm checking the generated targets by looking at the output of hwy::SupportedAndGeneratedTargets(). Using this configuration I only get AVX2, AVX3 and EMU128.

Am I missing something obvious?

jan-wassenberg commented 1 month ago

Hi, your use case makes sense :) It's probably running into this somewhat questionable 'fix' for slow MSVC builds, which were causing our CI to time out:

#if HWY_COMPILER_MSVC
// Fewer targets for faster builds.
#define HWY_ATTAINABLE_TARGETS \
  HWY_ENABLED(HWY_BASELINE_SCALAR | HWY_STATIC_TARGET | HWY_AVX2)

We can add an opt-out to disable this #if, or allow you to override HWY_ATTAINABLE_TARGETS entirely. What do you think?
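
For readers following along, here is a rough, self-contained sketch of why that branch would produce exactly the AVX2, AVX3 and EMU128 reported above. It does not use Highway itself. The bit values and the names kAVX3, kAVX2, kSSE4, kSSE2, kEMU128, BestTarget and AttainableOnMSVC are hypothetical and chosen only for illustration. It assumes that HWY_STATIC_TARGET resolves to the best target in the baseline, that HWY_BASELINE_SCALAR is EMU128 here, and that HWY_ENABLED removes nothing because HWY_BROKEN_TARGETS is 0.

```cpp
#include <cstdint>

// Hypothetical bit values, NOT Highway's real constants. The only property
// borrowed from Highway is the ordering: a lower bit means a better target.
constexpr int64_t kAVX3 = int64_t{1} << 0;
constexpr int64_t kAVX2 = int64_t{1} << 1;
constexpr int64_t kSSE4 = int64_t{1} << 2;
constexpr int64_t kSSE2 = int64_t{1} << 3;
constexpr int64_t kEMU128 = int64_t{1} << 4;  // stands in for HWY_BASELINE_SCALAR

// Assumed model of HWY_STATIC_TARGET: the lowest set bit of the baseline mask.
constexpr int64_t BestTarget(int64_t baseline) { return baseline & -baseline; }

// Models the quoted MSVC branch:
//   HWY_ENABLED(HWY_BASELINE_SCALAR | HWY_STATIC_TARGET | HWY_AVX2)
// HWY_ENABLED is treated as a no-op because no targets are marked broken.
constexpr int64_t AttainableOnMSVC(int64_t baseline) {
  return kEMU128 | BestTarget(baseline) | kAVX2;
}

// The baseline requested in the first comment of this thread.
constexpr int64_t kRequested = kSSE4 | kAVX2 | kSSE2 | kAVX3;

// SSE2 and SSE4 never reach the attainable set, whatever the baseline says.
static_assert(AttainableOnMSVC(kRequested) == (kAVX3 | kAVX2 | kEMU128),
              "only the scalar fallback, the static target and AVX2 remain");
```

Under these assumptions, widening HWY_BASELINE_TARGETS cannot help on MSVC, since the baseline only feeds in through its single best target. That is why an opt-out or an override of HWY_ATTAINABLE_TARGETS is needed.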

ChristopherBaboch commented 1 month ago

Hi Jan,

I was wondering about that code. What issues did you run into that made you opt for that? And is there a particular reason to always activate AVX2?

Both suggestions work for me!

Thank you

jan-wassenberg commented 1 month ago

Our MSVC toolchain runs inside an emulator, and it is super slow to compile tests. We do want to enable HWY_AVX2 for some diversity (checking at least two targets and vector lengths). Will soon add the opt-out.