halide / Halide

a language for fast, portable data-parallel computation
https://halide-lang.org

Detect ARM CPU features for host target and in runtime #8298

Open alexreinking opened 2 weeks ago

alexreinking commented 2 weeks ago

Adds feature detection to the runtime library and to the host target feature computation.

Not sure what the best way is to share code here. Not sure how best to test on Android or Windows/ARM, either.
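
For readers unfamiliar with how such detection can work, here is a minimal sketch. It is not the code in this PR. It assumes the Linux (and Android) `getauxval(AT_HWCAP)` interface and the macOS `sysctlbyname` keys `hw.optional.arm.FEAT_DotProd` and `hw.optional.arm.FEAT_FP16`. The `HostArmFeatures` struct and `detect_arm_features` function are hypothetical names. Windows/ARM is deliberately left out, since how to handle it is still an open question in this thread.

```cpp
#include <cassert>
#include <cstddef>

#if defined(__aarch64__) && defined(__linux__)
#include <asm/hwcap.h>  // HWCAP_ASIMDDP, HWCAP_FPHP, HWCAP_ASIMDHP
#include <sys/auxv.h>   // getauxval, AT_HWCAP
#elif defined(__aarch64__) && defined(__APPLE__)
#include <sys/sysctl.h>  // sysctlbyname
#endif

// Hypothetical result type for illustration; not Halide's Target or runtime API.
struct HostArmFeatures {
    bool dot_prod = false;  // would correspond to the arm_dot_prod feature
    bool fp16 = false;      // would correspond to the arm_fp16 feature
};

// Queries the OS for two AArch64 features. On any platform that is not
// covered below (including all non-ARM hosts), every field stays false.
HostArmFeatures detect_arm_features() {
    HostArmFeatures f;
#if defined(__aarch64__) && defined(__linux__)
    const unsigned long hwcap = getauxval(AT_HWCAP);
    f.dot_prod = (hwcap & HWCAP_ASIMDDP) != 0;
    // Require both scalar and SIMD half-precision support.
    f.fp16 = (hwcap & HWCAP_FPHP) != 0 && (hwcap & HWCAP_ASIMDHP) != 0;
#elif defined(__aarch64__) && defined(__APPLE__)
    // Returns true only if the key exists and reports a nonzero value.
    auto has = [](const char *name) {
        int value = 0;
        size_t size = sizeof(value);
        return sysctlbyname(name, &value, &size, nullptr, 0) == 0 && value != 0;
    };
    f.dot_prod = has("hw.optional.arm.FEAT_DotProd");
    f.fp16 = has("hw.optional.arm.FEAT_FP16");
#endif
    return f;
}
```

Because the unsupported-platform path returns all-false, a caller can treat the result as "features known to be present" rather than as a complete description of the CPU.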

Fixes #4727 Fixes #6106 Fixes #7901 Fixes #7979

alexreinking commented 2 weeks ago

Not sure what the best way is to detect the ARMv8.1-A feature. It seems that certain other features imply its presence (e.g. sve, dotprod) or its absence (e.g. armv7s).

alexreinking commented 2 weeks ago

Regarding the tutorial failures: lesson 15 uses the target string host-x86-64 to infer the OS.

But with this PR, host will be something like arm-64-osx-arm_dot_prod-arm_fp16. This then becomes x86-64-osx-arm_dot_prod-arm_fp16-sse4, which makes no sense.

The fundamental issue is that "the host but on a different architecture" isn't a well-defined thing.
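
To make the failure mode concrete, here is a toy model of it. `ToyTarget`, `to_string`, and `naive_with_arch` are hypothetical names invented for this illustration and are not Halide's `Target` class or parser. The sketch only assumes that overriding the architecture leaves the already-detected feature list untouched, which is the behavior described above.

```cpp
#include <cassert>
#include <string>
#include <vector>

// Hypothetical, simplified stand-in for a target description.
struct ToyTarget {
    std::string arch;                   // e.g. "arm-64" or "x86-64"
    std::string os;                     // e.g. "osx" or "linux"
    std::vector<std::string> features;  // kept in insertion order
};

// Renders the target as arch-os-feature-feature-...
std::string to_string(const ToyTarget &t) {
    std::string s = t.arch + "-" + t.os;
    for (const auto &f : t.features) {
        s += "-" + f;
    }
    return s;
}

// Naive override: swaps the architecture but keeps every feature,
// including ones that only make sense on the old architecture.
ToyTarget naive_with_arch(ToyTarget t, const std::string &arch) {
    assert(!arch.empty());
    t.arch = arch;
    return t;
}
```

Starting from a host of `{"arm-64", "osx", {"arm_dot_prod", "arm_fp16"}}`, calling `naive_with_arch(host, "x86-64")` yields a target that claims to be x86-64 while still carrying ARM-only features.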

Brainstorming a few possible resolutions:

  1. Define changing the arch of a target to clear all arch-specific features.
  2. Interpret host in the os-position to mean the host os and no more. The target string in the lesson would become x86-64-host.
    1. Bike-shed: use os or hostos in place of host?
  3. Change the lesson to use x86-64-linux instead of host-x86-64.
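
Option 1 could look roughly like the sketch below. It is an illustration under stated assumptions, not a proposed patch: `ToyTarget`, `is_arch_specific`, and `with_arch` are hypothetical names, and the rule for what counts as arch-specific (an `arm_` prefix plus a short hard-coded list) is deliberately incomplete.

```cpp
#include <algorithm>
#include <cassert>
#include <set>
#include <string>
#include <vector>

// Hypothetical, simplified stand-in for a target description.
struct ToyTarget {
    std::string arch;                   // e.g. "arm-64" or "x86-64"
    std::string os;                     // e.g. "osx" or "linux"
    std::vector<std::string> features;  // kept in insertion order
};

// Illustrative and incomplete classification of arch-specific features.
bool is_arch_specific(const std::string &f) {
    static const std::set<std::string> names = {
        "sse41", "avx", "avx2", "armv7s", "sve", "sve2"};
    return f.rfind("arm_", 0) == 0 || names.count(f) > 0;
}

// Changing the architecture drops every arch-specific feature;
// arch-neutral features (e.g. "debug") and the OS are preserved.
ToyTarget with_arch(ToyTarget t, const std::string &arch) {
    assert(!arch.empty());
    if (arch != t.arch) {
        t.features.erase(
            std::remove_if(t.features.begin(), t.features.end(), is_arch_specific),
            t.features.end());
        t.arch = arch;
    }
    return t;
}
```

With this rule, a host of `{"arm-64", "osx", {"arm_dot_prod", "arm_fp16", "debug"}}` passed through `with_arch(host, "x86-64")` keeps only the OS and `debug`. Setting the same architecture again is a no-op, so the detected host features survive in the common case.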

alexreinking commented 2 weeks ago

Pending further discussion, I'm using this option to continue making progress:

Change the lesson to use x86-64-linux instead of host-x86-64.

steven-johnson commented 1 week ago

Ready to land?

abadams commented 1 week ago

No, the Windows ARM code is still just a guess. We're trying to figure out how to test it.

alexreinking commented 6 days ago

I'm trying to test it inside a Windows 11 ARM VM via UTM.