halide / Halide

a language for fast, portable data-parallel computation
https://halide-lang.org

We should have helper functions that return bool Exprs that query properties of the target #8192

Open abadams opened 2 months ago

abadams commented 2 months ago

Portable code is best, but sometimes when you have a high-level mathematical operation you want to perform, there are multiple ways to express it, and different ways are more or less suited to the instructions available on a given target. E.g. for integer division within a known range you may want to use bit-tricks, or you may want to do it in floating point, depending on the floating point throughput of your target.
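As a concrete instance of that trade-off, here is a minimal plain-C++ sketch (not taken from the thread, and not Halide code) of two ways to compute floor(x / 255). The function names `div255_bits` and `div255_float` are made up for illustration, and the bit-trick is only exact for 0 <= x <= 65534, which is the kind of range restriction that makes these rewrites a per-target, per-use-site decision.

```cpp
#include <cstdint>

// Floor of x / 255 using only shifts and adds.
// Exact only for 0 <= x <= 65534; outside that range the result is wrong.
inline uint32_t div255_bits(uint32_t x) {
    return (x + 1 + (x >> 8)) >> 8;
}

// The same quotient computed through float. Inputs in this range are
// represented exactly in float, and truncation on the cast back gives
// the floor of the quotient.
inline uint32_t div255_float(uint32_t x) {
    return static_cast<uint32_t>(static_cast<float>(x) / 255.0f);
}
```

Both functions agree with `x / 255` over the stated range, so which one a library picks is purely a question of what the target executes faster.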

To support this, currently you have to plumb the damn target through every mathematical helper function, so that you can switch on it. This leads to ugly code. It would be cool if you could instead say something like:

Expr e = select(target_arch_is(Target::ARM), something, something_else);

I propose one new helper function in IROperator.h for each enum in Target. They would each return a boolean Expr:

Expr target_arch_is(Target::Arch);
Expr target_os_is(Target::OS);
Expr target_processor_is(Target::Processor);
Expr target_has_feature(Target::Feature);

These Exprs would be calls to similarly-named intrinsics in the IR, with the enum arg converted to a constant integer. These intrinsics would be lowered to either true or false at the earliest possible opportunity (probably alongside strictify_float at the top of lowering).
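To make the lowering step concrete, here is a toy model in plain C++. It is a sketch under stated assumptions, not Halide's IR: every type and function in namespace `toy` (`Node`, `ExprPtr`, `leaf`, `resolve_target_queries`, and the rest) is hypothetical and only mirrors the names in the proposal. It shows an intrinsic node that stores the enum as an integer, and a pass that replaces it with true or false for a given target and then folds the enclosing select.

```cpp
#include <memory>
#include <string>

namespace toy {

// Stand-ins for illustration only; these are not Halide's Target or Expr.
enum class Arch { X86, ARM };
struct Target { Arch arch; };

struct Node;
using ExprPtr = std::shared_ptr<const Node>;

struct Node {
    enum Kind { BoolConst, Leaf, TargetArchIs, Select };
    Kind kind = Leaf;
    bool value = false;               // BoolConst payload
    std::string name;                 // Leaf payload: an opaque sub-expression
    int arch = 0;                     // TargetArchIs payload: enum as an integer
    ExprPtr cond, if_true, if_false;  // Select operands
};

inline ExprPtr bool_const(bool v) {
    auto n = std::make_shared<Node>();
    n->kind = Node::BoolConst;
    n->value = v;
    return n;
}

inline ExprPtr leaf(const std::string &name) {
    auto n = std::make_shared<Node>();
    n->kind = Node::Leaf;
    n->name = name;
    return n;
}

// Builds the query node; the enum argument is converted to a constant integer.
inline ExprPtr target_arch_is(Arch a) {
    auto n = std::make_shared<Node>();
    n->kind = Node::TargetArchIs;
    n->arch = static_cast<int>(a);
    return n;
}

inline ExprPtr select(ExprPtr c, ExprPtr t, ExprPtr f) {
    auto n = std::make_shared<Node>();
    n->kind = Node::Select;
    n->cond = c;
    n->if_true = t;
    n->if_false = f;
    return n;
}

// Early lowering pass: each query becomes a constant for target t, and a
// select whose condition is now constant collapses to the chosen branch.
inline ExprPtr resolve_target_queries(const ExprPtr &e, const Target &t) {
    switch (e->kind) {
    case Node::TargetArchIs:
        return bool_const(e->arch == static_cast<int>(t.arch));
    case Node::Select: {
        ExprPtr c = resolve_target_queries(e->cond, t);
        if (c->kind == Node::BoolConst) {
            return resolve_target_queries(c->value ? e->if_true : e->if_false, t);
        }
        return select(c, resolve_target_queries(e->if_true, t),
                      resolve_target_queries(e->if_false, t));
    }
    default:
        return e;
    }
}

}  // namespace toy
```

In this model, `toy::select(toy::target_arch_is(toy::Arch::ARM), toy::leaf("something"), toy::leaf("something_else"))` resolves to the `something` leaf for an ARM target and to `something_else` otherwise, so later passes never see the branch that was not taken.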

soufianekhiat commented 2 months ago

A nice-to-have, as syntactic sugar, could be:

Expr device_api_is(DeviceAPI api);

zvookin commented 2 months ago

The main use case is library implementation, though I'm tempted to say using a context argument for such things may be preferable as the configuration can be more powerful than just the target. (E.g. we need to revisit the way we specify implementations of architectures in Target as it is clearly not being elaborated to cover any of the space it needs to cover.)

It is important to clarify that this is compile time only. Passing the current target through C++, as generators do, allows controlling the construction of IR. This proposed feature allows constructing the IR with multiple paths, one of which is chosen during lowering. A further option would allow selecting based on the actual hardware at runtime, which is something that would need other changes and currently doesn't make a lot of sense, but this mechanism could be easily confused for that if one isn't already well versed in Halide's compilation model.

steven-johnson commented 2 months ago

Maybe also add Expr target_natural_vector_width(Type)?

abadams commented 2 months ago

Yeah, I was wondering about vector width too. Normally anywhere you'd write a schedule you have a Target, but it could be handy if somewhere deep in a math library with no access to a Target you need to make a compute_root memoized Func as a LUT.