Open jansel opened 1 month ago
So it takes 32 minutes... but still successfully compiles? Interesting. Maybe there's a lurking pass with exponential complexity for this example.
Yeah, it finishes and runs correctly.
Looks like it's not compilation proper, but rather the Anderson2021 autoscheduler getting stuck enumerating a combinatorial number of tiling options, which is a bit absurd given that this entire pipeline seems to be elementwise other than accesses to the input buffer.
A workaround would be to ask the autoscheduler to do a lot less by generating an Expr instead of a Func for anything that has no update definition and is either consumed elementwise or is an op that is cheaper than a load (e.g. tmp48).
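To make the rule in that workaround concrete, here is a rough sketch of the decision in plain Python. Everything in it (`Node`, `emit_as_expr`, `partition`, and the node names other than `tmp48`) is hypothetical and made up for illustration; it is not Halide's or Inductor's API, and the flags are assumed to be computed by some earlier analysis.

```python
from dataclasses import dataclass


@dataclass
class Node:
    """Hypothetical description of one intermediate value in the pipeline."""
    name: str
    has_update: bool             # has an update definition (e.g. a reduction)
    consumed_elementwise: bool   # every consumer reads it at the same coordinates
    cheaper_than_load: bool      # recomputing it costs less than loading it


def emit_as_expr(node: Node) -> bool:
    """Return True if the node could be inlined as an Expr rather than a Func."""
    if node.has_update:
        # Update definitions need a Func to hold the intermediate state.
        return False
    return node.consumed_elementwise or node.cheaper_than_load


def partition(nodes):
    """Split node names into (inlined as Expr, kept as Func)."""
    exprs = [n.name for n in nodes if emit_as_expr(n)]
    funcs = [n.name for n in nodes if not emit_as_expr(n)]
    return exprs, funcs


nodes = [
    Node("tmp48", has_update=False, consumed_elementwise=False, cheaper_than_load=True),
    Node("tmp_sum", has_update=True, consumed_elementwise=True, cheaper_than_load=False),
    Node("tmp_gather", has_update=False, consumed_elementwise=False, cheaper_than_load=False),
]
exprs, funcs = partition(nodes)
print("Expr:", exprs)
print("Func:", funcs)
```

Only the nodes left in `funcs` would be handed to the autoscheduler as separate stages, so it would have fewer tiling decisions to enumerate.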
This example takes 32 minutes to compile, while typical kernels take seconds (not minutes). I suspect it is hitting some sort of pathological case in Halide.
repro.py
cc @alexreinking, this example is coming from:
on https://github.com/pytorch/pytorch/pull/136809