halide / Halide

a language for fast, portable data-parallel computation
https://halide-lang.org
Other
5.86k stars 1.07k forks source link

Promises regarding aliasing input and output buffers should be improved #6680

Open steven-johnson opened 2 years ago

steven-johnson commented 2 years ago

Currently, you can (theoretically) write a Halide AOT pipeline and schedule it in such a way that it is safe to do "in-place" work on an input (i.e., an output can safely alias an input). In practice:

It would be nice to add some rigor to this. Thinking out loud:

zvookin commented 2 years ago

A few thoughts on what we can do:

1) Adding overlap checking to debug mode so the fact that a call has overlaps is surfaced as a fact when one is analyzing the situation. This is likely a moderately expensive check, especially if one is strict about whether strided buffers have true intersection. (With strides, the start and extent of the buffer can indicate overlap but in truth, no byte is shared between them. My guess is the check does need to account for this.) On the plus side, The check can be done with a series of pairwise check_buffers_overlap rather than requiring significant codegen. It might be best to always generate the argument checking into an inline routine in the header and users can call it or not, and if they don't it incurs no overhead.

2) Add a debug wrapper of some sort that does a copy-in/copy out of all buffers to unique memory locations and runs the kernel on those inputs and outputs as well as on the original arguments, then compare the two for equality. This will positively identify whether there's a bug, though there are likely some edge cases that could produce false positives. (E.g. because unused bytes may have undefined values and it's not clear we can detect all of those cases. The analysis may be as difficult as proving freedom from overlap errors.) This is more of a sanitizer approach. Given that the bug can be a race condition, any given run may or may not surface the issue, though combining it with thread sanitizer may make it 100% reliable.

3) Most uses of overlap between input and output are simple pointwise pipelines of some sort. It may be possible to prove correctness for this class of uses much more easily than anything more general. However I'm not sure this helps as in order to enforce anything we would need to do overlap checks at runtime and the overhead of doing those accurately may be too much. Of course we should measure the overhead...

steven-johnson commented 2 years ago

I like the copy-in-and-out mode idea. Should probably tie it in to ASAN mode so that testing happens "regularly" but in situations where we already expect degraded performance and memory usage.

pranavb-ca commented 2 years ago

Would it be more helpful to indicate which outputs do not alias inputs? Absent such an indication the compiler always assumes aliasing. The downside though is that we'd then have to bake in this assumption in a number of places that presumably, at this point, do no assume aliasing between the store and the load in the following

out[ramp()] = f(in[ramp()]);