Closed soufianekhiat closed 2 days ago
This is by design. RDoms are assumed to have static bounds throughout the pipeline. Params are immutable, but ImageParams are not (inputs may alias with the outputs).
Inputs aren't supposed to alias with outputs, but unfortunately people do that in practice. There are some scheduling directives that will corrupt the output if you do that, so you have to take care.
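To make the aliasing hazard concrete, here is a minimal sketch in plain C++ (no Halide involved). The stencil, the function names, and the use of loop direction as a stand-in for a scheduling choice are all assumptions made for illustration; they are not taken from Halide or from this thread.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Toy 1-D stencil: out[i] = in[i] + in[i + 1] for i in [0, n - 1).
// `in` and `out` may point at the same storage to model aliasing.
// The loop direction stands in for a scheduling choice (hypothetical).
void pairwise_sum(const int *in, int *out, std::size_t n, bool reverse) {
    assert(n >= 1);
    for (std::size_t k = 0; k + 1 < n; k++) {
        std::size_t i = reverse ? (n - 2 - k) : k;
        out[i] = in[i] + in[i + 1];
    }
}

// Runs the stencil in place, so the input aliases the output.
std::vector<int> run_in_place(std::vector<int> data, bool reverse) {
    pairwise_sum(data.data(), data.data(), data.size(), reverse);
    return data;
}

// Runs the stencil with separate input and output storage.
// The last element is never written, so it keeps the input value.
std::vector<int> run_separate(const std::vector<int> &data, bool reverse) {
    std::vector<int> out(data);
    pairwise_sum(data.data(), out.data(), data.size(), reverse);
    return out;
}
```

In this toy, the forward order happens to survive aliasing because each element is read before it is overwritten. The reverse order reads values it has already replaced, so the in-place result differs from the separate-storage result.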
A load from an image param where the coordinates are all start-up expressions (*) could itself be considered a start-up expression, and some places in the compiler treat it that way (e.g. Bounds.cpp:1200), but we're not consistent about it.
(*) A start-up expression is one that depends only on scalar inputs to the pipeline, so it can be evaluated at any point in the pipeline and will always give the same value.
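The definition above can be sketched as a recursive check over a toy expression tree. Everything here (`Kind`, `Node`, `make`, `is_startup_expr`, and the `allow_image_loads` switch) is hypothetical and written only for illustration; it is not Halide's IR and not the logic in Bounds.cpp.

```cpp
#include <cassert>
#include <memory>
#include <utility>
#include <vector>

// Hypothetical toy IR, not Halide's real IR classes.
enum class Kind { Constant, ScalarParam, LoopVar, ImageLoad, Add };

struct Node {
    Kind kind;
    std::vector<std::shared_ptr<Node>> children;  // operands, or load coordinates
};

using NodePtr = std::shared_ptr<Node>;

NodePtr make(Kind kind, std::vector<NodePtr> children = {}) {
    return std::make_shared<Node>(Node{kind, std::move(children)});
}

// Returns true if `e` depends only on constants and scalar params.
// `allow_image_loads` selects the looser reading, in which a load whose
// coordinates are all start-up expressions also counts as one.
bool is_startup_expr(const NodePtr &e, bool allow_image_loads) {
    assert(e != nullptr);
    switch (e->kind) {
    case Kind::Constant:
    case Kind::ScalarParam:
        return true;
    case Kind::LoopVar:
        return false;
    case Kind::ImageLoad:
        if (!allow_image_loads) return false;
        break;  // otherwise check the coordinates below
    case Kind::Add:
        break;  // check the operands below
    }
    for (const NodePtr &c : e->children) {
        if (!is_startup_expr(c, allow_image_loads)) return false;
    }
    return true;
}
```

The two settings of `allow_image_loads` correspond to the inconsistency described above: the same image load is a start-up expression under one reading and not under the other.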
Thanks. [Close as Design]
Actually, I think the showstopper is that an ImageParam's contents may not be accessible where a start-up expression gets evaluated, because they may e.g. be valid only on the host or only on the device. This makes them difficult to use in start-up expressions. The code in the compiler that currently allows it can be broken with some effort. The code below fills most of the output with garbage, because bounds inference uses a stale host-side value of an ImageParam input:
#include "Halide.h"
#include <cassert>

using namespace Halide;

int main(int argc, char **argv) {
    ImageParam im(Int(32), 0);
    Func f("f"), g("g"), h("h");
    Var x;

    f = im.in();
    f.compute_root().gpu_single_thread();
    // FuncValueBounds of f will include the call to im, because there are no
    // vars in it. But f itself will only ever be computed on device, which only
    // requires im to be available on device.

    h(x) = x;
    h.compute_root();

    g(x) = h(x % f());
    // The bounds required of h depend on the func value bounds of f. These
    // bounds will be evaluated on the host, which needs to access im on the
    // host at the pipeline entry.

    // Make a buffer that's dirty on device
    auto buf = Buffer<int>::make_scalar();
    buf() = 3;
    Func make_big;
    make_big() = 256;
    make_big.gpu_single_thread();
    auto callable = make_big.compile_to_callable({});
    callable(buf);
    assert(buf.device_dirty());

    im.set(buf);

    // The call to g will access h at a coordinate up to 255, but h will only be
    // computed to be size 3, because the func value bounds of f include a
    // host-side access which ignores device_dirty.
    h.trace_realizations().trace_loads();
    g.trace_stores();
    g.realize({256});

    return 0;
}
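The failure mode in the repro can be modelled without a GPU. The sketch below is plain C++ and does not use Halide: `ToyScalarBuffer` and all function names are hypothetical stand-ins, and the sync-before-read variant is only there for contrast, not a fix proposed in this thread.

```cpp
#include <cassert>

// Hypothetical stand-in for a scalar buffer with separate host and device copies.
struct ToyScalarBuffer {
    int host_value = 0;
    int device_value = 0;
    bool device_dirty = false;  // true when the device copy is newer than the host copy
};

// Models a GPU kernel writing the scalar: only the device copy changes.
void device_store(ToyScalarBuffer &buf, int v) {
    buf.device_value = v;
    buf.device_dirty = true;
}

// Models host-side bounds inference that reads the host copy
// without checking the dirty flag.
int extent_from_stale_host_read(const ToyScalarBuffer &buf) {
    return buf.host_value;
}

// Models a read that first brings the host copy up to date.
int extent_after_copy_to_host(ToyScalarBuffer &buf) {
    if (buf.device_dirty) {
        buf.host_value = buf.device_value;
        buf.device_dirty = false;
    }
    return buf.host_value;
}

// Counts how many x in [0, n) read h at a coordinate (x % actual_modulus)
// that lies outside the region [0, allocated_extent) that was computed.
int count_uncomputed_reads(int allocated_extent, int actual_modulus, int n) {
    assert(actual_modulus > 0);
    int bad = 0;
    for (int x = 0; x < n; x++) {
        if (x % actual_modulus >= allocated_extent) bad++;
    }
    return bad;
}
```

With a host value of 3 and a device value of 256, the stale read sizes `h` at 3 while the consumer indexes it up to 255, so in this model 253 of the 256 reads land outside the computed region.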
Conceptually the same code:
Working:
Not working:
With this error: