halide / Halide

a language for fast, portable data-parallel computation
https://halide-lang.org

Some ImageParam access can occur in bounds inference on host, but this ignores device_dirty #8481

Open abadams opened 2 days ago

abadams commented 2 days ago

Bounds inference allows ImageParam accesses whose args are all constant to appear in the bounds expressions that are evaluated on the host at pipeline entry. Because all ImageParams are accessed through trivial wrapper Funcs, the only way to trigger this is to inject a dependency on the FuncValueBounds of the wrapper Func itself.

If the buffer bound to the ImageParam is dirty on the device, the host-side value read during bounds inference can differ from the device-side value the pipeline actually uses. The code that allocates or produces a Func then disagrees with the code that accesses it, causing crashes or garbage output.
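The mismatch can be modeled without Halide at all. The sketch below is a toy model, not Halide's runtime: `ToyBuffer`, `device_write`, `read_host_unchecked`, and `read_host_synced` are hypothetical names invented for illustration. It assumes a buffer keeps separate host and device copies plus a `device_dirty` flag, and shows how a host read that skips the flag sees a stale value.

```cpp
#include <cassert>

// Toy stand-in for a buffer with separate host and device copies.
// Hypothetical type for illustration only; this is not Halide's Buffer.
struct ToyBuffer {
    int host_value = 0;
    int device_value = 0;
    bool device_dirty = false;  // true when the device copy is newer
};

// Models a device-side write: only the device copy changes.
void device_write(ToyBuffer &b, int v) {
    b.device_value = v;
    b.device_dirty = true;
}

// Models the problematic read: looks at host memory and ignores the flag.
int read_host_unchecked(const ToyBuffer &b) {
    return b.host_value;
}

// Models a safe read: copies back from the device first if needed.
int read_host_synced(ToyBuffer &b) {
    if (b.device_dirty) {
        b.host_value = b.device_value;
        b.device_dirty = false;
    }
    return b.host_value;
}
```

With the host copy set to 3 and a device write of 256, `read_host_unchecked` returns 3 while `read_host_synced` returns 256. Any size computed from the unchecked read disagrees with what the device-side code later sees.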

This is extremely contrived and unlikely to ever be triggered by real code, but there may be other bugs lurking due to places where ImageParam call nodes leak into start-up expressions.

Example from #8478 that produces garbage output:

#include "Halide.h"

#include <cassert>

using namespace Halide;

int main(int argc, char **argv) {

    ImageParam im(Int(32), 0);

    Func f("f"), g("g"), h("h");
    Var x;

    f = im.in();

    f.compute_root().gpu_single_thread();

    // FuncValueBounds of f will include the call to im, because there are no
    // vars in it. But f itself will only ever be computed on device, which only
    // requires im to be available on device.

    h(x) = x;
    h.compute_root();
    g(x) = h(x % f());
    // The bounds required of h depend on the func value bounds of f. These
    // bounds will be evaluated on the host, which needs to access im on the
    // host at the pipeline entry.

    // Make a buffer that's dirty on device
    auto buf = Buffer<int>::make_scalar();
    buf() = 3;
    Func make_big;
    make_big() = 256;
    make_big.gpu_single_thread();
    auto callable = make_big.compile_to_callable({});
    callable(buf);

    assert(buf.device_dirty());

    im.set(buf);

    // The call to g will access h at a coordinate up to 255, but h will only be
    // computed to be size 3, because the func value bounds of f include a
    // host-side access which ignores device_dirty.

    h.trace_realizations().trace_loads();
    g.trace_stores();
    g.realize({256});

    return 0;
}
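The size mismatch in this example can be checked with plain arithmetic. The sketch below is ordinary C++, not Halide, and `inferred_extent_of_h` and `count_out_of_bounds_loads` are hypothetical helpers. It assumes bounds inference sizes h from the stale host-side value of im (3), so h covers [0, 2], while the device-side f() returns 256 and g loads h(x % 256) for every x in the realized range [0, 255].

```cpp
#include <cassert>

// Extent implied for the index x % divisor, assuming divisor > 0:
// the index falls in [0, divisor - 1], so the extent equals divisor.
int inferred_extent_of_h(int divisor) {
    return divisor;
}

// Counts the loads h(x % actual_divisor), for x in [0, realized_extent),
// whose index falls outside an allocation of allocated_extent elements.
int count_out_of_bounds_loads(int realized_extent, int actual_divisor,
                              int allocated_extent) {
    int bad = 0;
    for (int x = 0; x < realized_extent; x++) {
        int idx = x % actual_divisor;
        if (idx >= allocated_extent) {
            bad++;
        }
    }
    return bad;
}
```

Under these assumptions, `count_out_of_bounds_loads(256, 256, inferred_extent_of_h(3))` reports 253 loads past the end of h. The same call with an actual divisor of 3 reports none, which is the case where host and device agree.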