halide / Halide

a language for fast, portable data-parallel computation
https://halide-lang.org

RDom with Param<int> vs ImageParam #8478

Closed soufianekhiat closed 2 days ago

soufianekhiat commented 6 days ago

Conceptually the same code.

Working:

Param<int> radius_p( "radius" );
RDom wide( -radius_p, 2 * radius_p + 1 );

Not working:

ImageParam radius_p( Int( 32 ), 0, "radius" );
Expr radius_v = radius_p();
RDom wide( -radius_v, 2 * radius_v + 1 );

With this error:

Error: The bounds of the RDom r12 in dimension 0 are:
  (0 - radius_im()) ... ((2*radius_im()) + 1)
These depend on a call to the Func radius_im.
The bounds of an RDom may not depend on a call to a Func.

alexreinking commented 2 days ago

This is by design. RDoms are assumed to have static bounds throughout the pipeline. Params are immutable, but ImageParams are not (inputs may alias with the outputs).

abadams commented 2 days ago

Inputs aren't supposed to alias with outputs, but unfortunately people do that in practice. There are some scheduling directives that will corrupt the output if you do that, so you have to take care.

A load to an image param where the coordinates are all start-up expressions (*) could be considered a start-up expression, and it is in some places in the compiler (e.g. Bounds.cpp:1200), but we're not consistent about it.

(*) A start-up expression is one that depends only on scalar inputs to the pipeline, and can be evaluated at any point in the pipeline and will always give the same value when it is.
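
To make the definition above concrete, here is a minimal sketch of a start-up-expression check over a toy expression tree. The `Node`, `Kind`, `make`, and `is_startup_expr` names are hypothetical and are not Halide's IR or API. The `allow_image_loads` flag is an assumption used to model the two readings discussed above: strict (a load from an image param is never a start-up expression) and permissive (it is one when all of its coordinates are).

```cpp
#include <cassert>
#include <memory>
#include <utility>
#include <vector>

// Toy expression tree. These types are illustrative only, not Halide's IR.
enum class Kind { Const, ScalarParam, LoopVar, ImageLoad, FuncCall, Op };

struct Node {
    Kind kind;
    std::vector<std::shared_ptr<Node>> children;  // operands, or load coordinates
};

using NodePtr = std::shared_ptr<Node>;

// Hypothetical helper that builds a node of the given kind.
NodePtr make(Kind kind, std::vector<NodePtr> children = {}) {
    return std::make_shared<Node>(Node{kind, std::move(children)});
}

// Returns true if `e` depends only on scalar pipeline inputs and constants.
// With allow_image_loads set, a load from an image param also qualifies,
// provided every coordinate of the load qualifies (a zero-dimensional load
// has no coordinates, so it qualifies trivially).
bool is_startup_expr(const NodePtr &e, bool allow_image_loads) {
    assert(e != nullptr);
    switch (e->kind) {
    case Kind::Const:
    case Kind::ScalarParam:
        return true;
    case Kind::LoopVar:
    case Kind::FuncCall:
        return false;
    case Kind::ImageLoad:
        if (!allow_image_loads) return false;
        break;  // permissive mode: check the coordinates below
    case Kind::Op:
        break;  // check the operands below
    }
    for (const auto &child : e->children) {
        if (!is_startup_expr(child, allow_image_loads)) return false;
    }
    return true;
}
```

In this model, an extent built from a scalar param (like `2 * radius_p + 1`) passes in both modes, while the same extent built from a zero-dimensional image load passes only in the permissive mode.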

soufianekhiat commented 2 days ago

Thanks. [Close as Design]

abadams commented 2 days ago

Actually, I think the showstopper is that an ImageParam may not be accessible where it is needed, because its data may e.g. be valid only on host or only on device. This makes ImageParams difficult to use in start-up expressions. The code in the compiler that currently allows it is breakable with some effort. The code below fills most of the output with garbage, because it uses a stale host-side value of an ImageParam input during bounds inference:


#include "Halide.h"
#include <cassert>

using namespace Halide;

int main(int argc, char **argv) {

    ImageParam im(Int(32), 0);

    Func f("f"), g("g"), h("h");
    Var x;

    f = im.in();

    f.compute_root().gpu_single_thread();

    // FuncValueBounds of f will include the call to im, because there are no
    // vars in it. But f itself will only ever be computed on device, which only
    // requires im to be available on device.

    h(x) = x;
    h.compute_root();
    g(x) = h(x % f());
    // The bounds required of h depend on the func value bounds of f. These
    // bounds will be evaluated on the host, which needs to access im on the
    // host at the pipeline entry.

    // Make a buffer that's dirty on device
    auto buf = Buffer<int>::make_scalar();
    buf() = 3;
    Func make_big;
    make_big() = 256;
    make_big.gpu_single_thread();
    auto callable = make_big.compile_to_callable({});
    callable(buf);

    assert(buf.device_dirty());

    im.set(buf);

    // The call to g will access h at a coordinate up to 255, but h will only be
    // computed to be size 3, because the func value bounds of f include a
    // host-side access which ignores device_dirty.

    h.trace_realizations().trace_loads();
    g.trace_stores();
    g.realize({256});

    return 0;
}
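
The failure in the program above can be modelled without Halide or a GPU. The sketch below is a toy model only: `ToyScalarBuffer`, `inferred_extent_of_h`, and `out_of_bounds_accesses` are hypothetical names, and the only idea borrowed from Halide is a dirty flag marking the device copy as newer than the host copy. It assumes bounds inference reads the host copy while `f` is evaluated from the device copy, as described in the comments above.

```cpp
#include <cassert>

// Toy stand-in for a scalar buffer with separate host and device copies.
// This is not Halide's Buffer class.
struct ToyScalarBuffer {
    int host_value = 0;
    int device_value = 0;
    bool device_dirty = false;  // true when the device copy is newer than the host copy

    // Bring the host copy up to date if the device copy is newer.
    void copy_to_host() {
        if (device_dirty) {
            host_value = device_value;
            device_dirty = false;
        }
    }
};

// Models bounds inference for h in g(x) = h(x % f()): h is needed on
// [0, f - 1], and the value of f is read on the host. With sync_first
// false, the read ignores the dirty flag and may return a stale value.
int inferred_extent_of_h(ToyScalarBuffer &im, bool sync_first) {
    if (sync_first) im.copy_to_host();
    return im.host_value;
}

// Counts how many of g's accesses to h land outside the inferred extent
// when f is evaluated from the device copy.
int out_of_bounds_accesses(const ToyScalarBuffer &im, int h_extent, int g_extent) {
    assert(im.device_value > 0);
    int count = 0;
    for (int x = 0; x < g_extent; x++) {
        int coord = x % im.device_value;
        if (coord >= h_extent) count++;
    }
    return count;
}
```

With a host value of 3, a device value of 256, and the dirty flag set, the unsynchronised path infers an extent of 3 for `h`, so 253 of the 256 accesses made by `g` fall outside it. Synchronising first infers an extent of 256 and leaves none out of bounds.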