halide / Halide

a language for fast, portable data-parallel computation
https://halide-lang.org

How to do in-place with Tuple ? #4379

Open SuTanTank opened 5 years ago

SuTanTank commented 5 years ago

I'm trying to implement an algorithm that uses a "splatting" pattern: given an input image input(x, y), a few output pixels output(outx(x, y) + r.x, outy(x, y) + r.y) are modified accordingly, where r is an RDom representing a local window of, let's say, 5x5.

I managed to get it working with an update definition over a 4D RDom(0, width, 0, height, -2, 5, -2, 5), but it runs slowly and I have no idea how to schedule it properly. In C++, I could just process every input pixel with some obvious parallelism, and it's intuitive. But in Halide, it seems I have to schedule it on the output domain?
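For comparison, the per-input-pixel loop described above might look roughly like this in plain C++. This is only a sketch: the `Image` struct and the `splat` function are hypothetical names introduced for illustration, and the window is centred on (x, y) itself rather than on the outx/outy mapping from the question.

```cpp
#include <cstddef>
#include <vector>

// Hypothetical row-major float image, used only for this illustration.
struct Image {
    int width;
    int height;
    std::vector<float> data;

    Image(int w, int h, float fill = 0.0f)
        : width(w), height(h), data(static_cast<std::size_t>(w) * h, fill) {}

    float& at(int x, int y) { return data[static_cast<std::size_t>(y) * width + x]; }
    float at(int x, int y) const { return data[static_cast<std::size_t>(y) * width + x]; }
};

// Splat every input pixel into a (2 * radius + 1)^2 window of `output`,
// accumulating in place on top of whatever `output` already holds.
void splat(const Image& input, Image& output, int radius) {
    for (int y = 0; y < input.height; ++y) {
        for (int x = 0; x < input.width; ++x) {
            const float v = input.at(x, y);
            for (int dy = -radius; dy <= radius; ++dy) {
                for (int dx = -radius; dx <= radius; ++dx) {
                    const int ox = x + dx;  // assumed stand-in for outx(x, y) + r.x
                    const int oy = y + dy;  // assumed stand-in for outy(x, y) + r.y
                    if (ox < 0 || oy < 0 || ox >= output.width || oy >= output.height) {
                        continue;  // skip window positions that fall outside the output
                    }
                    output.at(ox, oy) += v;
                }
            }
        }
    }
}
```

Note that neighbouring windows overlap, so running the outer loops in parallel as written would race on `output`; that would need atomic adds or per-thread buffers.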

Could you give some suggestions or point me to any code examples?

abadams commented 5 years ago

To parallelize a scatter like that, you probably want to either rfactor it (see tutorial lesson 18), or use the atomic() scheduling directive, which lets you parallelize rvars for things like addition even if there's a race by using atomic ops.
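As a rough mental model of the rfactor suggestion, here is a plain C++ analogy (not Halide code and not the Halide API): the reduction is split into independent stripes, each stripe accumulates into its own private buffer, and a final pass merges the partial results. The function name `scatter_add_factored` and its parameters are hypothetical, and the scatter is simplified to counting hits per bin.

```cpp
#include <cstddef>
#include <vector>

// Scatter-add over `targets` (each entry is a bin index in [0, num_bins)),
// factored into `num_stripes` partial reductions plus a final merge.
std::vector<int> scatter_add_factored(const std::vector<int>& targets,
                                      int num_bins, int num_stripes) {
    // One private accumulator per stripe: the intermediate, "factored" result.
    std::vector<std::vector<int>> partial(
        static_cast<std::size_t>(num_stripes),
        std::vector<int>(static_cast<std::size_t>(num_bins), 0));

    const std::size_t n = targets.size();
    const std::size_t stripes = static_cast<std::size_t>(num_stripes);

    // Each stripe writes only to partial[s], so iterations of this loop are
    // independent and could be run in parallel without atomics.
    for (std::size_t s = 0; s < stripes; ++s) {
        const std::size_t begin = n * s / stripes;
        const std::size_t end = n * (s + 1) / stripes;
        for (std::size_t i = begin; i < end; ++i) {
            partial[s][static_cast<std::size_t>(targets[i])] += 1;
        }
    }

    // Final merge: a small reduction over the stripe dimension.
    std::vector<int> bins(static_cast<std::size_t>(num_bins), 0);
    for (std::size_t s = 0; s < stripes; ++s) {
        for (std::size_t b = 0; b < bins.size(); ++b) {
            bins[b] += partial[s][b];
        }
    }
    return bins;
}
```

The atomic approach takes the other route: every worker writes to the single shared output, and each `+=` becomes an atomic add. That avoids the extra buffers and the merge pass, at the cost of contention when many updates hit the same location.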

SuTanTank commented 5 years ago

Thank you, I checked out lesson 18 and it's very helpful. However, I have another question:

My scatter modifies output pixels according to input pixels, so an in-place pipeline would be a better fit. I tried using undef<T>() as the pure definition of the output function, followed by some update rules such as output(x, y) += 1. But it seems the output is always set to 0 by the pure definition, and the result is always 1 no matter what the original value is.

So my question is, is there a proper way to implement an in-place pipeline?

Update: This happens when the output is a tuple.

// example
Func tuple;
Func output_1, output_2;
Var x, y;
output_1(x, y) = undef<float>();
output_2(x, y) = undef<float>();
RDom r(0, 100, 0, 100);
output_1(r.x, r.y) += 0.1f;
output_2(r.x, r.y) += 0.1f;
tuple(x, y) = Tuple(output_1(x, y), output_2(x, y));
// result: output_1 and output_2 are both 0.1f

SuTanTank commented 5 years ago

After some testing, here is an example that reproduces an unexpected result. It is caused not by the use of Tuple but by an extra output wrapper. So maybe Tuple can't be used with undef<T>()?

auto width = 10;
auto height = 10;

Halide::Var x, y;
Halide::Func foo("foo");
foo(x, y) = Halide::undef<float>();
Halide::RDom r(0, width, 0, height);
foo(r.x, r.y) += 0.1f;
Halide::Func output;
output(x, y) = foo(x, y);

Halide::Buffer<float> ones = Halide::lambda(x, y, 1.f).realize(width, height);
output.realize(ones);

And the result is like this:

0.1 0.2 0.3 0.4 0.5 0.6 0.7 0.8 0.9 1.0 1.1 1.2 1.3 1.4 1.5 1.6 ... 2.0 2.1 2.2 2.3 ... 3.0 ... 9.1 9.2 9.3 ... 10

rather than this:

1.1 1.1 1.1 ... 1.1 1.1 1.1 ... 1.1 ... ... 1.1 ... 1.1

Using a Tuple gives a similar result.