halide / Halide

a language for fast, portable data-parallel computation
https://halide-lang.org

global shared variable #2888

Open shockjiang opened 6 years ago

shockjiang commented 6 years ago

I'm trying to define a global shared variable that counts the non-zero elements in an input, like this:

counter = 0
counter += select(input[x,y] > 10, 1, 0)

However, this seems very hard to do in Halide. Is there any kind of global shared variable that serves this purpose?

MDBrothers commented 6 years ago

Couldn't you at least count along rows in parallel and then do a serial reduction on the resulting 1D vector?


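// `input`, `imageWidth` and `imageHeight` are assumed to be defined elsewhere.
Halide::Var x("x"), y("y");
Halide::RDom strip(0, 16);

// 1 where the pixel exceeds the threshold, 0 elsewhere.
Halide::Func binarized("binarized");
binarized(x, y) = Halide::select(input(x, y) > 10, 1, 0);
// Pad with zeros outside the image so the 16-wide strips may overrun the right edge.
Halide::Func binarizedWithBC = Halide::BoundaryConditions::constant_exterior(binarized, 0, 0, imageWidth, 0, imageHeight);

// Each stage sums 16 neighbouring columns of the previous stage, row by row.
// compute_root() materializes the stage so that the parallel loop over y exists.
Halide::Func count1, count2, count3;
count1(x, y) = Halide::sum(binarizedWithBC(strip.x + 16*x, y));
count1.compute_root().parallel(y);

count2(x, y) = Halide::sum(count1(strip.x + 16*x, y));
count2.compute_root().parallel(y);

count3(x, y) = Halide::sum(count2(strip.x + 16*x, y));
count3.compute_root().parallel(y);

// Serial reduction over the few remaining columns and all rows.
Halide::RDom finalR(0, imageWidth / 16 / 16 / 16 + 1, 0, imageHeight);

Halide::Expr finalCount = Halide::sum(count3(finalR.x, finalR.y));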

Probably crude, but fast enough.
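For checking the result of a pipeline like the one above, a plain C++ reference of the same two-stage idea can be handy. This is only a sketch and not Halide code: the function names `countRows` and `countAboveThreshold`, the row-major `std::vector` layout and the 8-bit pixel type are all assumptions made for illustration.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <numeric>
#include <vector>

// Stage 1: one partial count per row. Each iteration writes only rowCounts[y],
// so the rows share no state and the outer loop could be split across threads.
// (Hypothetical helper; the image is assumed to be stored row-major.)
std::vector<int> countRows(const std::vector<std::uint8_t>& image,
                           std::size_t width, std::size_t height,
                           std::uint8_t threshold) {
    assert(image.size() == width * height);
    std::vector<int> rowCounts(height, 0);
    for (std::size_t y = 0; y < height; ++y) {
        for (std::size_t x = 0; x < width; ++x) {
            if (image[y * width + x] > threshold) {
                ++rowCounts[y];
            }
        }
    }
    return rowCounts;
}

// Stage 2: serial reduction of the per-row partials into a single total.
int countAboveThreshold(const std::vector<std::uint8_t>& image,
                        std::size_t width, std::size_t height,
                        std::uint8_t threshold) {
    const std::vector<int> rowCounts = countRows(image, width, height, threshold);
    return std::accumulate(rowCounts.begin(), rowCounts.end(), 0);
}
```

No counter is shared between rows here: the only combined value is produced by the final serial sum, which is cheap because it touches one number per row.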

ashishUthama commented 6 years ago

The parallel histogram section in the Scheduling FAQ shows an approach similar to what @MDBrothers mentioned. You could adapt that too, since you essentially need a histogram with just one bucket.

MDBrothers commented 6 years ago

I agree with @ashishUthama. I think both the Halide scheduling example and the associative reduction tutorial (http://halide-lang.org/tutorials/tutorial_lesson_18_parallel_associative_reductions.html) are related to your problem. I'm always pleasantly surprised when almost anything reasonable works, scheduling-wise.
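The property those references rely on is that addition is associative, so a count can be regrouped into independent partial sums without changing the result. The sketch below shows this in plain C++ rather than through Halide's API; `countChunk`, `countFactored` and the `chunkSize` parameter are hypothetical names introduced only for this example.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <vector>

// Inner reduction: count the elements above `threshold` in [begin, end).
// Every chunk yields its own partial, so chunks could run concurrently.
int countChunk(const std::vector<int>& data, std::size_t begin,
               std::size_t end, int threshold) {
    int partial = 0;
    for (std::size_t i = begin; i < end; ++i) {
        partial += data[i] > threshold ? 1 : 0;
    }
    return partial;
}

// Outer reduction: merge the partials. Because + is associative, the total
// is the same for every chunkSize, including chunks that do not divide evenly.
int countFactored(const std::vector<int>& data, std::size_t chunkSize,
                  int threshold) {
    assert(chunkSize > 0);
    int total = 0;
    for (std::size_t begin = 0; begin < data.size(); begin += chunkSize) {
        const std::size_t end = std::min(begin + chunkSize, data.size());
        total += countChunk(data, begin, end, threshold);
    }
    return total;
}
```

A non-associative update would not survive this regrouping, which is why the factoring only applies to reductions such as sums, counts, minima and maxima.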