halide / Halide

a language for fast, portable data-parallel computation
https://halide-lang.org

Allow storage outside parallel loop with allow_race_conditions #3742

Open ongjunjie opened 5 years ago

ongjunjie commented 5 years ago

The use case is running on a GPU, where some intermediate stages need to be stored in global memory because the available shared memory is too small. Today, this behaviour can only be achieved by scheduling each of these stages to run in its own kernel rather than fusing it with its consumer.

Toy example:

#include <Halide.h>

int main() {
  Halide::Func f, g;
  Halide::Var x;

  f(x) = x;
  g(x) = f(x) + 1;

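  // Run g's loop over x in parallel.
  g.parallel(x);
  // Allocate f outside all of g's loops (store_root) but compute it inside
  // the parallel loop over x (compute_at), explicitly accepting races.
  f.store_root().allow_race_conditions().compute_at(g, x);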
  g.compile_jit();

  return 0;
}

This gives the error

Error: Func "f0" is stored outside the parallel loop over f1.v0 but computed within it. This is a potential race condition.
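
To illustrate the behaviour being requested, here is a minimal plain C++ sketch of the loop structure the schedule above describes. It does not use Halide and is not Halide's actual lowering; `run_pipeline`, `f_buf` and `g_buf` are hypothetical names, and spawning one `std::thread` per iteration is only an assumption standing in for `g.parallel(x)`.

```cpp
#include <cassert>
#include <thread>
#include <vector>

// Hypothetical stand-in for the requested schedule; not generated by Halide.
// f_buf plays the role of f's storage hoisted outside the parallel loop
// (store_root), while each iteration of the loop over x computes the single
// element of f that it consumes (compute_at(g, x)).
std::vector<int> run_pipeline(int n) {
    assert(n >= 0);
    std::vector<int> f_buf(n);  // shared storage, allocated once up front
    std::vector<int> g_buf(n);

    std::vector<std::thread> workers;
    workers.reserve(n);
    for (int x = 0; x < n; ++x) {
        // One thread per iteration, mimicking the parallel loop over x.
        workers.emplace_back([&f_buf, &g_buf, x] {
            f_buf[x] = x;             // f(x) = x
            g_buf[x] = f_buf[x] + 1;  // g(x) = f(x) + 1
        });
    }
    for (auto &w : workers) w.join();
    return g_buf;
}
```

In this toy case every iteration writes and reads only its own element of `f_buf`, so the shared allocation is harmless and `run_pipeline(8)` yields the values 1 through 8. If each iteration needed an overlapping region of f (a stencil, say), different iterations could write the same element of the shared buffer, which is the situation the error guards against.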

abadams commented 5 years ago

There are a few options for supporting this:

Thoughts?

ongjunjie commented 5 years ago