jrk / gradient-halide


How is this auto-scheduler different from the one in the Halide master branch? #6


Meatle commented 5 years ago

Hi,

I want to test some operators with auto-scheduler. So currently, we have an auto-scheduler in halide master branch and a simple auto-scheduler here. I know this simple auto-scheduler also supports GPU. But for CPU, how is it different from the one in the master branch?

For example, could you list some cases under which this auto-scheduler will be better or worse?

Thanks!

abadams commented 5 years ago

It's totally different. This autoscheduler is a collection of simple heuristics to get decent performance on autodiff pipelines. It inlines things that should be inlined, maps stages to GPU sensibly, and factors large reductions to expose parallelism. It turns out that doing this is enough to be way way faster than the big machine learning frameworks on imaging-style pipelines.
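The "factor large reductions to expose parallelism" idea can be sketched outside Halide as well. A minimal illustration in plain Python/NumPy (the function name and chunking strategy are illustrative, not taken from the gradient-halide code): a single serial sum is split into independent partial sums that can run in parallel, followed by a small final reduction over the partial results.

```python
import numpy as np
from concurrent.futures import ThreadPoolExecutor

def factored_sum(x, n_chunks=4):
    """Sum `x` by splitting one large reduction into partial reductions.

    Each chunk is reduced independently (and so can run in parallel),
    then the per-chunk partial results are combined in a small final
    reduction. This mirrors the idea of factoring a reduction to
    expose parallelism; it is a conceptual sketch, not Halide's API.
    """
    chunks = np.array_split(x, n_chunks)
    with ThreadPoolExecutor() as pool:
        partials = list(pool.map(np.sum, chunks))
    return sum(partials)

x = np.arange(1_000_000, dtype=np.float64)
assert factored_sum(x) == x.sum()
```

In Halide itself this transformation is expressed through the scheduling language (e.g. splitting the reduction domain and reducing the partial results), so the autoscheduler can apply it without changing the algorithm definition.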

The autoscheduler in Halide master searches over the space of fusion-in-tiles and is much more production-tested. However, it doesn't map to the GPU and it doesn't know how to factor large reductions, both of which are critical for the reverse-mode autodiff pipelines in the Halide gradients paper. So on CPU, unless performance is dominated by a single large reduction, I would expect the one in Halide master to be faster than this one nearly all of the time.

abadams commented 5 years ago

Also, here's a fork of Halide from the time the 2016 autoscheduling paper was published. It includes the GPU autoscheduler mentioned in that paper: https://github.com/ravi-teja-mullapudi/Halide

Meatle commented 5 years ago

Thanks a lot!