You're not missing anything - the current autoscheduler does not generate GPU schedules. We're working on it, but it won't be soon.
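In the meantime, a hand-written GPU schedule is the way to go. A minimal sketch for a pipeline like the one quoted below (untested; the 16x16 tile size is just a reasonable starting point):

```cpp
// Map out4 onto CUDA blocks (io, jo) and threads (ii, ji).
Var io("io"), jo("jo"), ii("ii"), ji("ji");
out4.gpu_tile(i, j, io, jo, ii, ji, 16, 16);

// JIT-compile and run against the CUDA target.
Buffer<int> result = out4.realize(size, size, target);
```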
On Mon, Apr 2, 2018 at 7:50 PM, miheer vaidya notifications@github.com wrote:
I was wondering: can I do auto-scheduling for GPUs? I have this simple Halide code:
```cpp
for (int y = 0; y < size; y++) {
    for (int x = 0; x < size; x++) {
        A(x, y) = rand() & 0xfff;
    }
}
Var j("j"), i("i"), k("k");
Func out4("out4");
out4(i, j) = A(i, j) * 2;

Target target = get_target_from_environment();
target.set_feature(Halide::Target::CUDACapability35);
target.set_feature(Halide::Target::CUDA);

Pipeline p(out4);
out4.estimate(i, 0, size).estimate(j, 0, size);
cout << p.auto_schedule(target);
```
I get this schedule:
```cpp
// Target: x86-64-linux-avx-avx2-cuda-cuda_capability_35-f16c-fma-sse41
// MachineParams: 16,16777216,40

Var i_vi("i_vi");
Var i_vo("i_vo");

Func out4 = pipeline.get_func(0);

{
    Var i = out4.args()[0];
    Var j = out4.args()[1];
    out4
        .compute_root()
        .split(i, i_vo, i_vi, 8)
        .vectorize(i_vi)
        .parallel(j);
}
```
Now, even though the Target has the cuda feature, the code is not running on the GPU. Am I missing something?
Is this now supported given the August ACM paper?
Apologies for the delinquent replies. The GPU autoscheduler from the differentiable Halide paper [Li et al. 2018] is a much simpler heuristic. It works well enough to be useful for the more constrained space of programs generated by reverse-mode automatic differentiation (which was its target), but it's not powerful enough to be more than a simple baseline on a wider class of programs. It was also not production-quality code at the time of publication.
Some good news since, though:
Our to-appear SIGGRAPH paper on an all-new, much more powerful learned autoscheduler [Adams et al. 2019] includes preliminary GPU support. That code is now in master here: https://github.com/halide/Halide/tree/master/apps/autoscheduler, and GPU improvements will hopefully be landing there rapidly over the coming months.
Using the plug-in autoscheduler interface developed for that, @BachiLi has ported and improved his simple GPU autoscheduler (the first one mentioned above). That should be landing in apps/ soon as well. Again, it won't give state-of-the-art performance for more complex cases, but it should give decent first-order results very easily (and be quite useful for automatically differentiated programs, which are now also supported in master). A rough usage sketch follows.
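For reference, driving an autoscheduler through the plug-in interface looks roughly like this. This is only a sketch: the plugin library name `autoschedule_li2018` is my assumption about where the ported scheduler will land, and the exact API has been shifting between releases.

```cpp
#include "Halide.h"
using namespace Halide;

// Load the shared library that registers the autoscheduler plugin.
// The name "autoschedule_li2018" is an assumption until the port lands.
load_plugin("autoschedule_li2018");

Pipeline p(out4);
out4.estimate(i, 0, size).estimate(j, 0, size);

// auto_schedule() dispatches to whichever autoscheduler is registered.
std::cout << p.auto_schedule(target);
```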
It seems the autoscheduler app is not built using CMake; only the Makefile can build it.
CMake support is spotty at best for various apps. It's a longstanding issue; most core developers use only Make, and as a result CMake support gets overlooked. IMHO we should really prioritize properly supporting all our build systems; though I personally have strong reservations about CMake in general, I do suspect we'd be better off in the long run by biting the bullet and standardizing on it as our only build system (as LLVM did a few years ago).
@jrk, about
> Our to-appear SIGGRAPH paper on an all-new, much more powerful learned autoscheduler [Adams et al. 2019] includes preliminary GPU support. That code is now in master here: https://github.com/halide/Halide/tree/master/apps/autoscheduler
- Is there an arXiv version of the paper (or a preprint available somewhere)?
- I am building the app as

  ```
  make OPTIMIZE='-O0 -g' HL_TARGET='host-cuda' test
  ```

  with some changes in the Makefile/test.cpp to set the target to `host-cuda`. With those changes I see the `target` variable passed in to `generate_schedule` as `"x86-64-linux-avx-cuda-cuda_capability_35-sse41"`, but I don't think it is generating a GPU schedule:

  ```cpp
  Func h = get_pipeline().get_func(1);
  Func f = get_pipeline().get_func(0);
  Var x(h.get_schedule().dims()[0].var);
  Var xi("xi");
  Var y(h.get_schedule().dims()[1].var);
  Var yi("yi");
  h
      .split(y, y, yi, 64, TailStrategy::ShiftInwards)
      .split(x, x, xi, 4, TailStrategy::ShiftInwards)
      .vectorize(xi)
      .compute_root()
      .reorder(xi, x, yi, y)
      .parallel(y);
  f
      .store_in(MemoryType::Stack)
      .split(x, x, xi, 4, TailStrategy::RoundUp)
      .unroll(x)
      .unroll(y)
      .vectorize(xi)
      .compute_at(h, x)
      .reorder(xi, x, y);
  ```
- Are the steps I am trying the right way to use the primitive GPU scheduler?
- Or maybe I need to pass the correct MachineParams? (A sketch of what I mean is below.)
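For what it's worth, here is how I would try passing MachineParams explicitly. This is a guess at the three-field form (parallelism, last-level cache size in bytes, load/store balance) matching the `MachineParams: 16,16777216,40` header printed in the schedule earlier in this thread:

```cpp
// 16 cores, a 16 MB last-level cache, and a load/store balance of 40,
// matching the defaults the autoscheduler printed above.
MachineParams params(16, 16 * 1024 * 1024, 40);
std::cout << p.auto_schedule(target, params);
```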
The GPU support hasn't landed in master yet. Development is happening in standalone_autoscheduler_gpu, but I wouldn't try to use it yet (e.g. there are no good network weights right now).
Also @vmiheer the paper is here: https://halide-lang.org/papers/autoscheduler2019.html
@steven-johnson
> CMake support is spotty at best for various apps. It's a longstanding issue; most core developers use only Make, and as a result CMake support gets overlooked. IMHO we should really prioritize properly supporting all our build systems; though I personally have strong reservations about CMake in general, I do suspect we'd be better off in the long run by biting the bullet and standardizing on it as our only build system (as LLVM did a few years ago).
In my humble opinion, CMake is much better and more convenient than Make.
Fighting words! Steven and I have had conversations in person about how large a nail you would have to drive through your hand for it to be as painful as using CMake. More seriously, it totally depends on what you're doing. Standard C++ binaries or libraries are cleaner in CMake than Make, but once you start doing unusual things (multi-phase compilation with generated intermediates, weird linker invocations, etc.), Make presents fewer barriers to getting work done.
And here is where Andrew and I disagree: while we both dislike* CMake, I suspect at this point that we'd be better off settling on one build system for everyone to use, even if that means holding our nose and dealing with CMake's eccentricities.
@abadams I see what you mean. May I paraphrase what you say as: CMake is easier for a casual user of a library, and Make is easier for a hardcore developer of a library?
We need to refactor the autoschedulers into separate modules and patch the CMake build to distribute them. This is a TODO in #4644.
The CMake build now distributes both autoschedulers. Tests forthcoming...
> both autoschedulers
This is great news, but technically there are three autoschedulers. (Presumably we'll look into splitting the 'built-in' one into a separate package -- like the others -- once everything else lands.)
Any docs/examples about using them?
@alsrgv - when #4644 lands, the documentation will be in README_cmake.md
Why was this issue closed? Was auto-scheduling for GPU implemented?
> Why was this issue closed? Was auto-scheduling for GPU implemented?
Li2018 can produce GPU schedules. Improving those schedules is a different issue.
Great, thanks for the info!