halide / Halide

a language for fast, portable data-parallel computation
https://halide-lang.org

Performance regression on camera_pipe on ARM #147

Closed: abadams closed this issue 11 years ago.

abadams commented 11 years ago

The Halide camera_pipe is now 2x slower than the hand-written assembly. It looks like deinterleaving is failing to trigger. This needs some love.
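For context, "deinterleaving" here presumably refers to the optimization where a vectorized loop that reads `in(2*x)` and `in(2*x + 1)` (as a Bayer demosaic does) is turned into one dense vector load followed by a shuffle, instead of two strided loads. The sketch below simulates that difference with plain arrays. It is a toy illustration, not Halide's implementation; `kLanes`, `Vec`, `strided_load`, and `dense_load_deinterleave` are hypothetical names, and the 8-lane width is an assumption chosen to match one 128-bit NEON register of `uint16_t`.

```cpp
#include <array>
#include <cassert>
#include <cstddef>
#include <cstdint>

// Hypothetical vector width: 8 lanes of uint16_t fill one 128-bit register.
constexpr std::size_t kLanes = 8;
using Vec = std::array<std::uint16_t, kLanes>;

// Without deinterleaving: each vector is a strided load, with lane i
// reading in[2*i + offset]. On SIMD hardware this typically becomes a
// gather or a sequence of scalar loads.
Vec strided_load(const std::uint16_t *in, std::size_t offset) {
    Vec v{};
    for (std::size_t i = 0; i < kLanes; ++i) v[i] = in[2 * i + offset];
    return v;
}

// With deinterleaving: one dense load of 2*kLanes contiguous elements,
// then a shuffle that splits them into even and odd lanes. NEON's VLD2
// structure load performs both steps in a single instruction.
void dense_load_deinterleave(const std::uint16_t *in, Vec &even, Vec &odd) {
    std::array<std::uint16_t, 2 * kLanes> dense{};
    for (std::size_t i = 0; i < dense.size(); ++i) dense[i] = in[i];  // dense load
    for (std::size_t i = 0; i < kLanes; ++i) {                        // shuffle
        even[i] = dense[2 * i];
        odd[i] = dense[2 * i + 1];
    }
}
```

Both paths produce the same even and odd vectors; the point is that the second uses one wide, contiguous memory access, which is why losing this transformation can cost a large constant factor on ARM.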

abadams commented 11 years ago

@jrk

Performance is now back where it should be:

Halide: 63349
C++: 333401
ASM: 79746

But that might be a bogus comparison, because the FCam ASM code produces incorrect output (I haven't changed this code).

jrk commented 11 years ago

Great. (And: blarg.) I'll see if I broke the ASM test at some point. My memory is fuzzy, but I thought I checked its output when I integrated it into camera_pipe/process.cpp last fall.

What changed in the Halide version? Is this just from halide/Halide@415769e9?

abadams commented 11 years ago

That, plus some more aggressive let substitution during simplification.

Simplification of lets has always been a balance between presenting peephole optimization opportunities to later stages and not combinatorially expanding code. If peephole optimizations could sniff through lets, or if all mutator and visitor stages handled DAGs without expanding them into trees, this tension wouldn't exist. But both of those things are hard, so, so far, I've just been pushing it one way or the other to balance compile time against running time.
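To make the blow-up concrete, here is a small sketch using a hypothetical toy IR (not Halide's actual IR classes). It builds a chain `let x1 = x0 + x0 in let x2 = x1 + x1 in ... in xn` and inlines every let. Because the substituted value is shared by pointer, the result is a DAG with only n + 1 distinct nodes, but any pass that walks it as a tree visits 2^(n+1) - 1 nodes. The names `Node`, `substitute`, `inline_lets`, `tree_size`, and `dag_size` are all illustrative assumptions, and the sketch assumes let-bound names are distinct, so it skips capture-avoiding renaming.

```cpp
#include <cassert>
#include <cstddef>
#include <memory>
#include <string>
#include <unordered_set>

// Toy expression IR (hypothetical, for illustration only).
struct Node;
using Expr = std::shared_ptr<const Node>;

struct Node {
    enum Kind { Var, Add, Let } kind;
    std::string name;  // variable name for Var and Let
    Expr a, b;         // Add: operands; Let: a = value, b = body
};

Expr var(const std::string &n) { return std::make_shared<Node>(Node{Node::Var, n, nullptr, nullptr}); }
Expr add(Expr x, Expr y) { return std::make_shared<Node>(Node{Node::Add, "", x, y}); }
Expr let(const std::string &n, Expr v, Expr body) { return std::make_shared<Node>(Node{Node::Let, n, v, body}); }

// Replace free occurrences of variable n in e with v. Every use points at
// the same v node, so sharing is preserved (assumes distinct let names).
Expr substitute(const Expr &e, const std::string &n, const Expr &v) {
    switch (e->kind) {
    case Node::Var:
        return e->name == n ? v : e;
    case Node::Add:
        return add(substitute(e->a, n, v), substitute(e->b, n, v));
    case Node::Let: {
        Expr value = substitute(e->a, n, v);
        Expr body = e->name == n ? e->b : substitute(e->b, n, v);  // shadowing
        return let(e->name, value, body);
    }
    }
    return e;
}

// Inline lets along the outermost spine, which is all the toy chain needs.
Expr inline_lets(const Expr &e) {
    if (e->kind == Node::Let) return inline_lets(substitute(e->b, e->name, e->a));
    return e;
}

// Builds: let x1 = x0 + x0 in let x2 = x1 + x1 in ... let xn = ... in xn
Expr let_chain(int n) {
    Expr body = var("x" + std::to_string(n));
    for (int k = n; k >= 1; --k) {
        Expr prev = var("x" + std::to_string(k - 1));
        body = let("x" + std::to_string(k), add(prev, prev), body);
    }
    return body;
}

// Nodes visited by a pass that recurses without remembering what it has seen.
std::size_t tree_size(const Expr &e) {
    if (!e) return 0;
    return 1 + tree_size(e->a) + tree_size(e->b);
}

// Distinct nodes, i.e. the work of a pass that memoizes by node address.
std::size_t dag_size(const Expr &e, std::unordered_set<const Node *> &seen) {
    if (!e || !seen.insert(e.get()).second) return 0;
    return 1 + dag_size(e->a, seen) + dag_size(e->b, seen);
}

std::size_t dag_size(const Expr &e) {
    std::unordered_set<const Node *> seen;
    return dag_size(e, seen);
}
```

For n = 10 the let chain itself has 41 tree nodes, while the fully inlined expression has 11 distinct nodes but 2047 tree nodes. A read-only visitor can avoid the blow-up by memoizing on node address as `dag_size` does; a mutator would additionally need to cache its rewritten result per node, which is part of why making every stage DAG-aware is hard.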

(Sent by email in reply to Jonathan Ragan-Kelley's comment of Fri, Aug 30, 2013, 11:52 AM.)