Closed by abadams 11 years ago
@jrk
Performance is now back where it should be:
Halide: 63349 C++: 333401 ASM: 79746
But that might be a bogus comparison, because the FCam ASM code produces incorrect output (I haven't changed this code).
Great. (And: blarg.) I'll see if I broke the ASM test at some point. My memory is fuzzy, but I thought I checked its output when I integrated it into camera_pipe/process.cpp last fall.
What changed in the Halide version? Is this just from halide/Halide@415769e9?
That, plus some more aggressive let substitution during simplification.
Simplification of lets has always been a balance between presenting peephole optimization opportunities to later stages and not combinatorially expanding code. If peephole optimizations could sniff through lets, or if all mutator and visitor stages handled DAGs without expanding them into trees, this tension wouldn't exist. But both of those things are hard, so thus far I've just been pushing it one way or the other to balance compile time and run time.
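To make the tree-versus-DAG tension concrete, here is a minimal sketch in plain C++. It does not use Halide's IR; `Node`, `build_chain`, `dag_size`, and `tree_size` are hypothetical names made up for illustration. It models a chain of lets in which each body uses the previous let twice, then compares the expression size with sharing preserved against the size after every let has been substituted into its uses.

```cpp
#include <cassert>
#include <cstdint>
#include <memory>
#include <unordered_set>

// Toy expression node (hypothetical, not Halide's IR):
// a leaf variable (both children null) or a binary add.
struct Node {
    std::shared_ptr<Node> a, b;
};
using Expr = std::shared_ptr<Node>;

// Builds the equivalent of
//   let x1 = x0 + x0 in let x2 = x1 + x1 in ... x_depth
// Sharing a child pointer stands in for a let binding.
Expr build_chain(int depth) {
    assert(depth >= 0);
    Expr e = std::make_shared<Node>();  // leaf x0
    for (int i = 0; i < depth; i++) {
        auto add = std::make_shared<Node>();
        add->a = e;
        add->b = e;
        e = add;
    }
    return e;
}

// Visits each distinct node once, recording it in `seen`.
static void collect(const Expr &e, std::unordered_set<const Node *> &seen) {
    if (!e || !seen.insert(e.get()).second) return;
    collect(e->a, seen);
    collect(e->b, seen);
}

// Size with shared subexpressions counted once (lets kept, DAG view).
uint64_t dag_size(const Expr &e) {
    std::unordered_set<const Node *> seen;
    collect(e, seen);
    return seen.size();
}

// Size with every let substituted into its uses (tree view).
uint64_t tree_size(const Expr &e) {
    if (!e) return 0;
    return 1 + tree_size(e->a) + tree_size(e->b);
}
```

For `build_chain(10)`, `dag_size` reports 11 nodes while `tree_size` reports 2047: the DAG grows linearly with depth, but the fully substituted tree roughly doubles with each level. Any pass that walks the expression as a tree pays the larger cost, which is the compile-time side of the trade-off described above.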
It's now 2x slower than assembly. Looks like deinterleaving is failing to trigger. This needs some love.