haskell-repa / repa

High performance, regular, shape polymorphic parallel arrays.
repa.ouroborus.net

Fix cursored evaluation #9

Closed axman6 closed 7 years ago

axman6 commented 7 years ago

Single character change takes the runtime of repa-canny on a 10226 × 2482 pixel image from

elapsedTimeMS   = 3514
cpuTimeMS       = 3292

to

elapsedTimeMS   = 2997
cpuTimeMS       = 2988

Basically, the optimisation of processing 4 cursor positions at a time would never fire, because the test x +# 4# >=# x is never false (except on overflow, and then there are bigger problems).
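To make the effect of that guard concrete, here is a minimal sketch of a 4-wide unrolled loop with a one-at-a-time fallback. It is not the repa source: it uses boxed Int rather than the unboxed primops, the names fillLineWith, buggyGuard, fixedGuard and Path are made up for illustration, and comparing against the end of the line (x1) is only an assumption about what the single-character fix looks like.

```haskell
-- Which path handled a given step: the 4-wide unrolled step or the
-- one-at-a-time fallback. (Hypothetical type, for illustration only.)
data Path = Unrolled | Single
  deriving (Eq, Show)

-- Walk the columns [x0, x1) and record which path handles each step.
-- The guard is a parameter so the two tests can be compared directly.
fillLineWith :: (Int -> Int -> Bool) -> Int -> Int -> [Path]
fillLineWith fallBack x0 x1 = go x0
  where
    go x
      | x >= x1       = []                     -- end of the line
      | fallBack x x1 = Single   : go (x + 1)  -- one position at a time
      | otherwise     = Unrolled : go (x + 4)  -- four positions at a time

-- The test as described in the report: x + 4 is compared against x
-- itself, so (ignoring overflow) it is always True and the unrolled
-- branch is unreachable.
buggyGuard :: Int -> Int -> Bool
buggyGuard x _x1 = x + 4 >= x

-- Assumed fix: compare against the end of the line instead, so the
-- fallback only runs when fewer than five positions remain.
fixedGuard :: Int -> Int -> Bool
fixedGuard x x1 = x + 4 >= x1
```

In GHCi, fillLineWith buggyGuard 0 10 yields ten Single steps, while fillLineWith fixedGuard 0 10 yields [Unrolled,Unrolled,Single,Single], which is the difference between the unrolled path never firing and firing for most of the line.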

Pretty nice win for 5 minutes work =)

tmcdonell commented 7 years ago

Well spotted, thanks! (:

axman6 commented 7 years ago

Thanks Trevor! So doing a little more examination, I noticed that the repa-blur example actually got a little slower with this. I'm not sure if this is because of some weird interaction with the cache, or exhausting the registers, or any number of other potential issues. I'm wondering if there's some sort of heuristic that can be used based on the size and/or shape of the stencil.

tmcdonell commented 7 years ago

That is a little strange. What were the image size and timings?

axman6 commented 7 years ago

So, after making sure I'm compiling with appropriate flags, the fixed version is faster than the previous one. Before:

elapsedTimeMS   = 4909
cpuTimeMS       = 25548

after:

elapsedTimeMS   = 3524
cpuTimeMS       = 17846

This was run with repa-blur 10 repa-examples/data/IMG_6999.bmp ./IMG_6999-blur.bmp +RTS -N, and compiled with -Odph -fllvm -optlo-O3.

tmcdonell commented 7 years ago

Thanks for the follow-up. Good to know it is indeed all working as expected (: