jtanx / omxcv

GPU assisted H.264 and JPEG encoder for OpenCV on the Raspberry Pi using OpenMAX
Apache License 2.0

Speed #5

Closed by jtanx 9 years ago

jtanx commented 9 years ago

http://stackoverflow.com/questions/11890997/using-arm-neon-intrinsics-to-add-alpha-and-permute

jtanx commented 9 years ago

See also: http://community.arm.com/groups/processors/blog/2010/03/17/coding-for-neon--part-1-load-and-stores

jtanx commented 9 years ago

Hmm, instead of

    vld3.8 {d0-d2}, [r0]!
    vswp d0, d2
    vst3.8 {d0-d2}, [r1]!

what about

    vld3.8 {d0-d2}, [r0]!
    vst3.8 {d2,d1,d0}, [r1]!

On the RPi 2, the BGR2RGB routine allows 800x600 frames to be processed in real time.

jtanx commented 9 years ago

http://pulsar.webshaker.net/ccc/result.php?lng=us

    loop:
        pld [r0, #192] @Preload 3 cache lines ahead
        vld3.8 {d0-d2}, [r0]! @Load 8 pixels
        vld3.8 {d3-d5}, [r0]! @Load another 8 pixels
        vst3.8 {d2,d1,d0}, [r1]! @Store 8 pixels
        vst3.8 {d5,d4,d3}, [r1]! @Store another 8 pixels
        subs r2, r2, #1 @Decrement counter
        bgt loop @Loop check
        bx lr

jtanx commented 9 years ago

Nope, can't do that. vst3.8 only takes its registers in ascending order, so a list like {d2,d1,d0} cannot be encoded. Must use vswp.
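
For reference, the load / swap / store sequence can also be written in C with NEON intrinsics. This is only a minimal sketch and not code from omxcv: the function name `bgr_to_rgb` and its signature are made up for illustration, and it assumes packed 24-bit pixels with non-overlapping source and destination buffers. A scalar loop handles the leftover pixels and doubles as the fallback on builds without NEON.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#if defined(__ARM_NEON) || defined(__ARM_NEON__)
#include <arm_neon.h>
#endif

/* Hypothetical helper (not part of omxcv): swap the first and third
 * channel of packed 24-bit pixels. src and dst must not overlap. */
void bgr_to_rgb(const uint8_t *src, uint8_t *dst, size_t npixels)
{
    assert(src != NULL && dst != NULL);
    size_t i = 0;

#if defined(__ARM_NEON) || defined(__ARM_NEON__)
    /* 8 pixels per iteration: de-interleaving load, swap planes 0 and 2,
     * interleaving store. Same idea as vld3.8 / vswp / vst3.8. */
    for (; i + 8 <= npixels; i += 8) {
        uint8x8x3_t px = vld3_u8(src + 3 * i);
        uint8x8_t first = px.val[0];
        px.val[0] = px.val[2];
        px.val[2] = first;
        vst3_u8(dst + 3 * i, px);
    }
#endif

    /* Scalar tail; this is the whole loop on non-NEON targets. */
    for (; i < npixels; i++) {
        dst[3 * i + 0] = src[3 * i + 2];
        dst[3 * i + 1] = src[3 * i + 1];
        dst[3 * i + 2] = src[3 * i + 0];
    }
}
```

Swapping `val[0]` and `val[2]` in the register struct plays the role of `vswp`; the compiler is free to pick whatever instructions it likes, so this does not guarantee the same cycle count as the hand-written loop.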