myrao / libyuv

Automatically exported from code.google.com/p/libyuv
BSD 3-Clause "New" or "Revised" License

various conversion routines should expose a way to query for the best performing destination alignment/stride #313

Closed: GoogleCodeExporter closed this issue 9 years ago

GoogleCodeExporter commented 9 years ago
E.g. ScalePlaneDown2, with SSE2 or NEON, has a fast path when the dst width % 
16 == 0. If callers could query for this, they could know to use a destination 
stride divisible by 16 for significantly better performance.

Original issue reported on code.google.com by noah...@google.com on 14 Feb 2014 at 7:52

GoogleCodeExporter commented 9 years ago
It's hard to imagine a good API for this.
Best practice is to align widths (strides) to 16.
This tends to happen naturally; the exception is low-resolution portrait mode.
E.g. 640 x 360 is fine, but in portrait the width is 360, which is a multiple of 8, not 16.
Much of the libyuv NEON code works on multiples of 8, but some needs 16.
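
As a rough illustration of the "align to 16" advice, a caller could round the destination stride up before allocating the buffer. This is a minimal sketch, not libyuv code: `aligned_stride` is a hypothetical helper, and it assumes a plane with one byte per pixel, so width and stride are both measured in bytes.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical helper (not part of libyuv): round `width` up to the next
   multiple of `alignment`. `alignment` must be a nonzero power of two. */
size_t aligned_stride(size_t width, size_t alignment) {
  assert(alignment != 0 && (alignment & (alignment - 1)) == 0);
  /* Add alignment-1, then clear the low bits to round up. */
  return (width + alignment - 1) & ~(alignment - 1);
}
```

With this helper, a portrait width of 360 gets a stride of 368, while 640 is already a multiple of 16 and stays 640. Only the stride is padded; the visible width passed to the conversion routine is unchanged.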

The thoughts I've had on this subject are:
- Log a message that indicates a performance penalty.
- Document it.
- If there is a valid reason for odd sizes, optimize for it.

Original comment by fbarch...@chromium.org on 24 Feb 2014 at 7:24

GoogleCodeExporter commented 9 years ago
A suggestion was made to provide a compile-time switch that would make 
slow-path code trigger an assert.
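
One way such a switch might look is sketched below. Everything here is an assumption for illustration: `ASSERT_ON_SLOW_PATH`, `SLOW_PATH_HIT`, and `row_uses_fast_path` are made-up names, not real libyuv flags or functions, and the `% 16` check simply mirrors the ScalePlaneDown2 example from the original report.

```c
#include <assert.h>

/* Hypothetical switch: build with -DASSERT_ON_SLOW_PATH to turn any use of
   a slow path into an assertion failure. Without it the macro is a no-op. */
#ifdef ASSERT_ON_SLOW_PATH
#define SLOW_PATH_HIT(reason) assert(0 && reason)
#else
#define SLOW_PATH_HIT(reason) ((void)0)
#endif

/* Returns 1 when the (assumed) SIMD fast path applies, 0 when the caller
   would fall back to the slow path. */
int row_uses_fast_path(int dst_width) {
  if (dst_width % 16 == 0) {
    return 1;
  }
  SLOW_PATH_HIT("dst_width is not a multiple of 16");
  return 0;
}
```

In a normal build the function just reports which path would run. With the switch defined, a width such as 360 aborts at the call site, which makes the performance penalty hard to miss during development.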

Original comment by fbarch...@chromium.org on 12 May 2014 at 11:47

GoogleCodeExporter commented 9 years ago
For conversions/effects, all functions have unaligned and 'any' versions.
TODO: 'any' versions of the scale functions. They already support unaligned.
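
For readers unfamiliar with the idea, an 'any' wrapper can be sketched roughly as follows: run the fast kernel on the largest multiple-of-16 prefix of the row and let a scalar loop finish the remainder. This is only an illustration under stated assumptions. The function names are hypothetical, the "SIMD" kernel is plain C standing in for real SSE2/NEON code, and the point-sampling rule (take every second source pixel) is chosen for simplicity rather than copied from libyuv.

```c
#include <assert.h>
#include <stdint.h>

/* Stand-in for a SIMD kernel: only valid when dst_width % 16 == 0. */
void down2_row_16(const uint8_t* src, uint8_t* dst, int dst_width) {
  assert(dst_width % 16 == 0);
  for (int x = 0; x < dst_width; x += 16) {
    for (int i = 0; i < 16; ++i) {
      dst[x + i] = src[2 * (x + i) + 1];
    }
  }
}

/* Scalar fallback: works for any non-negative width. */
void down2_row_c(const uint8_t* src, uint8_t* dst, int dst_width) {
  for (int x = 0; x < dst_width; ++x) {
    dst[x] = src[2 * x + 1];
  }
}

/* 'Any' wrapper: fast kernel on the aligned prefix, scalar on the tail. */
void down2_row_any(const uint8_t* src, uint8_t* dst, int dst_width) {
  assert(dst_width >= 0);
  int n = dst_width & ~15; /* largest multiple of 16 <= dst_width */
  if (n > 0) {
    down2_row_16(src, dst, n);
  }
  down2_row_c(src + 2 * n, dst + n, dst_width - n);
}
```

With a wrapper like this, a width of 360 still spends 352 of its pixels in the fast kernel, so a caller does not need to know the preferred alignment in advance.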

Original comment by fbarch...@google.com on 1 Nov 2014 at 1:09

GoogleCodeExporter commented 9 years ago
Unaligned and 'any' functions are working in more cases now.
Closing this bug anyway, as a query won't be needed.
Functions that don't use SIMD for odd widths should be optimized individually.

Original comment by fbarch...@google.com on 5 Nov 2014 at 11:34