Closed GoogleCodeExporter closed 9 years ago
Original comment by fbarch...@google.com
on 11 Sep 2012 at 1:48
The first small change that should help is aligned loads/stores.
This should be done for all NEON code, not just this function, so I'm opening a new bug for it.
The second change is to break the function into fetch, convert, and store
stages, and add fetches for other YUV formats and stores for other RGB formats.
This would avoid multistep conversions. 24-bit RGB should be easy: vst3.8 {d20-d22}, [r3]!
The complications are the calling code, the SSSE3 version, which is not trivial, and
rgb565/1555/5555, which are also not trivial.
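The fetch/convert/store split can be sketched in portable C. This is only an illustration of the structure, not the NEON code under discussion: all type and function names below are hypothetical, the coefficients are a commonly used integer BT.601 approximation rather than the library's constants, and the B,G,R(,A) byte order in memory is an assumption.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical types: two adjacent pixels that share one U/V pair. */
typedef struct { uint8_t y0, y1, u, v; } YuvPair;
typedef struct { uint8_t r, g, b; } Rgb;
/* Plane pointers; for NV12 the 'u' pointer holds the interleaved UV plane. */
typedef struct { const uint8_t *y, *u, *v; } YuvSrc;

typedef YuvPair (*FetchFn)(const YuvSrc *src, int x);
typedef uint8_t *(*StoreFn)(uint8_t *dst, Rgb px);

/* Fetch stage: one function per YUV layout. */
static YuvPair fetch_i420(const YuvSrc *s, int x) {
  YuvPair p = { s->y[x], s->y[x + 1], s->u[x / 2], s->v[x / 2] };
  return p;
}
static YuvPair fetch_nv12(const YuvSrc *s, int x) {
  /* x is even, so the UV pair for pixels x and x+1 starts at offset x. */
  YuvPair p = { s->y[x], s->y[x + 1], s->u[x], s->u[x + 1] };
  return p;
}

/* Convert stage: shared by every fetch/store combination. */
static uint8_t clamp_shift8(int v) {
  if (v < 0) return 0;            /* clamp before shifting a negative value */
  v >>= 8;
  return (uint8_t)(v > 255 ? 255 : v);
}
static Rgb convert_pixel(uint8_t y, uint8_t u, uint8_t v) {
  /* Assumed integer BT.601 approximation, not the library's constants. */
  int c = 298 * (y - 16), d = u - 128, e = v - 128;
  Rgb px;
  px.r = clamp_shift8(c + 409 * e + 128);
  px.g = clamp_shift8(c - 100 * d - 208 * e + 128);
  px.b = clamp_shift8(c + 516 * d + 128);
  return px;
}

/* Store stage: one function per RGB layout; returns the advanced pointer,
   much like the post-increment in "vst3.8 {d20-d22}, [r3]!". */
static uint8_t *store_rgb24(uint8_t *dst, Rgb px) {
  dst[0] = px.b; dst[1] = px.g; dst[2] = px.r;   /* assumed B,G,R order */
  return dst + 3;
}
static uint8_t *store_argb(uint8_t *dst, Rgb px) {
  dst[0] = px.b; dst[1] = px.g; dst[2] = px.r; dst[3] = 255;
  return dst + 4;
}

/* Any fetch can be paired with any store, so no intermediate buffer
   or second conversion pass is needed. */
static void convert_row(const YuvSrc *src, FetchFn fetch,
                        uint8_t *dst, StoreFn store, int width) {
  assert(width % 2 == 0);
  for (int x = 0; x < width; x += 2) {
    YuvPair p = fetch(src, x);
    dst = store(dst, convert_pixel(p.y0, p.u, p.v));
    dst = store(dst, convert_pixel(p.y1, p.u, p.v));
  }
}
```

In this sketch, a call such as convert_row(&src, fetch_nv12, dst, store_rgb24, width) goes from NV12 to 24-bit RGB in one pass. Note that green sits in the middle byte of both store functions, which is the same observation that motivates keeping G in the middle register.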
vtrn.8 d16, d17 should be changed to put its value in d21, since it's Green and
is always between R and B for all RGB formats. But it's not clear how to do
that without adding a vmov.
Overall register usage is poor, and there are too many 'd' instructions; they should be 'q'.
Perhaps do 16 pixels at a time.
Original comment by fbarch...@google.com
on 4 Oct 2012 at 1:02
Indications are that the current code has stalls.
Suggest doing the replication differently and avoiding multiply stalls.
Original comment by fbarch...@google.com
on 12 Jan 2013 at 9:25
Original issue reported on code.google.com by
fbarch...@google.com
on 9 Aug 2012 at 5:51