davidplowman closed this 9 months ago
For the 24-bit case can you think of any reason why it wouldn't be just as good to rearrange the "rgb" conversion matrix and call the original code? (Obviously we'd still need separate ARGB conversion code given the different pixel size)
No, I think that would be fine too!
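To illustrate the idea of rearranging the conversion matrix rather than duplicating the routine, here is a minimal sketch. The names (`conv_matrix`, `swap_red_blue`, `convert_pixel`) and the 8.8 fixed-point BT.601 weights are assumptions made for the example; they are not taken from the branch under discussion.

```c
#include <stdint.h>

/* Hypothetical 8.8 fixed-point BT.601 (limited range) weights.
 * Rows are Y, U, V; column k is the weight for byte k of a packed pixel. */
typedef struct { int32_t m[3][3]; } conv_matrix;

/* Weights for pixels stored as R, G, B in memory. */
static const conv_matrix rgb_matrix = {{
    {  66, 129,  25 },
    { -38, -74, 112 },
    { 112, -94, -18 },
}};

/* Swap the first and last columns so the same routine accepts B, G, R. */
static conv_matrix swap_red_blue(conv_matrix in)
{
    for (int row = 0; row < 3; row++) {
        int32_t tmp = in.m[row][0];
        in.m[row][0] = in.m[row][2];
        in.m[row][2] = tmp;
    }
    return in;
}

/* Single conversion routine: it never needs to know the channel order. */
static void convert_pixel(const conv_matrix *c, const uint8_t px[3], uint8_t yuv[3])
{
    static const int32_t offset[3] = { 16, 128, 128 };
    for (int row = 0; row < 3; row++) {
        int32_t sum = c->m[row][0] * px[0]
                    + c->m[row][1] * px[1]
                    + c->m[row][2] * px[2];
        /* Add rounding and offset before shifting so the value stays non-negative. */
        yuv[row] = (uint8_t)((sum + 128 + (offset[row] << 8)) >> 8);
    }
}
```

With this arrangement, a red pixel stored as R, G, B and converted with `rgb_matrix` gives the same Y, U, V as the same pixel stored as B, G, R and converted with `swap_red_blue(rgb_matrix)`. The 32-bit formats would still need their own loop because of the different pixel size.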
I've imported your code, written aarch64 asm (not armv7), fixed the xrgb conversions (bad rgb byte offsets) and pushed it to branch dev/5.1.2/rgbyuv_1. If you'd like to give that a go and check it does what you need I'd be grateful. The aarch64 code wants YUV luma strides that are a multiple of 16 (chroma 8) - it will fall back to C if that isn't met - I can improve that if wanted. I can probably improve the aarch64 code a bit and will do so if it might be helpful.
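The stride condition described in this comment could be expressed roughly as below. This is only a sketch of that kind of check: `conv_path` and `choose_path` are hypothetical names, and the real dispatch code in the branch may look quite different.

```c
#include <stddef.h>

/* Hypothetical selector for the conversion path. */
typedef enum { PATH_C, PATH_ASM } conv_path;

/* Use the asm path only when the luma stride is a multiple of 16 and the
 * chroma stride is a multiple of 8; otherwise fall back to plain C. */
static conv_path choose_path(ptrdiff_t luma_stride, ptrdiff_t chroma_stride)
{
    if (luma_stride % 16 != 0 || chroma_stride % 8 != 0)
        return PATH_C;
    return PATH_ASM;
}
```

For example, a 1920-byte luma stride with a 960-byte chroma stride would take the asm path, while a 1928-byte luma stride would fall back to C.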
@davidplowman I've updated my dev/5.1.2/rgbyuv_1 branch. The asm now takes any width/height. I've only written rgb24/bgr24->yuv420p as asm. If you want any others (say rgbx->yuv or rgb->nv12) they are a simple edit from what I've now got. The asm has been pipelined a bit so typical timing is now ~2.6ms for an HD frame (vs ~6.0ms for C). Give it a go!
Thank you very much! Will definitely try it next week...
@davidplowman Have you had a chance to give this a test or should I just merge it into my main branches and hope it doesn't break anything for you? (I think it tests OK but I've been known to be wrong)
Hi, yes, I'm sorry I haven't got round to this. It's partly because the last time I built my own ffmpeg it borked my OpenCV installation irrecoverably. I think just merge it; it will probably be fine, and I shall still find some time to summon up the courage to give it a try!
Only the BGR24 to YUV420 conversion exists as an optimised path in the code. This change copies that existing version and swaps red and blue, so that the RGB24 case gets the same performance.
The code has been further copied to implement the RGBA and BGRA cases in exactly the same way.
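The four cases differ only in the pixel size and in where the red and blue bytes sit. The sketch below shows that relationship with a single luma routine parameterised by a layout descriptor. This is an alternative formulation for illustration, not the copy-based approach the PR takes; `pix_layout`, the `LAYOUT_*` constants, `luma_row`, and the 8.8 fixed-point BT.601 weights are all assumptions.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical layout descriptor: bytes per pixel and channel byte offsets. */
typedef struct { int bpp; int r, g, b; } pix_layout;

static const pix_layout LAYOUT_RGB24 = { 3, 0, 1, 2 };
static const pix_layout LAYOUT_BGR24 = { 3, 2, 1, 0 };
static const pix_layout LAYOUT_RGBA  = { 4, 0, 1, 2 };  /* alpha byte ignored */
static const pix_layout LAYOUT_BGRA  = { 4, 2, 1, 0 };  /* alpha byte ignored */

/* Compute one row of limited-range luma from packed pixels. */
static void luma_row(const uint8_t *src, uint8_t *dst, size_t width, pix_layout l)
{
    for (size_t x = 0; x < width; x++) {
        const uint8_t *p = src + x * (size_t)l.bpp;
        /* Y = ((66 R + 129 G + 25 B + 128) >> 8) + 16, with the offset
         * folded in before the shift. */
        dst[x] = (uint8_t)((66 * p[l.r] + 129 * p[l.g] + 25 * p[l.b]
                            + 128 + (16 << 8)) >> 8);
    }
}
```

A row holding a red pixel followed by a blue pixel produces the same luma values whether it is passed as RGB24 with `LAYOUT_RGB24` or as BGRA with `LAYOUT_BGRA`. Wrong byte offsets in a table like this would cause the kind of channel mix-up mentioned earlier for the xrgb conversions.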