Open rude78 opened 10 years ago
I think you are right: adding support for HW JPEG decoding would be a nice addition...
Rude78, which distro did you run the benchmark on? I wonder whether the codec you tried is built to use the NEON instruction set or not.
I tried using "geexbox-devel-20140622-re5b29c6.cuboxi" and "OpenELEC UNOFFICIAL builds - 4.0.5 - XBMC Gotham 13.1". There is not much difference between these two distros in terms of JPEG decoding performance. Does anyone know if the VPU (M)JPEG decoding is limited in image resolution, i.e. limited to 1080p? That would almost eliminate the practical use of HW accelerated JPEG decoding.
At least Geexbox uses libjpeg-turbo, which makes some use of neon. Also it uses a small improvement that avoids decoding to a temporary RGB3 buffer when the requested output format is BGRA.
BTW, I have implemented hardware JPEG decoding on the old CuBox. However, the results were not so great. The first problem was that the decoder used there could only output Y422, so the color space conversions to BGRA and RGB3 still had to be done in software. The second problem was that XBMC makes use of a libjpeg feature that allows downscaling a picture by multiples of 1/8 during the decoding process. The hardware decoder in the Marvell system could only do this by 1/2 and 1/4 (1/8 is possible in one direction only, which I found too impractical to make use of). As a result, a more extensive software scaling operation takes place. Due to this (and maybe due to the hardware itself) there were more visible stair-step effects than when doing everything in software. The third problem was that the hardware needs serialization so that only one thread uses it at a time. In the end, I got about a 10 to 20% speedup at the expense of inferior visual appearance and some quite hacky code.
Of course the iMX VPU may behave better, but this needs to be figured out...
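To make the scaling limitation described above concrete, here is a minimal Python sketch. It is not XBMC's actual code. It assumes the decoder scales each dimension to ceil(dim * M / 8) and that the caller wants the smallest decoded size that still covers a minimum target size. The name `pick_scale` and the 1920x1080 target are hypothetical, chosen only for illustration.

```python
def scaled(dim, m):
    """Size of one dimension after scaling by m/8, rounded up."""
    return (dim * m + 7) // 8


def pick_scale(width, height, min_w, min_h, allowed=range(1, 9)):
    """Return (m, w, h) for the smallest allowed m/8 scale that still
    yields at least min_w x min_h pixels. Falls back to full size."""
    for m in sorted(allowed):
        w, h = scaled(width, m), scaled(height, m)
        if w >= min_w and h >= min_h:
            return m, w, h
    return 8, width, height


# Software decoder: any multiple of 1/8 is available.
sw = pick_scale(5184, 3456, 1920, 1080)
# Hardware limited to 1/1, 1/2 and 1/4 (i.e. 8/8, 4/8, 2/8), as described for the Marvell decoder.
hw = pick_scale(5184, 3456, 1920, 1080, allowed=(2, 4, 8))
print("software:", sw)
print("hardware:", hw)
```

For a 5184x3456 source, the software path can stop at 3/8 (1944x1296), while a decoder limited to 1/2 and 1/4 has to hand over 2592x1728, roughly 1.8 times as many pixels for the software scaler to reduce afterwards.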
@rude78 : can you provide a sample picture? I'd like to check how the old CuBox performs in that case.
I guess it is the same as for MJPEG: 8192x8192. Note that rescaling can also be handled by the hardware...
Edit: Sorry, I posted before reading your post, Rudi. The constraints for scaling are similar, except that 1/8 is possible in both directions...
I forgot to mention another thing to check on the iMX: Does the hardware support all kinds of JPEGs? The Marvell refused to decode progressive scan files and those using arithmetic coding...
Here are the features from the reference manual:
Additionally I guess flexible resizing and colorspace conversion can be handled by IPU...
Only "baseline" is stated, so I guess progressive scan is not supported either...
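Whether a given file is baseline, progressive, or arithmetic-coded can be read from its start-of-frame (SOF) marker before handing it to any decoder. The following Python sketch walks the JPEG marker segments and reports the frame type. The function name `inspect_jpeg` and the `hw_candidate` flag are hypothetical; the flag only encodes the assumption that a hardware decoder accepts baseline files with 8-bit samples and 1 or 3 components, which would still need to be verified on the real VPU.

```python
import struct

# Marker codes in the 0xC0..0xCF range that are NOT start-of-frame markers:
# DHT (0xC4), JPG (0xC8), DAC (0xCC).
NOT_SOF = (0xC4, 0xC8, 0xCC)


def inspect_jpeg(data):
    """Scan JPEG marker segments and describe the first SOF marker found."""
    if data[:2] != b"\xff\xd8":
        raise ValueError("missing SOI marker, not a JPEG stream")
    pos = 2
    while pos + 4 <= len(data):
        if data[pos] != 0xFF:
            raise ValueError("expected a marker at offset %d" % pos)
        marker = data[pos + 1]
        if marker == 0xFF:                          # fill byte before a marker
            pos += 1
            continue
        if marker == 0x01 or 0xD0 <= marker <= 0xD7:  # markers without a length field
            pos += 2
            continue
        (length,) = struct.unpack(">H", data[pos + 2:pos + 4])
        if 0xC0 <= marker <= 0xCF and marker not in NOT_SOF:
            if pos + 10 > len(data):
                raise ValueError("truncated SOF segment")
            precision, height, width, ncomp = struct.unpack(
                ">BHHB", data[pos + 4:pos + 10])
            baseline = marker == 0xC0
            return {
                "baseline": baseline,
                "progressive": (marker & 0x03) == 2,
                "arithmetic": marker >= 0xC9,
                "precision": precision,
                "width": width,
                "height": height,
                "components": ncomp,
                # Assumption: HW path only for baseline, 8-bit, 1 or 3 components.
                "hw_candidate": baseline and precision == 8 and ncomp in (1, 3),
            }
        pos += 2 + length                           # skip marker plus segment
    raise ValueError("no SOF marker found")
```

A caller could use the result to fall back to the software decoder whenever `hw_candidate` is false, for example `inspect_jpeg(open(path, "rb").read())["hw_candidate"]`, where `path` is whatever file is about to be loaded.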
Hi *,
Thanks for your detailed replies. I agree that a 10-20% speedup is not worth having hacky or complicated code. Do you have the decoder spec for the Marvell chip?
@warped-rudi: I think you can just use any DSLR image at hand. It should not make a big difference. I also tried a 20 MB, 160 MPixel image (http://upload.wikimedia.org/wikipedia/commons/0/01/Hellbrunn_banqueting_hall_360_panoramic_view.jpg), but of course decoding time does not scale linearly, as the final image is downsampled to screen resolution on the fly anyway, right? It took roughly double the time compared to the 15 MPixel image.
Finally, I think we have to carefully analyze where the JPEG opening time is spent and compare the expected speedup with the drawbacks. Do you know how to add more precise timing information to the xbmc log?
Thanks, Rudi
Hi *,
I did some more tests using the libjpeg-turbo benchmark tool (all performance values in Mpixels/sec) on a cubox-i4 pro with geexbox and libjpeg-turbo-1.3.1:
./tjbench ../IMG_0046_full.ppm 95 -yuvdecode -quiet
Bitmap Format | Bitmap Order | JPEG Subsamp | JPEG Qual | Image Width | Image Height | Comp Perf | Comp Ratio | Decomp Perf |
---|---|---|---|---|---|---|---|---|
BGR | TD | GRAY | 95 | 5184 | 3456 | 23.26 | 15.64 | 49.07 |
BGR | TD | 4:2:0 | 95 | 5184 | 3456 | 20.56 | 13.53 | 42.16 |
BGR | TD | 4:2:2 | 95 | 5184 | 3456 | 16.64 | 12.03 | 33.91 |
BGR | TD | 4:4:4 | 95 | 5184 | 3456 | 13.52 | 10.47 | 24.34 |
./tjbench ../Hellbrunn_banqueting_hall_360_panoramic_view_full.ppm 95 -yuvdecode -quiet
Bitmap Format | Bitmap Order | JPEG Subsamp | JPEG Qual | Image Width | Image Height | Comp Perf | Comp Ratio | Decomp Perf |
---|---|---|---|---|---|---|---|---|
BGR | TD | GRAY | 95 | 19992 | 7939 | 24.83 | 15.00 | 50.53 |
BGR | TD | 4:2:0 | 95 | 19992 | 7939 | 20.64 | 13.15 | 41.35 |
BGR | TD | 4:2:2 | 95 | 19992 | 7939 | 17.33 | 12.06 | 33.76 |
BGR | TD | 4:4:4 | 95 | 19992 | 7939 | 14.12 | 10.48 | 23.32 |
Without I/O, decoding an 18 MPixel JPEG takes about 0.7 seconds using libjpeg-turbo. In theory, this could be cut down to 1/4. Can anyone tell me how to measure the time between the keypress for loading the image and the final display of the decoded image on screen inside xbmc? Then we should know the ratio between I/O, decoding and scaling / conversion.
BR Rudi
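The 0.7 second figure can be checked against the benchmark tables with a few lines of Python. This is only a sketch of the arithmetic: it takes the "Decomp Perf" values for the 5184x3456 image as measured above and assumes decode time is simply pixel count divided by throughput. The helper name `decode_seconds` is hypothetical.

```python
def decode_seconds(width, height, mpixels_per_sec):
    """Estimated decode time from pixel count and throughput in Mpixels/sec."""
    return (width * height) / (mpixels_per_sec * 1e6)


# "Decomp Perf" column for the 5184x3456 test image.
decomp_perf = {"GRAY": 49.07, "4:2:0": 42.16, "4:2:2": 33.91, "4:4:4": 24.34}

for subsamp, perf in decomp_perf.items():
    t = decode_seconds(5184, 3456, perf)
    # t / 4 is the theoretical best case if decoding were cut down to 1/4.
    print(f"{subsamp:>5}: {t:.2f} s in software, {t / 4:.2f} s at a 4x speedup")
```

Under this estimate, the 0.7 second value corresponds to the 4:4:4 row, while a 4:2:0 file of the same size would decode in roughly 0.4 seconds.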
Find a snippet I used for measuring decoding time here:
Thx for the hint.
I will try that and report the measurements.
I will try to find the place for I/O myself. But I am not sure whether I will find the place to measure the time between decoding and display.
FYI, found this demo code https://community.freescale.com/thread/318147
Hi Hedda,
Thanks a lot for the link. Ironically, I found this resource 3 days ago (and I should have shared it, sorry for forgetting). So far, I have been able to build the test sample and give it a quick test. It works fine, and it let me check empirically that the expected x4 speedup in decoding time is the right order of magnitude (on an imx6q). I have not yet attempted to integrate it properly in xbmc/kodi. I hope to be able to dedicate a little time to this task, but of course any contribution is also very welcome...
Stéphan
i.MX 6 VPU also supports MJPEG (Motion JPEG) http://en.wikipedia.org/wiki/Motion_JPEG
See this cool media backgrounds concept http://forum.xbmc.org/showthread.php?tid=141442
Reference: http://hands.com/~lkcl/eoma/iMX6/VPU_API_RM_L3.0.35_1.1.0.pdf
• MJPEG Baseline Process Encoder and Decoder
• Baseline ISO/IEC 10918-1 JPEG compliance
• Support 1 or 3 color components
• 3 component in a scan (interleaved only)
• 8 bit samples for each component
• Support 4:2:0, 4:2:2, 2:2:4, 4:4:4 and 4:0:0 color format (max. six 8x8 blocks in one MCU)
• Minimum encoding size is 16x16 pixels
It would be great if @wolfgar integrated this into XBMC :) Maybe some skins and views (like cover flow) would run more smoothly :)
As far as I can remember the VPU has HW JPEG decoder support. (MJPG decoder on 4:4:4 supports 120M pixel per second @ 266MHz)
Current measurements for a 15 MPixel JPEG (7.6 MB) on a cubox-i-quad without HW decoder support:
100 MBit NFS: 3.2 s
Ramdisk @cubox-i: 2.2 s (mount -t tmpfs -o size=200M none /mnt/tst)
I assume that I/O is not the bottleneck. Therefore HW JPEG decoding could give significant acceleration for browsing large photo galleries.
What do you think?
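As a rough cross-check of the I/O assumption, the two measurements can be split into an I/O part and a remainder. The Python sketch below assumes 1 MB = 10^6 bytes, ignores NFS protocol overhead, and treats the ramdisk time as "everything except network I/O". The helper name `transfer_seconds` is hypothetical.

```python
def transfer_seconds(size_mb, link_mbit_per_sec):
    """Lower bound for moving size_mb megabytes over a link of the given speed."""
    return size_mb * 8 / link_mbit_per_sec


nfs_total = 3.2       # seconds, 100 MBit NFS measurement
ramdisk_total = 2.2   # seconds, ramdisk measurement

io_overhead = nfs_total - ramdisk_total    # time attributable to the network
wire_minimum = transfer_seconds(7.6, 100)  # best case for 7.6 MB at 100 MBit/s
io_share = io_overhead / nfs_total         # fraction of the NFS case spent on I/O

print(f"network overhead: {io_overhead:.1f} s (wire minimum {wire_minimum:.2f} s)")
print(f"I/O share of the NFS case: {io_share:.0%}")
```

Under these assumptions, about a third of the NFS case is network transfer, and the remaining 2.2 seconds are spent on decoding, scaling and display, which is the part a hardware decoder could shorten.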