xbmc-imx6 / xbmc

XBMC Main Repository
http://xbmc.org

JPEG HW decoder support #77

Open rude78 opened 10 years ago

rude78 commented 10 years ago

As far as I can remember the VPU has HW JPEG decoder support. (MJPG decoder on 4:4:4 supports 120M pixel per second @ 266MHz)

Current measurements for a 15 MPixel JPEG (7.6 MB) on a cubox-i-quad, without HW decoder support:

• 100 MBit NFS: 3.2 s
• Ramdisk @cubox-i: 2.2 s (mount -t tmpfs -o size=200M none /mnt/tst)

I assume that I/O is not the bottleneck. Therefore HW JPEG decoding could give a significant speedup when browsing large photo galleries.
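A rough back-of-the-envelope check of the I/O share in the numbers above, as a sketch only. It assumes 1 MB = 10^6 bytes and an ideal 100 Mbit/s link with no protocol overhead; the variable names are illustrative and not taken from any tool.

```python
# Figures quoted in the post above.
file_mb = 7.6        # JPEG size in MB (assumption: 10^6 bytes per MB)
link_mbit = 100.0    # nominal NFS link speed in Mbit/s
t_nfs = 3.2          # seconds, load over NFS
t_ramdisk = 2.2      # seconds, load from tmpfs

# Best-case transfer time over the wire (ignores protocol overhead).
wire_time = file_mb * 8 / link_mbit

# Time attributable to I/O, and its share of the NFS load time.
io_time = t_nfs - t_ramdisk
io_share = io_time / t_nfs

print(f"ideal wire time:   {wire_time:.2f} s")
print(f"measured I/O cost: {io_time:.1f} s ({io_share:.0%} of the NFS load time)")
```

On these figures, roughly two thirds of the load time remains even with the file in RAM, which is consistent with the assumption that I/O is not the main bottleneck.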

What do you think?

wolfgar commented 10 years ago

I think you are right : Adding support for hw JPEG decoding would be a nice addition...

rabeeh commented 10 years ago

Rude78 - which distro did you run the benchmark on? I wonder whether the codec you tried is built to use the NEON instruction set or not.

rude78 commented 10 years ago

I tried using "geexbox-devel-20140622-re5b29c6.cuboxi" and "OpenELEC UNOFFICIAL builds - 4.0.5 - XBMC Gotham 13.1". There is not much difference between these two distros in terms of JPEG decoding performance. Does anyone know if the VPU (M)JPEG decoding is limited in image resolution, i.e. limited to 1080p? That would almost eliminate the practical use of HW accelerated JPEG decoding.

warped-rudi commented 10 years ago

At least Geexbox uses libjpeg-turbo, which makes some use of neon. Also it uses a small improvement that avoids decoding to a temporary RGB3 buffer when the requested output format is BGRA.

BTW, I have implemented hardware JPEG decoding on the old CuBox. However, the results were not so great.

The first problem was that the decoder used there could only output Y422. Therefore the color space conversions to BGRA and RGB3 still had to be done in software.

The second problem was that XBMC makes use of a libjpeg feature that allows downscaling a picture in steps of 1/8 during the decoding process. The hardware decoder in the Marvell system could only do this by 1/2 and 1/4 (1/8 is possible in one direction, which I found too impractical to make use of). As a result, a more extensive software scaling operation takes place. Due to this (and maybe due to the hardware itself) there were more visible staircase artifacts than when doing everything in software.

The third problem was that the hardware needs serialization so that only one thread uses it at a time.

In the end, I got about a 10..20% speedup at the expense of inferior visual appearance and some quite hacky code.

Of course the iMX VPU may behave better, but this needs to be figured out...
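To illustrate the scaling limitation described above, here is a minimal sketch that picks a decode-time scale factor and reports how much shrinking is left for the software scaler. The step sets are taken from the description in this comment (1/8 steps for libjpeg, 1/2 and 1/4 for the Marvell decoder); the function names and the 1920x1080 target are assumptions made for illustration, not XBMC code.

```python
from fractions import Fraction

# Scale factors available during decoding, as described in the comment above.
LIBJPEG_STEPS = [Fraction(n, 8) for n in range(1, 9)]        # 1/8 ... 8/8
MARVELL_STEPS = [Fraction(1, 4), Fraction(1, 2), Fraction(1)]


def needed_scale(src_w, src_h, dst_w, dst_h):
    """Scale that fits the source into the target box, keeping aspect ratio."""
    return min(Fraction(dst_w, src_w), Fraction(dst_h, src_h))


def pick_step(needed, steps):
    """Smallest available step that does not go below the needed scale."""
    candidates = [s for s in steps if s >= needed]
    return min(candidates) if candidates else max(steps)


def residual_scale(src_w, src_h, dst_w, dst_h, steps):
    """Return (decoder step, remaining software scale factor)."""
    needed = needed_scale(src_w, src_h, dst_w, dst_h)
    step = pick_step(needed, steps)
    return step, needed / step


# Example: a 5184x3456 photo shown on a 1920x1080 screen (hypothetical target).
for name, steps in (("libjpeg", LIBJPEG_STEPS), ("marvell", MARVELL_STEPS)):
    step, rest = residual_scale(5184, 3456, 1920, 1080, steps)
    print(f"{name}: decode at {step}, software scaler still applies x{float(rest):.3f}")
```

With the coarser hardware steps the decoder has to stop at 1/2 instead of 3/8, so the software scaler does more of the work, which matches the quality concern raised above.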

@rude78 : can you provide a sample picture? I'd like to check how the old CuBox performs in that case.

wolfgar commented 10 years ago

I guess it is the same as MJPEG: 8192x8192. Note that rescaling can also be handled by hw...

edit: Sorry, I posted before reading your post, Rudi: the constraints for scaling are similar, except that 1/8 is possible in both directions...

warped-rudi commented 10 years ago

I forgot to mention another thing to check on the iMX: Does the hardware support all kinds of JPEGs? The Marvell refused to decode progressive scan files and those using arithmetic coding...

wolfgar commented 10 years ago

Here are the features from the reference manual:

Additionally I guess flexible resizing and colorspace conversion can be handled by IPU...

Only "baseline" is stated, so I guess progressive scan is not supported either...

rude78 commented 10 years ago

Hi *,

thanks for your detailed replies. I agree that a 10-20% speedup is not worth hacky or complicated code. Do you have the decoder spec for the Marvell chip?

@warped-rudi: I think you can just use any DSLR image at hand. It should not make a big difference. I also tried a 20 MB, 160 MPixel image (http://upload.wikimedia.org/wikipedia/commons/0/01/Hellbrunn_banqueting_hall_360_panoramic_view.jpg), but of course the decoding time does not scale linearly, as the final image is downsampled to screen resolution on the fly anyway, right? It took roughly double the time compared to the 15 MPixel image.

Finally, I think we have to carefully analyze where the JPEG opening time is spent and compare the expected speedup with the drawbacks. Do you know how to add more precise timing information to the xbmc log?

Thanks, Rudi

rude78 commented 10 years ago

Hi *,

I did some more tests using the libjpeg-turbo benchmark tool (all performance values in Mpixels/sec): (cubox-i4 pro, geexbox, libjpeg-turbo-1.3.1)

./tjbench ../IMG_0046_full.ppm 95 -yuvdecode -quiet

| Bitmap Format | Bitmap Order | JPEG Subsamp | JPEG Qual | Image Width | Image Height | Comp Perf | Comp Ratio | Decomp Perf |
|---|---|---|---|---|---|---|---|---|
| BGR | TD | GRAY | 95 | 5184 | 3456 | 23.26 | 15.64 | 49.07 |
| BGR | TD | 4:2:0 | 95 | 5184 | 3456 | 20.56 | 13.53 | 42.16 |
| BGR | TD | 4:2:2 | 95 | 5184 | 3456 | 16.64 | 12.03 | 33.91 |
| BGR | TD | 4:4:4 | 95 | 5184 | 3456 | 13.52 | 10.47 | 24.34 |

./tjbench ../Hellbrunn_banqueting_hall_360_panoramic_view_full.ppm 95 -yuvdecode -quiet

| Bitmap Format | Bitmap Order | JPEG Subsamp | JPEG Qual | Image Width | Image Height | Comp Perf | Comp Ratio | Decomp Perf |
|---|---|---|---|---|---|---|---|---|
| BGR | TD | GRAY | 95 | 19992 | 7939 | 24.83 | 15.00 | 50.53 |
| BGR | TD | 4:2:0 | 95 | 19992 | 7939 | 20.64 | 13.15 | 41.35 |
| BGR | TD | 4:2:2 | 95 | 19992 | 7939 | 17.33 | 12.06 | 33.76 |
| BGR | TD | 4:4:4 | 95 | 19992 | 7939 | 14.12 | 10.48 | 23.32 |

Without I/O, decoding an 18 MPixel JPEG takes about 0.7 seconds using libjpeg-turbo. In theory, this could be cut down to 1/4. Can anyone tell me how to measure the time between the keypress for loading the image and the final display of the decoded image on screen inside xbmc? Then we would know the ratio between I/O, decoding and scaling / conversion.
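The estimate above can be reproduced from the tjbench "Decomp Perf" column, as a small sketch. The software rates are the measured values for the 5184x3456 image; the 120 Mpixel/s hardware rate is the figure quoted from memory at the start of this thread for 4:4:4 and is treated here as an unverified assumption.

```python
def decode_seconds(width: int, height: int, mpixels_per_sec: float) -> float:
    """Time to decode one image at a given throughput in Mpixels/s."""
    return width * height / 1e6 / mpixels_per_sec


WIDTH, HEIGHT = 5184, 3456  # image used in the first tjbench run

# Measured libjpeg-turbo decompression rates (Mpixels/s) from the table.
SW_RATES = {"4:2:0": 42.16, "4:2:2": 33.91, "4:4:4": 24.34}

# Assumption: VPU rate as quoted earlier in the thread, not measured here.
HW_RATE_444 = 120.0

for subsamp, rate in SW_RATES.items():
    print(f"software {subsamp}: {decode_seconds(WIDTH, HEIGHT, rate):.2f} s")

sw = decode_seconds(WIDTH, HEIGHT, SW_RATES["4:4:4"])
hw = decode_seconds(WIDTH, HEIGHT, HW_RATE_444)
print(f"hardware 4:4:4 (assumed rate): {hw:.2f} s, i.e. {hw / sw:.2f} of the software time")
```

The 0.7 s figure corresponds to the 4:4:4 row; for 4:2:0 the software decoder is already faster, so the achievable gain depends on the subsampling of the actual files.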

BR Rudi

warped-rudi commented 10 years ago

Find a snippet I used for measuring decoding time here:

https://gist.github.com/warped-rudi/3366f6f20c55d25797f5

rude78 commented 10 years ago

Thx for the hint.

I will try that and report the measurements.

I will try to find the place for I/O myself. But I am not sure whether I will find the place to measure the time between decoding and display.

Hedda commented 10 years ago

FYI, found this demo code https://community.freescale.com/thread/318147

wolfgar commented 10 years ago

Hi Hedda,

Thanks a lot for the link. Ironically, I found this resource 3 days ago (and I should have shared it, sorry for forgetting). So far, I have been able to build the test sample and give it a quick test. It works fine, and it confirms empirically that the expected x4 speedup in decoding time is the correct order of magnitude (on an imx6q). I have not yet attempted to integrate it properly into xbmc/kodi. I hope to be able to dedicate a little time to this task, but of course any contribution is also very welcome...

Stéphan

Hedda commented 10 years ago

i.MX 6 VPU also supports MJPEG (Motion JPEG) http://en.wikipedia.org/wiki/Motion_JPEG

See this cool media backgrounds concept http://forum.xbmc.org/showthread.php?tid=141442

Reference: http://hands.com/~lkcl/eoma/iMX6/VPU_API_RM_L3.0.35_1.1.0.pdf

• MJPEG Baseline Process Encoder and Decoder
• Baseline ISO/IEC 10918-1 JPEG compliance
• Support 1 or 3 color components
• 3 component in a scan (interleaved only)
• 8 bit samples for each component
• Support 4:2:0, 4:2:2, 2:2:4, 4:4:4 and 4:0:0 color format (max. six 8x8 blocks in one MCU)
• Minimum encoding size is 16x16 pixels
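Given the constraints listed above, a pre-check could decide per file whether to try the hardware path or fall back to software. The following is a minimal sketch, not XBMC code: it reads the JPEG start-of-frame marker to find the coding process, sample precision and component count. The function and class names are hypothetical, and the 8192x8192 limit is only the guess made earlier in this thread.

```python
import struct
from typing import NamedTuple

# Start-of-frame marker codes from ISO/IEC 10918-1 (byte following 0xFF).
SOF_PROCESS = {
    0xC0: "baseline",
    0xC1: "extended-sequential",
    0xC2: "progressive",
    0xC9: "extended-sequential-arithmetic",
    0xCA: "progressive-arithmetic",
}
NOT_SOF = {0xC4, 0xC8, 0xCC}  # DHT, JPG and DAC share the 0xC0..0xCF range


class FrameInfo(NamedTuple):
    process: str
    precision: int
    width: int
    height: int
    components: int


def read_frame_info(data: bytes) -> FrameInfo:
    """Walk the marker segments of a JPEG stream up to the first SOF."""
    if data[:2] != b"\xff\xd8":
        raise ValueError("missing SOI marker, not a JPEG stream")
    pos = 2
    while pos + 4 <= len(data):
        if data[pos] != 0xFF:
            raise ValueError(f"expected a marker at offset {pos}")
        marker = data[pos + 1]
        if marker == 0xFF:      # fill byte in front of a marker
            pos += 1
            continue
        if marker == 0xDA:      # start of scan reached without any SOF
            break
        (length,) = struct.unpack(">H", data[pos + 2:pos + 4])
        if 0xC0 <= marker <= 0xCF and marker not in NOT_SOF:
            if pos + 10 > len(data):
                raise ValueError("truncated SOF segment")
            # SOF payload: precision, lines (height), samples per line (width), components
            precision, height, width, ncomp = struct.unpack(
                ">BHHB", data[pos + 4:pos + 10])
            process = SOF_PROCESS.get(marker, "other")
            return FrameInfo(process, precision, width, height, ncomp)
        pos += 2 + length       # skip marker bytes plus segment (length includes itself)
    raise ValueError("no SOF marker found")


def vpu_candidate(info: FrameInfo, max_dim: int = 8192) -> bool:
    """True if the frame matches the listed constraints.

    max_dim is an assumption (the 8192x8192 guess from this thread).
    """
    return (info.process == "baseline"
            and info.precision == 8
            and info.components in (1, 3)
            and info.width <= max_dim
            and info.height <= max_dim)
```

A caller would pass in the file contents, for example `read_frame_info(open(path, "rb").read())`, and route anything that fails `vpu_candidate` to libjpeg-turbo. The check does not look at the subsampling factors, which would also have to match the supported color formats.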

piotrasd commented 10 years ago

It would be great if @wolfgar integrated this into XBMC :) Maybe some skins and views (like cover flow) would run more smoothly :)