tvlabs / edge264

Simple H.264 decoder
BSD 3-Clause "New" or "Revised" License

https://test-videos.co.uk/vids/bigbuckbunny/mp4/h264/1080/Big_Buck_Bunny_1080_10s_30MB.mp4 doesn't decode #5

Open jrmuizel opened 1 year ago

jrmuizel commented 1 year ago

After converting it to a raw H.264 file I get a decoding error:

nal_ref_idc: 3
nal_unit_type: 5 (Coded slice of an IDR picture)
first_mb_in_slice: 0
slice_type: 7 (I)
pic_parameter_set_id: 0
Decoding error
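
For readers following the log above, here is a minimal sketch of where the first fields come from. It assumes the standard one-byte H.264 NAL unit header layout (1 bit forbidden_zero_bit, 2 bits nal_ref_idc, 5 bits nal_unit_type) and is not taken from edge264's source. The helper names and the example byte `0x65` are illustrative only; that byte was chosen because it reproduces the values in the log.

```python
def parse_nal_header(byte):
    """Split the one-byte H.264 NAL unit header into its three fields."""
    return {
        "forbidden_zero_bit": (byte >> 7) & 0x1,
        "nal_ref_idc": (byte >> 5) & 0x3,
        "nal_unit_type": byte & 0x1F,
    }


def slice_type_name(slice_type):
    """Map slice_type to its letter; values 5..9 name the same types as 0..4."""
    return ["P", "B", "I", "SP", "SI"][slice_type % 5]


# 0x65 is a hypothetical header byte that matches the log: IDR slice, ref_idc 3.
header = parse_nal_header(0x65)
print(header, slice_type_name(7))
```

A slice_type of 7 therefore denotes an I slice, consistent with the "(I)" annotation in the log.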

traffaillac commented 1 year ago

My bad: I am working on 8x8 transforms at the moment, and they are currently enabled although they do not work yet. Big Buck Bunny uses these transforms, so it won't decode for now (although normally the decoder should return "unsupported"). I will create a release tag once I promote edge264, so that this kind of temporary instability doesn't happen in the future. To benchmark edge264 at the moment you should use a video without 8x8 transforms, so anything with Main profile and progressive scan, or re-encode it with x264's no-8x8dct option.
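
One possible way to do the re-encode mentioned above is sketched below. It only builds the command line and does not run it. The file names are placeholders, and passing `8x8dct=0` through ffmpeg's `-x264-params` is an assumption about how one would express x264's no-8x8dct setting via ffmpeg, not something stated in this thread.

```python
import shlex


def reencode_without_8x8dct(src, dst):
    """Build an ffmpeg argv that re-encodes video with x264's 8x8 transform disabled."""
    return [
        "ffmpeg",
        "-i", src,                    # input container (placeholder name)
        "-an",                        # drop audio, only the video stream matters here
        "-c:v", "libx264",
        "-x264-params", "8x8dct=0",   # assumed equivalent of x264's no-8x8dct
        dst,                          # raw H.264 output (placeholder name)
    ]


cmd = reencode_without_8x8dct("input.mp4", "video.264")
print(shlex.join(cmd))
```

Note that re-encoding changes the bitstream, so timings on the result are not comparable with timings on the original clip.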

jrmuizel commented 1 year ago

This seems to mostly work now.

I did some quick benchmarking and got:

openh264: 9.1s, libavcodec: 5s, edge264: 4.6s

```
./h264dec ~/src/edge264/video.264
H264 source file name: /Users/jrmuizel/src/edge264/video.264..
Can not find any output file to write..
------------------------------------------------------
-------------------------------------------------------
iWidth: 1920
height: 1080
Frames: 300
decode time: 9.190982 sec
FPS: 32.640691 fps
-------------------------------------------------------
```

```
$ time ~/src/ffmpeg/ffmpeg -threads 1 -i video.264 -benchmark -f null -
ffmpeg version N-105802-g129f5ed87e Copyright (c) 2000-2022 the FFmpeg developers
  built with Apple clang version 12.0.0 (clang-1200.0.32.29)
  configuration: --enable-libopenh264 --enable-debug=3 --extra-cflags=-fno-omit-frame-pointer
  libavutil      57. 21.100 / 57. 21.100
  libavcodec     59. 21.102 / 59. 21.102
  libavformat    59. 17.102 / 59. 17.102
  libavdevice    59.  5.100 / 59.  5.100
  libavfilter     8. 27.100 /  8. 27.100
  libswscale      6.  5.100 /  6.  5.100
  libswresample   4.  4.100 /  4.  4.100
[h264 @ 0x7f8248507b40] Stream #0: not enough frames to estimate rate; consider increasing probesize
Input #0, h264, from 'video.264':
  Duration: N/A, bitrate: N/A
  Stream #0:0: Video: h264 (High), yuv420p(progressive), 1920x1080 [SAR 1:1 DAR 16:9], 30 fps, 30 tbr, 1200k tbn
Stream mapping:
  Stream #0:0 -> #0:0 (h264 (native) -> wrapped_avframe (native))
Press [q] to stop, [?] for help
Output #0, null, to 'pipe:':
  Metadata:
    encoder         : Lavf59.17.102
  Stream #0:0: Video: wrapped_avframe, yuv420p(progressive), 1920x1080 [SAR 1:1 DAR 16:9], q=2-31, 200 kb/s, 30 fps, 30 tbn
    Metadata:
      encoder         : Lavc59.21.102 wrapped_avframe
frame=  300 fps= 61 q=-0.0 Lsize=N/A time=00:00:10.00 bitrate=N/A speed=2.03x
video:131kB audio:0kB subtitle:0kB other streams:0kB global headers:0kB muxing overhead: unknown
bench: utime=4.900s stime=0.035s rtime=4.941s
bench: maxrss=113315840kB

real 0m5.067s
user 0m5.006s
sys 0m0.053s
```

```
./edge264_play-cc -b video.264
CPU: 4590456 us
memory: 132792320 B
```
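
To put the three logs on a common scale, the sketch below converts each timing into frames per second and a ratio against the fastest decoder. The numbers are copied from the logs above. The comparison is only approximate, since the tools do not report the same metric (openh264 prints a decode time, ffmpeg a wall-clock `rtime`, edge264 a CPU time).

```python
FRAMES = 300  # frame count reported by h264dec and ffmpeg above

# Seconds taken from the three logs; these are not identical metrics.
seconds = {
    "openh264": 9.190982,        # "decode time"
    "libavcodec": 4.941,         # ffmpeg "rtime"
    "edge264": 4590456 / 1e6,    # "CPU" in microseconds
}

fps = {name: FRAMES / t for name, t in seconds.items()}
fastest = min(seconds, key=seconds.get)
ratio = {name: t / seconds[fastest] for name, t in seconds.items()}

for name in sorted(seconds, key=seconds.get):
    print(f"{name:10s} {seconds[name]:6.2f} s  {fps[name]:6.1f} fps  "
          f"{ratio[name]:.2f}x the time of {fastest}")
```

The openh264 result reproduces the FPS figure printed in its own log, which is a quick sanity check that the frame count is right.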

Here's a profile of edge264 in case you're interested: https://share.firefox.dev/3AXlVA2

traffaillac commented 1 year ago

Damn you're fast!

I have 6 conformance clips still failing, so 8x8 support is not yet done but almost there. I tested locally vs ffmpeg too and got the same results, which was a bit disappointing considering the number of theoretical improvements. With another HD clip from the movie Monsters, edge264 has the same performance as libavcodec :/ Still, the gap with openh264 is very cool :) My tests so far have been on SD clips, so I suspect that libavcodec might have long initialization times which matter less with big frames, and edge264 might suffer from cache associativity conflicts related to frame strides (which I haven't investigated yet).
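
The stride hypothesis above can be illustrated with a small model. This is only a sketch under stated assumptions: a 32 KiB, 8-way set-associative L1 data cache with 64-byte lines (so 64 sets), and hypothetical stride values. It says nothing about edge264's actual frame layout, and the thread does not establish that this effect is the cause.

```python
def sets_touched(stride_bytes, rows, line_bytes=64, num_sets=64):
    """Count the distinct cache sets hit by the first byte of `rows` consecutive rows."""
    return len({(r * stride_bytes // line_bytes) % num_sets for r in range(rows)})


# Walking down a 16-row column of pixels, one access per row:
for stride in (1920, 1984, 2048, 4096):
    print(f"stride {stride:5d} B -> {sets_touched(stride, 16)} distinct sets for 16 rows")
```

With a power-of-two stride such as 2048 bytes, 16 rows land in only 2 sets in this model, so an 8-way cache has no slack left for anything else mapping there. A stride like 1920 or 1984 spreads the same rows over 16 sets.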

Thanks a lot for the profiler results! They show that deblocking is a major performance hog (function finish_frame, 7.9%) which can be improved. For the rest I'll need cache-miss data. Is the Firefox profiler something I can host locally?

Cheers, Thibault

jrmuizel commented 1 year ago

I gathered the profiles using https://github.com/mstange/samply on macOS. It works locally if you're on macOS or Linux.

If you want assembly support in the profile you'll need to use these instructions: https://gist.github.com/mstange/6b2b3b15708cce847eacfabcf4a9f4cc But beware, the assembly support is not done yet, so you may run into bugs.

jrmuizel commented 1 year ago

And libavc gets ~7s

time ./avcdec dec-single-thread.cfg
real    0m7.089s
user    0m6.860s
sys     0m0.221s

https://share.firefox.dev/3J7bKOd

traffaillac commented 1 year ago

Well, that is some more good news, thanks :) You might get some more speedup with GCC < 10. I need to free some disk space to try all versions of GCC, but basically the older the better. edge264 now decodes Big Buck Bunny fine. If you plan on using it in any project, please give me some feedback on the most pressing features! Cheers, Thibault

traffaillac commented 6 months ago

> And libavc gets ~7s
>
> time ./avcdec dec-single-thread.cfg
> real  0m7.089s
> user  0m6.860s
> sys   0m0.221s
>
> https://share.firefox.dev/3J7bKOd

Hi @jrmuizel, I am preparing a presentation on edge264 for FOSDEM and need to benchmark avcdec on my machine (macOS Monterey, Intel Broadwell). I compiled the repo from https://android.googlesource.com/platform/external/libavc but avcdec keeps looping on "Error in header decode 0x0" no matter what options or input files I set. Have you had the same issue, do you know of any documentation for the lib, and could you share your dec-single-thread.cfg file?

Cheers, Thibault

jrmuizel commented 6 months ago

My dec-single-thread.cfg looks like:

--input input.h264
--save_output 0
--num_frames -1
--output out.yuv
--chroma_format YUV_420P
--share_display_buf 0
--num_cores 1
--loopback 0
--display 0
--fps 59.94

jrmuizel commented 6 months ago

input.h264 was made with: ffmpeg -i Big_Buck_Bunny_1080_10s_30MB.mp4 -vcodec copy -bsf h264_mp4toannexb -an input.264
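
The `h264_mp4toannexb` filter in the command above rewrites MP4's length-prefixed NAL units into the Annex B byte-stream format, where each NAL unit is preceded by a `0x000001` start code (optionally with an extra leading zero byte). The sketch below shows a simple way to split such a stream. The function name and the sample bytes are made up for illustration, and this is not how avcdec or edge264 parse their input.

```python
def split_annexb(data):
    """Return the NAL units of an Annex B byte stream, with start codes removed."""
    start_code = b"\x00\x00\x01"
    units = []
    i = data.find(start_code)
    while i != -1:
        start = i + len(start_code)
        j = data.find(start_code, start)
        end = len(data) if j == -1 else j
        # Trailing zero bytes are padding or the first byte of a 4-byte start code.
        units.append(data[start:end].rstrip(b"\x00"))
        i = j
    return units


# Hypothetical, truncated stream: SPS (type 7), PPS (type 8), IDR slice (type 5).
sample = b"\x00\x00\x00\x01\x67\x42\x00\x00\x01\x68\xce\x00\x00\x00\x01\x65\x88"
nal_types = [unit[0] & 0x1F for unit in split_annexb(sample)]
print(nal_types)
```

The low five bits of the first byte of each unit give its nal_unit_type, which is how a decoder tells parameter sets from coded slices.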

traffaillac commented 6 months ago

Thanks! Actually I didn't notice that only Linux support is mentioned on the repo, not macOS. I'd like to present your bench values in an intro slide if you don't mind, to show that edge264 is fastest overall, before diving into programming techniques.

jrmuizel commented 6 months ago

Yep, you can use my values. Also, avcdec does work on macOS. That's where my numbers are from.