w3c / media-source

Media Source Extensions
https://w3c.github.io/media-source/

Live low latency h.264 streams containing b-frames play jittery in Chrome #177

Closed mazzomazzo closed 7 years ago

mazzomazzo commented 7 years ago

Hello,

If you encode with H.264 Main or High profile, the stream will contain B-frames. So you can see this issue when encoding with Adobe FMLE or OBS (x264).

Unreal Media Server sends very short (30-500 ms length) BMFF video/mp4 segments via WebSockets and MSE is used for playback. That achieves very low latency (0.2 - 1 sec) and works very well as long as there are no b-frames in the stream. When b-frames are present, IE and Edge play OK, but Chrome does not. Depending on the amount of change in the scene, b-frames will be generated and playback in Chrome becomes broken - old frames play again after new frames - you have crazy movements in your playback.

Clearly, Chrome's decoder is not rearranging the frames according to PTS - frames remain sorted by DTS in the same order they arrive at the decoder. Again, this works fine in IE and Edge. Moreover, we tried Chrome version 43.0.2357.65 and it works fine there! So it got broken somewhere since then and it's a very critical issue.

We can provide sample streams and recorded fragmented .mp4 chunks for your analysis. Please advise.

This demo stream: http://umediaserver.net/umediaserver/demohtml5player.html plays fine because it's an IP camera that generates a stream without b-frames. But if you try same webpage with stream encoded by OBS (x264), you can see the issue in Chrome.

Thank you. UMedia team

paulbrucecotton commented 7 years ago

Have you considered filing a Chrome bug since this report seems to be Chrome specific?

mazzomazzo commented 7 years ago

Yes,

We have filed a Chrome bug report: https://bugs.chromium.org/p/chromium/issues/detail?id=703019 but nobody seems to be interested there. Wonder why Wolenetz of chromium doesn't care.

wolenetz commented 7 years ago

I do care :) Thanks for reporting this issue - it doesn't seem to be a spec issue so I'll close this one and we can continue to investigate via https://crbug.com/703019.

jyavenard commented 7 years ago

FWIW,

I've opened https://bugzilla.mozilla.org/show_bug.cgi?id=1350056 to investigate why Firefox playback doesn't always start.

However, at a glance, I can tell that it seeks in an unbuffered area at the start.

Like here I see that the buffered range is [2.750000, 86.751000], but the player initially seeked to currentTime = 1. So of course playback won't start...

Jean-Yves
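
To make the point about seeking into an unbuffered area concrete, here is a minimal Python sketch of the check a player could run before seeking. The real page would do this in JavaScript against the video element's buffered ranges; the function name `nearest_buffered_time` and the list-of-tuples representation are assumptions for illustration, and only the numbers come from the comment above.

```python
from typing import List, Tuple


def nearest_buffered_time(current_time: float,
                          buffered: List[Tuple[float, float]]) -> float:
    """Return a playable position for current_time.

    `buffered` is a hypothetical stand-in for the media element's buffered
    ranges: (start, end) pairs in seconds, sorted and non-overlapping.
    """
    # Already inside a buffered range: nothing to fix.
    for start, end in buffered:
        if start <= current_time <= end:
            return current_time
    # Otherwise jump forward to the start of the next buffered range.
    for start, _end in buffered:
        if start > current_time:
            return start
    # Past everything that is buffered: fall back to the end of the last range.
    return buffered[-1][1] if buffered else current_time
```

With the range reported above, `nearest_buffered_time(1.0, [(2.75, 86.751)])` returns 2.75, whereas leaving currentTime at 1 keeps the player outside the buffered data.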

jyavenard commented 7 years ago

On Mon, Mar 20, 2017 at 3:43 AM, mazzomazzo notifications@github.com wrote:

This demo stream: http://umediaserver.net/umediaserver/demohtml5player.html plays fine because it's an IP camera that generates a stream without b-frames. But if you try same webpage with stream encoded by OBS (x264), you can see the issue in Chrome.

do you have a stream with b-frames we can look at?

curious as to how firefox would handle this one. AFAIK, if on Windows and with a machine using a hardware decoder, Firefox uses the same framework to decode as Chrome (Windows H264 MFT)

JY

mazzomazzo commented 7 years ago

Firefox, in general, needs at least 8 seconds of media to buffer before starting. It's the worst among all browsers - IE and Edge need 2 seconds, Chrome only needs 200ms. The demo link above will play smoothly with Firefox only if you seek back to at least 8 seconds from realtime.

I wonder why the buffered range starts with 2.75... it should start with 0, like with other browsers.

mazzomazzo commented 7 years ago

JY,

Thanks for trying to fix it in Firefox.

We will arrange a stream with b-frames tonight, so you can look at it tomorrow.

If you want to install Unreal Media Server and OBS and configure the streaming yourself, we have listed the steps at https://bugs.chromium.org/p/chromium/issues/detail?id=703019

jyavenard commented 7 years ago

On Thu, Mar 23, 2017 at 9:16 PM, mazzomazzo notifications@github.com wrote:

Firefox, in general, needs at least 8 seconds of media to buffer before starting. It's the worst among all browsers - IE and Edge need 2 seconds, Chrome only needs 200ms. The demo link above will play smoothly with Firefox only if you seek back to at least 8 seconds from realtime.

I wonder why the buffered range starts with 2.75... it should start with 0, like with other browsers.

because Firefox and Safari are the only browsers that aren't buggy when it comes to reporting the buffered range. Both Edge and Chrome incorrectly report the buffered range using the dts instead of the pts.

Once Chrome and Edge fix this bug (both have mentioned that they are working on it), you will see the same buffered range being reported.

Firefox should only need 3 video frames for playback to start. But of course, that requires the pts to be proper and currentTime to be set in the buffered range.

mazzomazzo commented 7 years ago

JY,

This stream does not contain b-frames, so PTS=DTS. So there is absolutely no explanation of why the beginning of buffered range is 2.75. Please check on that.

With all different streams we tried, Firefox will play smoothly only if it has buffered 8 seconds of media. Even with this stream, if you wait 8 seconds, it will start playing.

mazzomazzo commented 7 years ago

Here is a live stream with this issue:

http://umediaserver.net/umediaserver/bframesissue.html

It is a live stream: OBS captures a player window, encodes with x264 High profile, 1-sec K-frame frequency, and does RTMP push to Unreal Media Server.

jyavenard commented 7 years ago

Hi

Having b-frames has nothing to do with pts==dts. While that may be true, how it’s stored in the MP4 sample table is all that matters.

But in this instance you’re right: this content does have pts == dts, my bad. I jumped to conclusions too quickly!

However,

the first packet added, contains two key frames:

[MediaPlayback #3]: D/MediaSourceSamples TrackBuffersManager(0x10ec7e000:video/mp4; codecs="avc1.42E01E")::ProcessFrames: Processing video/avc frame(pts:-50000 end:-50000, dts:-50000, duration:0, kf:1)
[MediaPlayback #3]: D/MediaSourceSamples TrackBuffersManager(0x10ec7e000:video/mp4; codecs="avc1.42E01E")::ProcessFrames: Processing video/avc frame(pts:-50000 end:17000, dts:-50000, duration:67000, kf:1)

So the first keyframe has a pts==dts==-0.05s

As per spec: https://w3c.github.io/media-source/index.html#sourcebuffer-coded-frame-processing step 8: "If presentation timestamp is less than appendWindowStart, then set the need random access point flag to true, drop the coded frame, and jump to the top of the loop to start processing the next coded frame."

appendWindowStart is set by default to 0, so this frame is to be dropped, and the next frame added must be a keyframe, otherwise it will be dropped.

Step 10 of the coded frame processing algorithm: "If the need random access point flag on track buffer equals true, then run the following steps: If the coded frame is not a random access point, then drop the coded frame and jump to the top of the loop to start processing the next coded frame. Set the need random access point flag on track buffer to false."

The next frame added that is a keyframe has the following timing: pts:2950000 end:3017000, dts:2950000, duration:67000, kf:1

All 59 frames added prior to that one are to be dropped as per spec, as none of them are keyframes.

So here, the buffered range starts as 2.95s which is what firefox reports.

You need to make sure that the first frame in your mp4 has a pts that is 0 or greater.

Gecko does have a workaround that allows negative frames to be added (as we’ve seen plenty of poorly muxed files in our time), but the fuzz for the first sample is +/- its duration, and unfortunately here, the first frame added has a duration of 0 :( As far as gecko is concerned, you could allow your content to play by dropping that first frame with a 0 duration. It won’t be displayed anyway, so it serves no purpose.

JY
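
The two spec steps quoted in the comment above can be sketched in Python to show why the buffered range ends up starting at 2.95s. This is only an illustration of those two steps, not the full coded frame processing algorithm and not Gecko's code; the `Frame` class and `process_frames` function are hypothetical names, and the Gecko fuzz workaround is deliberately not modelled. Timestamps are in microseconds, as in the log.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Frame:
    pts_us: int        # presentation timestamp, microseconds
    duration_us: int   # frame duration, microseconds
    keyframe: bool     # True if the frame is a random access point


def process_frames(frames: List[Frame],
                   append_window_start_us: int = 0) -> List[Frame]:
    """Apply only the two quoted steps to frames given in append order."""
    kept: List[Frame] = []
    need_rap = True  # a track buffer starts out waiting for a random access point
    for frame in frames:
        # Step 8: a frame before appendWindowStart is dropped and forces
        # the next accepted frame to be a random access point.
        if frame.pts_us < append_window_start_us:
            need_rap = True
            continue
        # Step 10: while waiting for a random access point, drop everything else.
        if need_rap:
            if not frame.keyframe:
                continue
            need_rap = False
        kept.append(frame)
    return kept
```

Feeding it the two keyframes at pts -50000 from the log, any number of non-keyframes, and then the keyframe at pts 2950000 leaves that last keyframe as the first frame kept, which matches the buffered range start that Firefox reports.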

mazzomazzo commented 7 years ago

Thanks a lot, we will take a look.

jyavenard commented 7 years ago

BTW:

"With all different streams we tried, Firefox will play smoothly only if it has buffered 8 seconds of media."

This is a limitation of the Windows H264 decoder. It has a latency of around 30 frames. You have very long frames here, so 30 frames can be quite a stretch of time. There's not much we can do about this as the limitation is with the H264 MFT, and for legal reasons, we can't ship our own H264 decoder.

From Windows 8.1, there's a low latency option available in the MFT; it can be enabled by setting the preference media.wmf.low-latency.enabled to true.

Chrome has set it to true by default, but we've found that it caused kernel panics / crashes on some users' systems, so we disabled it. Note that this does NOT work with content that has B-frames. See https://msdn.microsoft.com/en-us/library/windows/desktop/hh447590(v=vs.85).aspx note about B-frames.

That may be the problem Chrome is having and that you are seeing. If they use the Windows hardware decoder, they set the option CODECAPI_AVLowLatencyMode, which will not play nicely with your content.

What happens if you disable hardware acceleration in Chrome? (see http://ccm.net/faq/35743-google-chrome-how-to-disable-hardware-acceleration for instructions)

mazzomazzo commented 7 years ago

Nope, disabling hardware acceleration in Chrome doesn't help. It's a different problem, take a look at https://bugs.chromium.org/p/chromium/issues/detail?id=703019

We never saw a problem with MFT hardware decoder used by Chrome.

jyavenard commented 7 years ago

Hi.

The comments and analysis from the Chrome folks in that bug appear valid to me. I hadn’t looked at your stream with b-frames previously.

You wrote: "It's a decoder's task to recalculate PTS for b-frames; the decoder is the one who knows where b-frame belong."

Certainly not; for a start it would be impossible to do with MSE. That’s what the container is for: it contains information about the frames it contains. That information must be valid.

mazzomazzo commented 7 years ago

" it would be impossible to do with MSE"

Edge and IE do it just fine.

mazzomazzo commented 7 years ago

Dear JY,

"So the first keyframe has a pts==dts==-0.05s"

Can you explain how Firefox calculated -0.05 from mp4 atoms received?

"So here, the buffered range starts as 2.95s which is what firefox reports."

For the same stream other browsers report the following start of buffered range: Chrome: 0.066 Edge: 0

FYI.

jyavenard commented 7 years ago

Hi

On Sun, Mar 26, 2017 at 1:57 AM, mazzomazzo notifications@github.com wrote:

Dear JY,

"So the first keyframe has a pts==dts==-0.05s"

Can you explain how Firefox calculated -0.05 from mp4 atoms received?

"So here, the buffered range starts as 2.95s which is what firefox reports."

For the same stream other browsers report the following start of buffered range: Chrome: 0.066 Edge: 0

The same way FFmpeg does and as defined in ISO 14496-12 (available for free at http://standards.iso.org/ittf/PubliclyAvailableStandards/c068960_ISO_IEC_14496-12_2015.zip )

You can check them with: ffprobe.exe -show_entries packet=pts_time,dts_time,duration_time,stream_index

this is the output of your first init segment with the first media segment:

../ffmpeg/ffprobe.exe -show_entries packet=pts_time,dts_time,duration_time,stream_index t.bin
ffprobe version N-84679-gd65b595 Copyright (c) 2007-2017 the FFmpeg developers
  built with gcc 6.3.0 (GCC)
  configuration: --enable-gpl --enable-version3 --enable-cuda --enable-cuvid --enable-d3d11va --enable-dxva2 --enable-libmfx --enable-nvenc --enable-avisynth --enable-bzlib --enable-fontconfig --enable-frei0r --enable-gnutls --enable-iconv --enable-libass --enable-libbluray --enable-libbs2b --enable-libcaca --enable-libfreetype --enable-libgme --enable-libgsm --enable-libilbc --enable-libmodplug --enable-libmp3lame --enable-libopencore-amrnb --enable-libopencore-amrwb --enable-libopenh264 --enable-libopenjpeg --enable-libopus --enable-librtmp --enable-libsnappy --enable-libsoxr --enable-libspeex --enable-libtheora --enable-libtwolame --enable-libvidstab --enable-libvo-amrwbenc --enable-libvorbis --enable-libvpx --enable-libwavpack --enable-libwebp --enable-libx264 --enable-libx265 --enable-libxavs --enable-libxvid --enable-libzimg --enable-lzma --enable-zlib
  libavutil 55. 51.100 / 55. 51.100
  libavcodec 57. 86.103 / 57. 86.103
  libavformat 57. 67.100 / 57. 67.100
  libavdevice 57. 3.101 / 57. 3.101
  libavfilter 6. 78.100 / 6. 78.100
  libswscale 4. 3.101 / 4. 3.101
  libswresample 2. 4.100 / 2. 4.100
  libpostproc 54. 2.100 / 54. 2.100
[mov,mp4,m4a,3gp,3g2,mj2 @ 00000000025934c0] decoding for stream 0 failed
Input #0, mov,mp4,m4a,3gp,3g2,mj2, from 't.bin':
  Metadata:
    major_brand : isom
    minor_version : 512
    compatible_brands: isomiso2iso5avc1mp41
  Duration: 00:00:00.07, start: -0.050000, bitrate: 1976 kb/s
    Stream #0:0(und): Video: h264 (Main) (avc1 / 0x31637661), yuvj420p(pc, bt709), 800x450 [SAR 1:1 DAR 16:9], 1876 kb/s, 14.93 fps, 10k tbr, 10k tbn, 20k tbc (default)
    Metadata:
      handler_name : Bento4 Video Handler
[PACKET]
stream_index=0
pts_time=-0.050000
dts_time=-0.050000
duration_time=N/A
[/PACKET]

So there it shows clearly the time as -0.05 too.

How it's calculated: looking at the content structure (init+1st media)

../bento4/mp4dump.exe t.bin
[ftyp] size=8+28
  major_brand = isom
  minor_version = 200
  compatible_brand = isom
  compatible_brand = iso2
  compatible_brand = iso5
  compatible_brand = avc1
  compatible_brand = mp41
[moov] size=8+678
  [mvhd] size=12+96
    timescale = 1000
    duration = 0
    duration(ms) = 0
  [trak] size=8+506
    [tkhd] size=12+80, flags=7
      enabled = 1
      id = 1
      duration = 0
      width = 800.000000
      height = 450.000000
    [edts] size=8+28
      [elst] size=12+16
        entry count = 1
        entry/segment duration = 0
        entry/media time = 500
        entry/media rate = 1
    [mdia] size=8+370
      [mdhd] size=12+20
        timescale = 10000
        duration = 0
        duration(ms) = 0
        language = und
      [hdlr] size=12+41
        handler_type = vide
        handler_name = Bento4 Video Handler
      [minf] size=8+277
        [vmhd] size=12+8, flags=1
          graphics_mode = 0
          op_color = 0000,0000,0000
        [dinf] size=8+28
          [dref] size=12+16
            [url ] size=12+0, flags=1
              location = [local to file]
        [stbl] size=8+213
          [stsd] size=12+133
            entry-count = 1
            [avc1] size=8+121
              data_reference_index = 1
              width = 800
              height = 450
              compressor =
              [avcC] size=8+35
                Configuration Version = 1
                Profile = Main
                Profile Compatibility = 4d
                Level = 41
                NALU Length Size = 4
                Sequence Parameter = [67 4d 00 29 e2 90 19 07 7f 11 80 b7 01 01 01 a4 1e 24 45 40]
                Picture Parameter = [68 ee 3c 80]
          [stsz] size=12+8
            sample_size = 0
            sample_count = 0
          [stsc] size=12+4
            entry_count = 0
          [stts] size=12+4
            entry_count = 0
          [stco] size=12+4
            entry_count = 0
  [mvex] size=8+48
    [mehd] size=12+4
      duration = 0
    [trex] size=12+20
      track id = 1
      default sample description index = 1
      default sample duration = 0
      default sample size = 0
      default sample flags = 10000
[moof] size=8+96
  [mfhd] size=12+4
    sequence number = 1
  [traf] size=8+72
    [tfhd] size=12+8, flags=20020
      track ID = 1
      default sample flags = 1010000
    [tfdt] size=12+8, version=1
      base media decode time = 0
    [trun] size=12+20, flags=305
      sample count = 1
      data offset = 112
      first sample flags = 2000000
[mdat] size=8+15715

So base media decode time is 0. The mdhd (media header) defines a timescale of 10000.

This content has a non-empty edit list, with one entry, which is used to define the offset for the media timeline. The elst (edit list) time is 500 for this track.

As per ISO 14496-12:2015 8.6.6.1 Definition

The first sample time as such is: (0-500) / 10000 = -500 / 10000 = -0.05s.
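
The same arithmetic as a small Python sketch, in case it helps to check other segments. The function name and parameter names are made up for illustration; the inputs are the tfdt base media decode time, the sample's composition offset (0 here, since this stream has no B-frames), the elst media time, and the mdhd timescale from the dump above. It assumes a single edit list entry, as in this stream.

```python
def first_sample_pts_seconds(base_media_decode_time: int,
                             composition_offset: int,
                             edit_media_time: int,
                             media_timescale: int) -> float:
    """Presentation time of a sample after applying the edit list offset.

    All three time values are in units of the media timescale (mdhd).
    """
    # decode time + composition offset gives the media-timeline pts;
    # the edit list media time shifts that timeline back.
    ticks = base_media_decode_time + composition_offset - edit_media_time
    return ticks / media_timescale


# Values from the mp4dump output above.
print(first_sample_pts_seconds(0, 0, 500, 10000))  # -0.05
```

With an elst media time of 0, or with the first sample starting at 500 ticks or later, the same calculation would give a first pts of 0 or greater.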

I note that your test stream now doesn't add the initial frame with a duration of 0, which allows Firefox's workaround for broken pts to kick in, and playback starts immediately; I have less than 1s latency here.

As to why Chrome and Edge return the wrong PTS, you'll have to ask them; their calculation of the buffered range is broken to start with (using dts in place of pts). IIRC stagefright (used on Android) incorrectly calculated (ignored) the edit list offset.

JY

mazzomazzo commented 7 years ago

Hi JY,

Thanks for detailed explanation, we will take a look.

Again, the test stream (Venice beach) does not have b-frames, so DTS=PTS, so your statement about Chrome and Edge incorrectly calculating the buffered range (even if your statement is correct, which is a question) can not be based on PTS-DTS difference. Rather maybe they also ignore this Edit List offset.

I note that your test stream now doesn't add the initial frame with a duration of 0

Our test stream did not change; it was the same last week. Why you are seeing this now and did not see it last week is not clear to me.

jyavenard commented 7 years ago

Hi

On Mon, Mar 27, 2017 at 5:08 PM, mazzomazzo notifications@github.com wrote:

Again, the test stream (Venice beach) does not have b-frames, so DTS=PTS, so your statement about Chrome and Edge incorrectly calculating the buffered range (even if your statement is correct, which is a question) can not be based on PTS-DTS difference. Rather maybe they also ignore this Edit List offset.

I didn't state that the reason why you're seeing a different buffered range between Firefox and Chrome/Edge is that they incorrectly use dts instead of pts. In this particular case, it's likely not.

What I wrote is that their calculation was broken to start with. Chromium has a tracking bug for it: https://bugs.chromium.org/p/chromium/issues/list?q=label:MSEptsdtsCleanup

Edge isn't open source, so I don't know if they do or not (but it was confirmed that they also had this issue).

What I'm 100% sure about, is that Firefox buffered range calculation is correct, but you may believe that I'm biased on the matter seeing that I wrote that code. So that may cloud my judgement :)

In any case, there's many things incorrect with your stream: it's not ISOBMFF compliant, so you can't expect that all web browsers will play it perfectly. Many people have gone out of their way to help you identify the problems... Dismissing their response isn't going to entice people to continue helping out. Just saying...

mazzomazzo commented 7 years ago

JY,

Nobody dismisses your responses. It's the opposite, we thank you very much for that.

there's many things incorrect with your stream

what are these things, besides that -0.05 causing non-zero buffered range start in FireFox?

What I'm 100% sure about, is that Firefox buffered range calculation is correct

Please understand that this is not as clear to us, given that Chrome and Edge provide different results.

jyavenard commented 7 years ago

On Mon, Mar 27, 2017 at 7:53 PM, mazzomazzo notifications@github.com wrote:

JY,

Nobody dismisses your responses. It's the opposite, we thank you very much for that.

I'm referring to the comments of the Chromium dev, that your stream with B-frames has invalid pts, and your answer about it.

The decoding of such a stream is decoder dependent. It may work with Edge and Firefox on Windows, which use Windows Media Foundation and reorder the decoded frames in pts order. It certainly won't work on Mac: the Apple Video Toolbox h264 decoder returns frames in decode order. On Linux or systems using FFmpeg the behaviour will be different once again.

Seeking will be almost broken on those (at the very least it will seek to the wrong time).

mazzomazzo commented 7 years ago

Hi JY,

Thanks again for your help; we have updated our server software and fixed the -0.05 issue that caused slow startup; also our server software now supplies composition times in .mp4 segments (when we have these composition times available). That fixes the b-frames jitter issue.
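
For readers landing here later, a minimal Python sketch of why composition times matter for streams with b-frames. It is not the server's code; the `Sample` class, the helper name, and the four-frame example (I0 P3 B1 B2 in decode order, one tick per frame) are invented for illustration. The container carries a decode time plus a composition offset per sample, and pts = dts + offset gives the display order.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Sample:
    name: str
    dts: int                 # decode timestamp, in timescale ticks
    composition_offset: int  # pts - dts, as carried in the container


def presentation_order(samples: List[Sample]) -> List[str]:
    """Names of the samples sorted by pts = dts + composition offset."""
    return [s.name for s in sorted(samples, key=lambda s: s.dts + s.composition_offset)]


# Decode order: the P-frame must be decoded before the B-frames that reference it.
with_offsets = [
    Sample("I0", 0, 1),
    Sample("P3", 1, 3),
    Sample("B1", 2, 0),
    Sample("B2", 3, 0),
]
# The same samples with the offsets left out, so pts == dts for every frame.
without_offsets = [Sample(s.name, s.dts, 0) for s in with_offsets]

print(presentation_order(with_offsets))     # ['I0', 'B1', 'B2', 'P3']
print(presentation_order(without_offsets))  # ['I0', 'P3', 'B1', 'B2']
```

Without the offsets the P-frame is presented before the B-frames that should precede it, which is the "old frames play again after new frames" effect described at the top of this thread.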