Closed penguinol closed 3 years ago
Please provide info about codec, producer encodings, etc.
We are using chrome with ExternalCodec. The orginal log was missed, but it should be something like this:
{
"id": "aJQ9NMBCTbCdh2k09pLT_Q==",
"kind": "video",
"paused": false,
"rtpMapping": {
"codecs": [{
"mappedPayloadType": 102,
"payloadType": 107
}, {
"mappedPayloadType": 101,
"payloadType": 125
}],
"encodings": [{
"mappedSsrc": 871957488,
"rid": "l",
"ssrc": null
}, {
"mappedSsrc": 871957489,
"rid": "m",
"ssrc": null
}, {
"mappedSsrc": 871957490,
"rid": "h",
"ssrc": null
}]
},
"rtpParameters": {
"codecs": [{
"clockRate": 90000,
"mimeType": "video/H264",
"parameters": {
"level-asymmetry-allowed": 1,
"packetization-mode": 1,
"profile-level-id": "42e01f"
},
"payloadType": 125,
"rtcpFeedback": [{
"type": "goog-remb"
}, {
"type": "transport-cc"
}, {
"parameter": "fir",
"type": "ccm"
}, {
"type": "nack"
}, {
"parameter": "pli",
"type": "nack"
}]
}, {
"clockRate": 90000,
"mimeType": "video/rtx",
"parameters": {
"apt": 125
},
"payloadType": 107,
"rtcpFeedback": []
}],
"encodings": [{
"codecPayloadType": 125,
"ksvc": false,
"rid": "l",
"scalabilityMode": "S1T3",
"spatialLayers": 1,
"temporalLayers": 3
}, {
"codecPayloadType": 125,
"ksvc": false,
"rid": "m",
"scalabilityMode": "S1T3",
"spatialLayers": 1,
"temporalLayers": 3
}, {
"codecPayloadType": 125,
"ksvc": false,
"rid": "h",
"scalabilityMode": "S1T3",
"spatialLayers": 1,
"temporalLayers": 3
}],
"headerExtensions": [{
"encrypt": false,
"id": 4,
"parameters": {},
"uri": "urn:ietf:params:rtp-hdrext:sdes:mid"
}, {
"encrypt": false,
"id": 5,
"parameters": {},
"uri": "urn:ietf:params:rtp-hdrext:sdes:rtp-stream-id"
}, {
"encrypt": false,
"id": 6,
"parameters": {},
"uri": "urn:ietf:params:rtp-hdrext:sdes:repaired-rtp-stream-id"
}, {
"encrypt": false,
"id": 2,
"parameters": {},
"uri": "http://www.webrtc.org/experiments/rtp-hdrext/abs-send-time"
}, {
"encrypt": false,
"id": 3,
"parameters": {},
"uri": "http://www.ietf.org/id/draft-holmer-rmcat-transport-wide-cc-extensions-01"
}, {
"encrypt": false,
"id": 8,
"parameters": {},
"uri": "http://tools.ietf.org/html/draft-ietf-avtext-framemarking-07"
}, {
"encrypt": false,
"id": 13,
"parameters": {},
"uri": "urn:3gpp:video-orientation"
}, {
"encrypt": false,
"id": 14,
"parameters": {},
"uri": "urn:ietf:params:rtp-hdrext:toffset"
}],
"rtcp": {
"cname": "HZ270SjEfbtnWyVO",
"reducedSize": true
}
},
"rtpStreams": [{
"params": {
"clockRate": 90000,
"cname": "HZ270SjEfbtnWyVO",
"mimeType": "video/H264",
"payloadType": 125,
"rid": "m",
"rtxPayloadType": 107,
"rtxSsrc": 1478239160,
"spatialLayers": 1,
"ssrc": 159984670,
"temporalLayers": 3,
"useDtx": false,
"useFir": true,
"useInBandFec": false,
"useNack": true,
"usePli": true
},
"rtxStream": {
"params": {
"clockRate": 90000,
"cname": "HZ270SjEfbtnWyVO",
"mimeType": "video/rtx",
"payloadType": 107,
"rrid": "m",
"ssrc": 1478239160
}
},
"score": 10
}, {
"params": {
"clockRate": 90000,
"cname": "HZ270SjEfbtnWyVO",
"mimeType": "video/H264",
"payloadType": 125,
"rid": "l",
"rtxPayloadType": 107,
"rtxSsrc": 1639204714,
"spatialLayers": 1,
"ssrc": 668485325,
"temporalLayers": 3,
"useDtx": false,
"useFir": true,
"useInBandFec": false,
"useNack": true,
"usePli": true
},
"rtxStream": {
"params": {
"clockRate": 90000,
"cname": "HZ270SjEfbtnWyVO",
"mimeType": "video/rtx",
"payloadType": 107,
"rrid": "l",
"ssrc": 1639204714
}
},
"score": 10
}, {
"params": {
"clockRate": 90000,
"cname": "HZ270SjEfbtnWyVO",
"mimeType": "video/H264",
"payloadType": 125,
"rid": "h",
"rtxPayloadType": 107,
"rtxSsrc": 1846198098,
"spatialLayers": 1,
"ssrc": 2801259313,
"temporalLayers": 3,
"useDtx": false,
"useFir": true,
"useInBandFec": false,
"useNack": true,
"usePli": true
},
"rtxStream": {
"params": {
"clockRate": 90000,
"cname": "HZ270SjEfbtnWyVO",
"mimeType": "video/rtx",
"payloadType": 107,
"rrid": "h",
"ssrc": 1846198098
}
},
"score": 10
}],
"traceEventTypes": "",
"type": "simulcast"
}
So this is H264. Ok, the theory in mediasoup is the following:
Taking this theory into account, perhaps the problem is the SUPER PAINFUL keyframe detection in H264. This does not happen at all in VP8, right?
We have not tested vp8, casue we need to be compatible with some old devices which do not support vp8.
Yep, I understand. What I mean is that, in theory the only issue could be a wrong (or too optimistic) keyframe detection:
https://github.com/versatica/mediasoup/blob/v3/worker/src/RTC/Codecs/H264.cpp#L14
The frame we sent when switching layer looks like idr frame in wireshark, so i think the keyframe detection is right in this case.
Is it reproducible @penguinol? Have you checked the chrome logs?
$ chromeBinary --enable-logging --vmodule=*/webrtc/*=9,*=-2
Yes, it's reproducible. Switch layer for about 10~20 times will appear once. I only tested in H.264. I'll check the chrome logs later. I tried to make a skip on rtp sequence number between two frames when previous frame was not send completely. It seems can fix the problem. I guess it's because when seq num is continuous, and the decoder receive the packet of next frame, it will try to decode previous frame, although the frame is incomplete. When the seq num is discontinuous, the decoder might know previous is incomplete, so it will drop the frame. But i'm not sure whether there is any side effect.
Are you using the mediasoup demo? Because I'm unable to reproduce it there? Can you repro there?
No, we are using our own web app, but it also repro in mediasoup-demo. Also you need to switch spatial layers. By the way my chrome version is: 83.0.4103.61 It's easier to reproduce when switching between high spatial layers, beacuse there are more rtp packets in one frame.
Here is my encoding config:
routerOptions :
{
mediaCodecs :
[
{
kind : 'audio',
mimeType : 'audio/opus',
clockRate : 48000,
channels : 2
},
{
kind : 'video',
mimeType : 'video/h264',
clockRate : 90000,
parameters :
{
'packetization-mode' : 1,
'profile-level-id' : '42e032',
'level-asymmetry-allowed' : 1,
'x-google-start-bitrate' : 1000
}
}
]
},
Here is the way i fix it. https://github.com/penguinol/mediasoup/commit/de2ed3f9a8a1a36e7ca5b392390fb44da30f4341
Interesting. Let's please some days to handle this (terribly busy).
I also experience a somewhat similar issue. In 100% reproducible case with simulcast VP8 stream from Chromium sent to GStreamer, but only during initial start of the stream (changing spatial or temporal layer seems to fix it).
With @penguinol's fork I no longer see such issues (and from code changes they are not codec-specific indeed).
Interesting. @penguinol may you please rename lastSentPacketHasMarker
to lastSentRtpPacketHasMarker
and provide a PR for this?
Since a video frame is carried by one or more RTP packets sharing the same RTP timestamp, I would say it would be more accurate making a timestamp jump instead of making a sequence number one.
So instead of modifying SeqManager
to increase the base by one, we could decrease by one tsOffset
in SimulcastConsumer
. I find this change more reliable.
Could you please try it @penguinol?
Actually we already take care of NOT sending timestamp
in the keyframe of the switching stream equal or less than the timestamp
sent in the last packet sent (belonging to the previous simulcast stream). Look for shouldSwitchCurrentSpatialLayer
in SimulcastConsimer.cpp
. This should avoid the problem.
@penguinol @nazar-pc can you enable simulcast
log tag in the worker and see if, when the "mosaics" happen, there is this log?
giving up on proper stream switching after got a requested keyframe for which still too high RTP timestamp extra offset is needed
@ibc @jmillan Ok, I'll try it later.
Here is my log:
This is how decoded image looks like:
Wow @nazar-pc, it seems you can repro really easily (fast).
Is there any chance we can get the same logs with the following diff applied?
diff --git a/worker/src/RTC/SimulcastConsumer.cpp b/worker/src/RTC/SimulcastConsumer.cpp
index 36fdef49..8fdf922c 100644
--- a/worker/src/RTC/SimulcastConsumer.cpp
+++ b/worker/src/RTC/SimulcastConsumer.cpp
@@ -865,6 +865,19 @@ namespace RTC
origSeq,
origTimestamp);
}
+ else
+ {
+ MS_DEBUG_TAG(
+ rtp,
+ "sending packet [ssrc:%" PRIu32 ", seq:%" PRIu16 ", ts:%" PRIu32
+ "] from original [ssrc:%" PRIu32 ", seq:%" PRIu16 ", ts:%" PRIu32 "]",
+ packet->GetSsrc(),
+ packet->GetSequenceNumber(),
+ packet->GetTimestamp(),
+ origSsrc,
+ origSeq,
+ origTimestamp);
+ }
// Process the packet.
if (this->rtpStream->ReceivePacket(packet))
Sure, it happens right away for me and seems to fix itself on key frame or layers changing:
Thanks for the logs @nazar-pc,
Here the key:
mediasoup:Channel [pid:1888368] RTC::SimulcastConsumer::SendRtpPacket() | sending packet [ssrc:865570604, seq:18, ts:1600308764] from original [ssrc:289936824, seq:10205, ts:1600308764] +5ms
mediasoup:Channel [pid:1888368] RTC::Producer::ReceiveRtpPacket() | key frame received [ssrc:3637030475, seq:4528] +0ms
mediasoup:Channel [pid:1888368] RTC::SimulcastConsumer::SendRtpPacket() | sync key frame received +0ms
mediasoup:Channel [pid:1888368] RTC::SimulcastConsumer::SendRtpPacket() | RTP timestamp extra offset generated for stream switching: 1 +1ms
mediasoup:Channel [pid:1888368] RTC::SimulcastConsumer::SendRtpPacket() | sending sync packet [ssrc:865570604, seq:19, ts:1600308765] from original [ssrc:289936825, seq:4528, ts:2728670638] +0ms
mediasoup:Channel [pid:1888368] RTC::SimulcastConsumer::SendRtpPacket() | sending packet [ssrc:865570604, seq:20, ts:1600308765] from original [ssrc:289936825, seq:4529, ts:2728670638] +4ms
We are behaving as expected: when changing the source stream we increase the RTP timestamp in order to tell the encoder that this RTP packet contains a different frame than the previous one. Encoder should recognize it and encode it as a new fresh frame.
I wonder why the encoder does not recognize the payload of a RTP packet with different timestamp than the previous as a different frame. That behavior is not RFC compliant.
@penguinol solution will generate a missing RTP packet to the Receiver Reports and stats, because even if we receive a NACK for it , cannot resend a RTP packet that we never sent in the first place.
Without going further to encoders iplementations, probably the currently applied RTP timestamp difference (+1 unit) is not enough for the encoder to consider the payload a new frame. We could increase it the number of units corresponding to a new real frame. This could be made by calculating the offset in SimulcastConsumer
.
BTW, I'm still no able to reproduce this :-(
My opinion is that the decoder can recognize the new frame, cause if it doesn't, it should request a key frame after decoding failure, but there is no fir or pli in the packet, and the video is also clear. So the decoder can recognize the new frame, but it doesn't know the frame is incomplete, because the rtp seqnum is continuous and it try to decode the incomplete frame and the problem occures. Or the decoder does know the frame is incomplete (because there is no marker), but it will try to decode the frame in this situation. Making a skip in seq num will let the decoder know the frame is incomplete, and tell it do not try to decode the frame. It might be a decoder related problem, i also can't reproduce it with vp8. I have no idea about how to fix the NACK statistics info.
@penguinol what you say makes sense.
Still I would love to be able to reproduce this. I've been able to see mosaics once in the mediasoup demo while applying download packet loss.
@nazar-pc any chance you could share how to reproduce it? Since you can reproduce it immediately in your scenario it would be really helpful trying it locally.
@nazar-pc any chance you could share how to reproduce it? Since you can reproduce it immediately in your scenario it would be really helpful trying it locally.
Trying with mediasoup-demo locally for now.
any chance you could share how to reproduce it?
It is a part of larger proprietary app that would be very non-trivial to reduce. Also it uses custom SDP serializer/parser for talking to GStreamer's webrtcbin (will hopefully replace with https://github.com/versatica/mediasoup-sdp-bridge at some point). You'd also need to build GStreamer from master branch. It is possible to do, but will take a long time to prepare a reproducible demo.
Other than that it is a regular procedure of creating WebRTC transport, consumer, exchanging SDP and resuming paused consumer as always.
Would you @nazar-pc, @penguinol please try this branch?
It just adds a RTP timestamp offset equivalent to that which an encoder would add for a 30fps encoding for intraframe RTP packets instead of the current value of 1 that we set.
I would like to know if the decoders would consider the packet payload now a new frame due to the TS difference between this and the previous packet from a different layer.
Other that that, another option could be re-sending the last sent RTP packet but this time setting the marker bit to one, and only after that sending the RTP sync packet from the new layer.
Thanks
Yes, seems to work fine for me with both 30fps and 60fps camera input :+1:
Would you @nazar-pc, @penguinol please try this branch?
It just adds a RTP timestamp offset equivalent to that which an encoder would add for a 30fps encoding for intraframe RTP packets instead of the current value of 1 that we set.
I would like to know if the decoders would consider the packet payload now a new frame due to the TS difference between this and the previous packet from a different layer.
Other that that, another option could be re-sending the last sent RTP packet but this time setting the marker bit to one, and only after that sending the RTP sync packet from the new layer.
Thanks
It seems does not work for me.
Here is a gif shows how to reproduce the problem. https://github.com/penguinol/mediasoup/blob/408/doc/3.gif
@jmillan do you think changes from mentioned branch issue_408
are going to end up being merged? I'd love to see that.
With slightly longer path I'm still able to reproduce issue with issue_408
branch.
@penguinol's fork doesn't fix it either in that case.
Here is the log of it with @penguinol 's fork:
And here is log with issue_408
branch:
UPD: Workaround that works for me :trollface: :
await new Promise((resolve) => {
setTimeout(resolve, 1000);
});
await consumer.resume();
Hi @nazar-pc,
@jmillan do you think changes from mentioned branch issue_408 are going to end up being merged? I'd love to see that.
Branch issue_408 is a good attempt but it's not spec compliant and there's no guarantee that it fixes the problem (indeed you can reproduce the issue).
I think my changes are the way to go, at least by spec. I hardcoded a RTP Timestamp increment of 30pfs on the first RTP packet of the new spatial layer. I'll make this increment to be as high as possible in time (low as possible in fps) so it is applicable to scenarios with smaller fps.
Could you give it a try to issue_408 @penguinol, @nazar-pc? I've increased the RTP timestamp difference as if the stream was sent at 13fps. Hopefully any encoder will consider this a new frame and we'll get rid of this ugly effect.
Unfortunately,i can still reproduce the problem with issue_408 branch. And it's quite easy to reproduce. Skipping on sequece num works for me.
I believe this may related to this issue: https://bugs.chromium.org/p/webrtc/issues/detail?id=8423. There are chances that PacketBuffer::FindFrames may deliver incomplete H264 frames.
Set WebRTC-SpsPpsIdrIsH264Keyframe to true at receiver side should avoid this. Also, we let mediasoup hold the last packet of a frame(marker),then send it when almost other packets belongs to the same frame have been sent to receiver to reduce the probability of delivering incomplete H264 frames.
This is not H264-specific, I see similar behavior with VP8 both with GStreamer WebRTC implementation and Chromium.
And it's quite easy to reproduce.
Could you please indicate the environment to reproduce it?
I've been not able to reproduce it right now.
Skipping on sequece num works for me.
But still reproducible by @nazar-pc. The fact that this way of fixing is not standard compliant but implementation dependant makes me think that it may solve the issue depending on the decoder implementation we face, and I would not mark it as a final fix. But your findings are great and we may end up adding total or partially your fix. I just want to get to the root of the issue.
@penguinol PacketBuffer::FindFrames
// In the case of H264 we don't have a frame_begin bit (yes,
// |frame_begin| might be set to true but that is a lie). So instead
// we traverese backwards as long as we have a previous packet and
// the timestamp of that packet is the same as this one. This may cause
// the PacketBuffer to hand out incomplete frames.
if (is_h264 && (buffer_[start_index] == nullptr ||
buffer_[start_index]->timestamp != frame_timestamp)) {
break;
}
says it will traverese backwards as long as we have a previous packet and the timestamp of that packet is the same as this one. so in your case seq from 50717 to 50733 will be treated as a complete frame. @jmillan: Change TS does make sense.
PacketBuffer::FindFrames
So this explains why increasing the sec number works for this specific implementation.
Thanks for checking it out @unclerunning
I'm using meidasoup-demo with this option, nothing else changed. Consumer Chrome Version is 83.0.4103.106 Producer Chrome Version is 81.0.4044.138
Reproduce steps is as the gif shows https://github.com/penguinol/mediasoup/blob/408/doc/3.gif
routerOptions :
{
mediaCodecs :
[
{
kind : 'audio',
mimeType : 'audio/opus',
clockRate : 48000,
channels : 2
},
{
mimeType : 'video/h264',
clockRate : 90000,
parameters :
{
'packetization-mode' : 1,
'profile-level-id' : '4d0032',
'level-asymmetry-allowed' : 1,
'x-google-start-bitrate' : 1000
}
},
{
kind : 'video',
mimeType : 'video/h264',
clockRate : 90000,
parameters :
{
'packetization-mode' : 1,
'profile-level-id' : '42e01f',
'level-asymmetry-allowed' : 1,
'x-google-start-bitrate' : 1000
}
}
]
},
@penguinol PacketBuffer::FindFrames
// In the case of H264 we don't have a frame_begin bit (yes, // |frame_begin| might be set to true but that is a lie). So instead // we traverese backwards as long as we have a previous packet and // the timestamp of that packet is the same as this one. This may cause // the PacketBuffer to hand out incomplete frames. if (is_h264 && (buffer_[start_index] == nullptr || buffer_[start_index]->timestamp != frame_timestamp)) { break; }
says it will traverese backwards as long as we have a previous packet and the timestamp of that packet is the same as this one. so in your case seq from 50717 to 50733 will be treated as a complete frame. @jmillan: Change TS does make sense.
I think this may explain why skip on seq num works.
// If this is not a keyframe, make sure there are no gaps in the
// packet sequence numbers up until this point.
if (!is_h264_keyframe && missing_packets_.upper_bound(start_seq_num) !=
missing_packets_.begin()) {
uint16_t stop_index = (index + 1) % size_;
while (start_index != stop_index) {
sequence_buffer_[start_index].frame_created = false;
start_index = (start_index + 1) % size_;
}
return found_frames;
}
}
Increasing timestamp may help decoder differentiate two frames. But it does not tell the decoder the frame is not complete. So that's why i can still reproduce the issue after increase timestamp.
Skipping on seq num does not help decoder differentiate two frames.
So the decoder will regard the two frames as one frame, but it finds the frame is incomplete(see the code in prev replay), so it will not create a new frame object.
And later on, the decoder will recheck the packets, and recognize the i frame, because every time it loops from sequence_buffer_[seq_num % size_]
.
std::vector<std::unique_ptr<RtpFrameObject>> found_frames;
for (size_t i = 0; i < size_ && PotentialNewFrame(seq_num); ++i) {
size_t index = seq_num % size_;
sequence_buffer_[index].continuous = true;
This explains why skipping seq num without increaing timestamp can resolve the problem.
So maybe we should both increase timestamp(to differentiate two frames) and skip on seq num(to tell the decoder frame is incomplete).
This turns out to be quite frequent issue in production setting when switching layers often. Any chance to have at least temporary workaround that works in mediasoup until proper solution is found?
Indeed @nazar-pc,
A mix of both solutions (cseq and timestamp) will be a good compromise in the meantime. We'll apply it in the next days.
We've applied the changes related to the RTP timestamp here
The reason why we haven't applied the changes in the cseq number is because this is a problem specific to Chrome H264 decoder implementation, and such a workaround in mediasoup would generate a false packet loss in the statistics every time we switch to a different layer.
@penguinol, please lets comment on the chrome bug tracker, within this issue or in a new one and push a fix in the decoder side, since they are not being spec compliant on the frames detection.
and push a fix in the decoder side,
Here I just mean exposing the issue so they can fix it
Mentioned issue in libwebrtc is about H264, while I'm having issues with VP8. Now I'm forced to use fork with both solutions suggested here applied at the same time, which works fine.
Can you confirm that it also applies to VP8? Otherwise there is still an issue to be fixed in mediasoup.
The problem may be related to the packet buffer, that is agnostic (not 100% true) to the codec.
Can you please confirm whether with the cseq and timestamp fix we get a packet loss of one packet every time the spatial layer is changed? A main concern for avoiding such a solution is precisely that.
Hi, guys
I found that there are mosaics on video when switching layers, as follow:
I captured packet. It seems because we did not finish sending the last lower layer frame packets when the higher layer i-frame came. consumer packets:
producer packets:
In consumer packets, there is serial sequence number and the timestamp of two frame is the same. This might cause the mosics.