matrix-org / waterfall

A cascading stream forwarding unit for scalable, distributed voice and video conferencing over Matrix
Apache License 2.0
98 stars 5 forks source link

Find out why we have so many warnings on missing RTPs on certain active subscriptions #121

Closed daniel-abramov closed 1 year ago

daniel-abramov commented 1 year ago

A log line like this one is not uncommon during normal sessions:

waterfall  | WARN[2023-02-01T18:19:37Z] No RTP on subscription 9db59e08-6390-44d5-a77a-11ced35eee2d (low)  conf_id=1675175890574TP04z8LAccuowlpZ device_id=RWSECBPMWU user_id="@copper-sunny-butterfly:call.ems.host"
waterfall  | WARN[2023-02-01T18:19:38Z] No RTP on subscription 9db59e08-6390-44d5-a77a-11ced35eee2d (low)  conf_id=1675175890574TP04z8LAccuowlpZ device_id=AZGGKGVAFT user_id="@robintown:call.ems.host"
waterfall  | WARN[2023-02-01T18:19:38Z] No RTP on subscription 6df7aea5-1bfa-45c6-91a6-87422782a541 (low)  conf_id=1675175890574TP04z8LAccuowlpZ device_id=AZGGKGVAFT user_id="@robintown:call.ems.host"
waterfall  | WARN[2023-02-01T18:19:38Z] No RTP on subscription 9db59e08-6390-44d5-a77a-11ced35eee2d (low)  conf_id=1675175890574TP04z8LAccuowlpZ device_id=YSMNDCBZWH user_id="@copper-resident-stingray:call.ems.host"
waterfall  | WARN[2023-02-01T18:19:38Z] No RTP on subscription 9db59e08-6390-44d5-a77a-11ced35eee2d (low)  conf_id=1675175890574TP04z8LAccuowlpZ device_id=WDEKNIAADZ user_id="@cyan-patient-junglefowl:call.ems.host"
waterfall  | WARN[2023-02-01T18:19:38Z] No RTP on subscription 9db59e08-6390-44d5-a77a-11ced35eee2d (low)  conf_id=1675175890574TP04z8LAccuowlpZ device_id=NUOTIRTNCN user_id="@simon.brandner:call.ems.host"
waterfall  | WARN[2023-02-01T18:19:38Z] No RTP on subscription 6df7aea5-1bfa-45c6-91a6-87422782a541 (low)  conf_id=1675175890574TP04z8LAccuowlpZ device_id=YSMNDCBZWH user_id="@copper-resident-stingray:call.ems.host"
waterfall  | WARN[2023-02-01T18:19:38Z] No RTP on subscription 6df7aea5-1bfa-45c6-91a6-87422782a541 (low)  conf_id=1675175890574TP04z8LAccuowlpZ device_id=DTFWXQLZHN user_id="@crimson-premier-grouse:call.ems.host"
waterfall  | WARN[2023-02-01T18:19:38Z] No RTP on subscription 6df7aea5-1bfa-45c6-91a6-87422782a541 (low)  conf_id=1675175890574TP04z8LAccuowlpZ device_id=WDEKNIAADZ user_id="@cyan-patient-junglefowl:call.ems.host"
waterfall  | WARN[2023-02-01T18:19:38Z] No RTP on subscription 6df7aea5-1bfa-45c6-91a6-87422782a541 (low)  conf_id=1675175890574TP04z8LAccuowlpZ device_id=NUOTIRTNCN user_id="@simon.brandner:call.ems.host"
waterfall  | WARN[2023-02-01T18:19:39Z] No RTP on subscription 9db59e08-6390-44d5-a77a-11ced35eee2d (low)  conf_id=1675175890574TP04z8LAccuowlpZ device_id=RWSECBPMWU user_id="@copper-sunny-butterfly:call.ems.host"
waterfall  | WARN[2023-02-01T18:19:40Z] No RTP on subscription 6df7aea5-1bfa-45c6-91a6-87422782a541 (low)  conf_id=1675175890574TP04z8LAccuowlpZ device_id=AZGGKGVAFT user_id="@robintown:call.ems.host"
waterfall  | WARN[2023-02-01T18:19:40Z] No RTP on subscription 9db59e08-6390-44d5-a77a-11ced35eee2d (low)  conf_id=1675175890574TP04z8LAccuowlpZ device_id=AZGGKGVAFT user_id="@robintown:call.ems.host"
waterfall  | WARN[2023-02-01T18:19:40Z] No RTP on subscription 9db59e08-6390-44d5-a77a-11ced35eee2d (low)  conf_id=1675175890574TP04z8LAccuowlpZ device_id=YSMNDCBZWH user_id="@copper-resident-stingray:call.ems.host"
waterfall  | WARN[2023-02-01T18:19:40Z] No RTP on subscription 9db59e08-6390-44d5-a77a-11ced35eee2d (low)  conf_id=1675175890574TP04z8LAccuowlpZ device_id=WDEKNIAADZ user_id="@cyan-patient-junglefowl:call.ems.host"
waterfall  | WARN[2023-02-01T18:19:40Z] No RTP on subscription 9db59e08-6390-44d5-a77a-11ced35eee2d (low)  conf_id=1675175890574TP04z8LAccuowlpZ device_id=NUOTIRTNCN user_id="@simon.brandner:call.ems.host"
waterfall  | WARN[2023-02-01T18:19:40Z] No RTP on subscription 6df7aea5-1bfa-45c6-91a6-87422782a541 (low)  conf_id=1675175890574TP04z8LAccuowlpZ device_id=YSMNDCBZWH user_id="@copper-resident-stingray:call.ems.host"
waterfall  | WARN[2023-02-01T18:19:40Z] No RTP on subscription 6df7aea5-1bfa-45c6-91a6-87422782a541 (low)  conf_id=1675175890574TP04z8LAccuowlpZ device_id=DTFWXQLZHN user_id="@crimson-premier-grouse:call.ems.host"
waterfall  | WARN[2023-02-01T18:19:40Z] No RTP on subscription 6df7aea5-1bfa-45c6-91a6-87422782a541 (low)  conf_id=1675175890574TP04z8LAccuowlpZ device_id=WDEKNIAADZ user_id="@cyan-patient-junglefowl:call.ems.host"
waterfall  | WARN[2023-02-01T18:19:40Z] No RTP on subscription 6df7aea5-1bfa-45c6-91a6-87422782a541 (low)  conf_id=1675175890574TP04z8LAccuowlpZ device_id=NUOTIRTNCN user_id="@simon.brandner:call.ems.host"
waterfall  | WARN[2023-02-01T18:19:41Z] No RTP on subscription 9db59e08-6390-44d5-a77a-11ced35eee2d (low)  conf_id=1675175890574TP04z8LAccuowlpZ device_id=RWSECBPMWU user_id="@copper-sunny-butterfly:call.ems.host"
waterfall  | WARN[2023-02-01T18:19:42Z] No RTP on subscription 9db59e08-6390-44d5-a77a-11ced35eee2d (low)  conf_id=1675175890574TP04z8LAccuowlpZ device_id=AZGGKGVAFT user_id="@robintown:call.ems.host"
waterfall  | WARN[2023-02-01T18:19:42Z] No RTP on subscription 6df7aea5-1bfa-45c6-91a6-87422782a541 (low)  conf_id=1675175890574TP04z8LAccuowlpZ device_id=AZGGKGVAFT user_id="@robintown:call.ems.host"
waterfall  | WARN[2023-02-01T18:19:42Z] No RTP on subscription 9db59e08-6390-44d5-a77a-11ced35eee2d (low)  conf_id=1675175890574TP04z8LAccuowlpZ device_id=YSMNDCBZWH user_id="@copper-resident-stingray:call.ems.host"
waterfall  | WARN[2023-02-01T18:19:42Z] No RTP on subscription 9db59e08-6390-44d5-a77a-11ced35eee2d (low)  conf_id=1675175890574TP04z8LAccuowlpZ device_id=WDEKNIAADZ user_id="@cyan-patient-junglefowl:call.ems.host"
waterfall  | WARN[2023-02-01T18:19:42Z] No RTP on subscription 9db59e08-6390-44d5-a77a-11ced35eee2d (low)  conf_id=1675175890574TP04z8LAccuowlpZ device_id=NUOTIRTNCN user_id="@simon.brandner:call.ems.host"

It's not clear why there is an active subscription (with a low layer) that, despite being published, does not receive RTP frames with an intact publisher track. When the publisher track is gone, we remove all subscriptions associated with this track (check if we don't miss any edge cases).

It does not seem like this error has a big practical application (we did not observe any frozen subscription), yet it may point out to some bug in the SFU that must be resolved.

daniel-abramov commented 1 year ago

One of the reasons why this happens is if there is a video that is muted.

There are possible mitigations for this warning, see: https://github.com/matrix-org/waterfall/pull/137#issue-1596834950

daniel-abramov commented 1 year ago

After running it with the OpenTelemetry, so far it looks like these happened in two cases for video tracks only:

  1. Video was muted.
  2. The given layer [to which we subscribed] has been published, but no RTP packets have been received on a layer. Switching to a lower layer solves the issue.

@SimonBrandner, do you remember if the Element Call sends the metadata update requests whenever the video gets muted/unmuted? - If it does, we could detect when the track is muted on the SFU side not to trigger false positives for the case (1). If we do so, we can use (2) as a hint that no RTP packets are received and automatically switch the quality to the lower layer preserving the video.

UPD: We receive the update of the metadata when muting/unmuting.

daniel-abramov commented 1 year ago

After several sessions with OpenTelemetry it has become clear that these are caused by the browser not publishing higher levels (1) and videos being muted (2).

(2) was addressed by a PR. (1) can't be solved (Pion informs us about tracks, but there is no guarantee that we'll get RTP packets).

See https://github.com/matrix-org/waterfall/issues/131 for the related issue of missing RTP packets.