meetecho / janus-gateway

Janus WebRTC Server
https://janus.conf.meetecho.com
GNU General Public License v3.0
8.23k stars 2.48k forks source link

Videoroom. Errors and Warnings breakdown. #156

Closed gatecrasher777 closed 9 years ago

gatecrasher777 commented 9 years ago

Things are running quite smoothly. However, running Janus videoroom plugin for 24 hours with typical use on level 4, generated 27341 lines in the janus.log

(run as: nohup ./janus > ./janus.log 2>&1 &)

The most frequent errors and warnings (with examples) were:

(A) 13719 lines [ERR][ice.c:janus_ice_cb_nice_recv:1018:] [4139658295] SRTP unprotect error: err_status_replay_old (len=943-->943, ts=935754679, seq=5389)

(B) 1668 lines [ERR][rtcp.c:janus_rtcp_nacks:640:] Buffer too small: 200 < 204 (at least 50 NACK blocks needed)

(C) 1106 lines [WARN][3621527529] Unsupported transport tcp:

(D) 936 lines (sequence of 3) [WARN][231525117] Data available but Data Channels support disabled... [231525117] ICE send thread leaving... No WebRTC media anymore

(E) 221 lines [WARN][1417479559] Didn't receive video for more than a second...

Comments:

(A) Over 50% of messages are related to the SRTP unprotect error. Is this a bug somewhere?

(B) Buffer too small error? That sounds like a bug, but maybe its expected if there is excessive packet loss.

(C) The unsupported transport tcp warning seems unnecessary, since tcp is not supported.

(D) If webrtc data is disabled, it looks like Janus is doing unnecessary work in setting up ICE threads to handle it. And even if it is necessary, the logging seems too verbose for level 4.

(E) Seems unnecessary at level 4.

So, not a specific issue, but it would be nice to see less errors and warnings at the default logging level when Janus is performing as expected, so that it easier to pick up specific issues that may be affecting the application. Although, also, an explanation of why I am seeing the top two errors would also be helpful.

PS: Happy belated first birthday to Janus Gateway. This is a really super project.

lminiero commented 9 years ago

Yep, logging could be pruned a bit. I already started doing that in recent revisions. About your points:

(A) I should definitely remove the LOG_ERR there, considering how much it's confusing you and others. I think I told this already but I can't remember where or when, so I'll reinstate it here. That's not an error per se. It just means the SRTP stack received the same packet again, and so rejects it. This most definitely happens after a NACK, e.g., 1) you miss a packet and send a NACK for it; 2) the packet was late and gets to you; 3) the browser gets the NACK and sends the packet again; 4) you get the packet again, and get an unprotect error. Many such errors in a row are in fact connected to multiple NACKs being sent because of some delays in the network.

(B) We limit the size of a NACK. 200 bytes means a LOT of NACKs, and it's more than enough. If you're getting there, packet loss is excessive so there's an issue. Check the new slow_link callback if you want to intercept that in the plugin and act on it (e.g., drop one of the videos the user is receiving).

(C) Already hidden behind a LOG_VERB instead of LOG_WARN in latest revisions. ICE-TCP is unsupported now but soon will be (as soon as libnice fixes it).

(D) Not sure where the line comes from. There's no additional ICE stuff involved. Chrome does BUNDLE and so everything travels on the same "connection", including data. You wouldn't get it on Firefox 35, for instance, and most definitely no SCTP setup is done. If you're getting that message the browser is negotiating data channels, so remove it there. We then should reject the application line in SDP, though, that's for sure, so I'll check why it's not happening.

(E) Well that's a relevant issue someone monitoring the console would want to know. I was unsure whether to make it an error or a warning, but I think a warning is fine. We might make it less frequent, but I think it's useful.

Thanks for the feedback and the wishes!

gatecrasher777 commented 9 years ago

With the slow link callback, could one calibrate a publisher's bitrate contraints in real time, i.e. reducing the bitrate when nack counts are too high, and reverting back to the "optimum" bitrate when the network improves? Its not clear to me, however, how I would intercept this is my application. Is it an onmessage event? And does it go to struggling publishers, struggling listeners, or both? Also I currently calibrate bitrates for publishers. would the similar bitrate contraints work for listeners too?

lminiero commented 9 years ago

In the plugin it's a new callback you have to implement. If you want your users to know there's a problem yes, you can send an ad-hoc event from your plugin that would be received as other plugin-generated messages on the client side.

You cannot configure any bitrate constraint on listeners, they're just receiving something someone else is sending.

lminiero commented 9 years ago

What you can do though is intercepting the REMB listeners send: if it's much lower than the publisher bitrate, they won't be able to watch this unless you tune down the publisher's bitrate or, in case the user is watching several streams, tearing some of them down to optimize the bandwidth. Forcing the publisher to always send at a bitrate that satisfies the worst listener is often a bad idea, quality wise. Simulcasting (a publisher sending multiple versions of the same stream) can be an alternative too, with listeners switching to the lower res version in case of issues.

gatecrasher777 commented 9 years ago

Okay, so to implement a slow_link notification to my app, I would have to init

.slow_link = janus_videoroom_slow_link

in videoroom.c and then flesh out

void janus_videoroom_slow_link(janus_plugin_session *handle, int video) {}

to send a videoroom event to the publisher. Hmm, okay. I think I could do that. I would just be concerned about videoroom updates after that, unless it got merged.

lminiero commented 9 years ago

Closing as not strictly an issue? Feel free to reopen if you think this discussion should be kept alive for any reason.