conference creation fails under load due to port contention

jitsi / jitsi-videobridge

Jitsi Videobridge is a WebRTC-compatible video router or SFU that lets you build highly scalable video conferencing infrastructure (i.e., up to hundreds of conferences per server).

https://jitsi.org/jitsi-videobridge

Apache License 2.0


conference creation fails under load due to port contention #96

Closed jmquigs closed 7 years ago

jmquigs commented 9 years ago

Hello,

IceUdpTransportManager attempts to bind ports from a "tracked" starting value, in code that starts here:

https://github.com/jitsi/jitsi-videobridge/blob/master/src/main/java/org/jitsi/videobridge/IceUdpTransportManager.java#L724

The problem is that if multiple threads enter this code in parallel, they can all start the search from the same value and then compete for the same ports. Under sustained conference creation load, this ultimately leads to a bind failure, because there is an inner retry loop that fails after 50 attempts (it can be bumped to 100, but that 100 limit is hardcoded in NetworkAddressManagerServiceImpl.createIceStream). If the retry loop is unsuccessful, conference creation fails.

I can get a failure with a conference creation rate of 9/sec over 10 seconds on an EC2 m4.large instance.

Are there any plans to improve this port assignment code? We are experimenting with a synchronized block around the portTracker code mentioned above. That eliminates the errors, but results in a port assignment time that is an order of magnitude slower under load. We may consider using multiple port trackers or some kind of port free list, but we wanted to see if anyone else had run into this.
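One way to picture an alternative to the synchronized block is a lock-free cursor: each caller atomically claims the next candidate port, so two threads never start their search from the same value. The sketch below is only an illustration of that idea. `PortCursor`, its method names, and the port range are hypothetical and are not part of jitsi-videobridge or ice4j.

```java
import java.util.concurrent.atomic.AtomicInteger;

/** Hypothetical lock-free port cursor; NOT jitsi-videobridge or ice4j code. */
final class PortCursor {
    private final int minPort;
    private final int range;
    // Offset of the next port to hand out, always kept in [0, range).
    private final AtomicInteger nextOffset = new AtomicInteger(0);

    PortCursor(int minPort, int maxPort) {
        if (minPort < 1 || maxPort > 65535 || minPort > maxPort) {
            throw new IllegalArgumentException("invalid port range");
        }
        this.minPort = minPort;
        this.range = maxPort - minPort + 1;
    }

    /**
     * Atomically claims the next candidate port and advances the cursor,
     * wrapping back to minPort after maxPort. Concurrent callers receive
     * different ports until the range wraps around.
     */
    int nextPort() {
        int offset = nextOffset.getAndUpdate(i -> (i + 1) % range);
        return minPort + offset;
    }
}
```

A caller would still have to attempt the bind and claim another port if it fails, since a claimed port may already be in use by another socket or process. The cursor only removes the case where parallel threads retry against the same sequence of ports.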

bgrozev commented 9 years ago

Not sure about a proper long-term solution, but a workaround is to use the single-port mode, in which case the portBase value will be (effectively) not used.

jmquigs commented 9 years ago

Thanks @bgrozev, we'll check out this mode.

fippo commented 9 years ago

@bgrozev the last time I looked the single port harvester just added this additional port. Is there a way to disable anything but that port?

bgrozev commented 9 years ago

@fippo that's pretty much the default behavior actually. The harvester itself only adds a port, but there is logic in the bridge which disables the dynamically allocated candidates if single-port is in use. This way we only use the single port for browsers, but use dynamic ports for endpoints without rtcpmux support (i.e. jigasi).

fippo commented 9 years ago

@bgrozev hah, that looks much better than what I remembered. Thanks!

bgrozev commented 8 years ago

https://github.com/jitsi/jitsi-videobridge/commit/d6610fd6f11ef89f6cf738c8d7ea628d8bd794cb provides a fix for a related issue (which may have actually caused what you observed). The TCP and/or "single-port" ports were used when updating the value, which resulted in it always staying at "minPort". The race condition is still there, but I believe it is very unlikely to cause any problems in practice.

bgrozev commented 7 years ago

Closing because no further problems have been reported for over a year. Please reopen if necessary.

jmquigs commented 7 years ago

We moved to single port mode, which has worked mostly fine for us and doesn't have this port-search issue.

We did have to increase the socket OS receive buffer sizes for the single port harvester. With the default limits, we were saturating these buffers and this led to audio drop outs. The settings we now use are as follows:

sysctl -w net.core.rmem_default=20971520
sysctl -w net.core.rmem_max=33554432
sysctl -w net.core.wmem_default=65536
sysctl -w net.core.wmem_max=33554432

bgrozev commented 7 years ago

On Thu, Mar 30, 2017 at 11:01 AM John Quigley wrote:

> We moved to single port mode, which has worked mostly fine for us and doesn't have this port-search issue.
>
> We did have to increase the socket OS receive buffer sizes for the single port harvester. With the default limits, we were saturating these buffers and this led to audio drop outs. The settings we now use are as follows: sysctl -w net.core.rmem_default=20971520

Note that you may want to keep the system-wide default lower. We have a Java system property you can use to control the receive buffer size only for the single-port mode:

org.ice4j.ice.harvest.AbstractUdpListener.SO_RCVBUF

Boris
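To see how a per-socket receive buffer request interacts with the system-wide limits discussed above, the sketch below uses the standard `java.net.DatagramSocket` API to request a buffer size and read back what the OS granted. It is a generic illustration, not ice4j code: `ReceiveBufferCheck` is a hypothetical helper, and how the `org.ice4j.ice.harvest.AbstractUdpListener.SO_RCVBUF` property is applied internally is not shown in this thread.

```java
import java.net.DatagramSocket;
import java.net.SocketException;

/** Hypothetical helper for inspecting UDP receive buffer limits; not ice4j code. */
final class ReceiveBufferCheck {
    /**
     * Opens a UDP socket on an ephemeral port, requests a receive buffer of
     * requestedBytes, and returns the size the OS reports for that socket.
     * The request is only a hint: the reported size may differ from it.
     */
    static int grantedReceiveBuffer(int requestedBytes) throws SocketException {
        try (DatagramSocket socket = new DatagramSocket()) {
            socket.setReceiveBufferSize(requestedBytes);
            return socket.getReceiveBufferSize();
        }
    }
}
```

Calling `ReceiveBufferCheck.grantedReceiveBuffer(20971520)` on the bridge host shows whether a 20MB request is honored. On Linux the granted size is typically capped by net.core.rmem_max, so that limit still has to be raised even if net.core.rmem_default is kept low.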


jmquigs commented 7 years ago

Thanks, that's a good tip. Perhaps it could be added to the official documentation here: https://github.com/jitsi/jitsi-videobridge/blob/master/doc/single-port.md I'm not sure what the recommended value would be. We chose 20MB based on empirical tests with bots and on how much memory our systems had available.

We've also experimented with a custom modification in which we run multiple single port harvesters (i.e., one per core) on different ports. The idea is that each SPH can use a smaller buffer size and gets its own I/O input thread. However, we've never observed any benefit from that, except perhaps a small improvement in ICE candidate selection (less TCP and more direct and TURN-UDP in a videobridge under load).