Performance regression in RTPConnectorOutputStream

jmquigs commented 8 years ago

Hello

We have be tracking down an issue with the jitsi videobridge that causes increase CPU utilization as well as increased packet loss, especially in audio streaming. We are trying to fix a problem related to audio drop-outs/PLC in our production environment, but we haven't yet determined if this regression is implicated. We may stage a production release sometime in the next two weeks with USE_SEND_THREAD disabled to test it.

We have isolated the issue to this commit from Feb 3: https://github.com/jitsi/libjitsi/commit/73f20dc7f35b19914d8c7181d50826a6e404be08

The issue appears to stem from the use of multiple threads to simulate nonblocking network IO. With use USE_SEND_THREAD enabled (the default), we see higher CPU/packet loss than with the setting disabled.

Test configuration 1:

Jitsi Hammer with 100 fake users in a single room, using big buck bunny video and default silence stream for audio, test runs for 300 seconds.
4core, 8GB VM-based videobridge server (redhat 7)
The issue reproduces with videobridge HEAD builds as well as builds from Feb 3 onward.

USE_SEND_THREAD enabled

~70% average CPU usage, spikes to 80%+
Both hammer and the videobridge report some dropped packets in the log (message: "WARNING: Dropped 1 packets hashCode=1490171397)"

USE_SEND_THREAD disabled

~50% average CPU usage, spikes to 70%+
No dropped packets reported
Test configuration 2:
Jitsi Hammer: 10 instances with 10 users each in 10 rooms (100 users total), using badger video/audio
1 Chrome instance in each room to measure packet loss via chrome://webrtc-internals
4core, 8GB VM-based videobridge server (centos 7)

USE_SEND_THREAD enabled

Chrome reports: Audio median 12.7% packet loss, Video median 3.9% packet loss
Cpu saturated (95%+)

USE_SEND_THREAD disabled

Chrome reports: Audio median 6.7% packet loss, Video median 2.6% packet loss
Reduced CPU usage
sip configuration

net.java.sip.communicator.SC_HOME_DIR_LOCATION=/home/ec2-user
net.java.sip.communicator.SC_HOME_DIR_NAME=.sip-communicator
net.java.sip.communicator.packetlogging.PACKET_LOGGING_ENABLED=false
org.jitsi.videobridge.ENABLE_STATISTICS=true
org.jitsi.videobridge.STATISTICS_TRANSPORT=colibri

# this is set to true or false as required for test
org.jitsi.impl.neomedia.RTPConnectorOutputStream.USE_SEND_THREAD = true

bgrozev commented 8 years ago

Hello,

The issue appears to stem from the use of multiple threads to simulate nonblocking network IO. With use USE_SEND_THREAD enabled (the default), we see higher CPU/packet loss than with the setting disabled.

The higher CPU utilization is somewhat expected, due to the introduction of new threads. The reason for this change in the first place is that if one receiver is connected via TCP, we would previously block the RTPTranslator thread, thus affecting the whole conference.

We also measured slightly higher CPU usage with USE_SEND_THREAD. We feared that this might increase the jitter, but if anything it was decreased.

We are trying to fix a problem related to audio drop-outs/PLC in our production environment, but we haven't yet determined if this regression is implicated.

I would be surprised if this has an effect on audio, since the bitrates as so much lower than video.

USE_SEND_THREAD enabled

~70% average CPU usage, spikes to 80%+
Both hammer and the videobridge report some dropped packets in the log (message: "WARNING: Dropped 1 packets hashCode=1490171397)"

This is expected to happen at some point as the load increases. It may happen earlier with USE_SEND_THREAD.

USE_SEND_THREAD enabled

Chrome reports: Audio median 12.7% packet loss, Video median 3.9% packet loss
Cpu saturated (95%+)

Is that 12.7% packet loss due to the network, or packets discarded on the bridge?

The difference between audio and video is strange, could it be due to video packets being restored with FEC?

Is chrome's cpu saturated or the bridge's?

Since the only use for USE_SEND_THREAD is for receivers connected via TCP, it may be worth enabling it dynamically only in this case. A contribution would be welcome. If you don't care about TCP, you can safely use USE_SEND_THREAD=false.

Regards, Boris

jmquigs commented 8 years ago

I would be surprised if this has an effect on audio, since the bitrates as so much lower than video.

Yes, one thing we are also investigating is whether our audio bitrates are higher than they should be in our production application.

Is that 12.7% packet loss due to the network, or packets discarded on the bridge?

I am not sure; it is the loss reported in the chrome internals tool. I am not sure exactly how to measure packets discarded on the bridge, is there documentation on this?

The difference between audio and video is strange, could it be due to video packets being restored with FEC?

Again, not sure on this, we don't have a good way of measuring this currently.

Is chrome's cpu saturated or the bridge's?

The bridge CPU is saturated.

In fact, in followup testing we have observed that we only get significant loss when the CPU approaches saturation. So the packet loss may be entirely explained by that.

Further, while we do get increased cpu with the send thread enabled, we also have significantly more outgoing bandwidth used, which is probably desirable (doing more work, but getting more done). Subjectively, when visiting the room with a chrome instance, the hammer bot videos seem to flow more smoothly in the send-thread=true case. (Note, I tested this with a room with only 25 bots in it, since 100 bots in a room is a bit insane at least for video).

Given these findings I am now questioning whether this really is a regression and whether its implicated in our audio issues. If anything, using the send thread seems like it would help as long as our production CPU is not saturated (and according to our measurements, it is nowhere near that).

Since the only use for USE_SEND_THREAD is for receivers connected via TCP, it may be worth enabling it dynamically only in this case. A contribution would be welcome. If you don't care about TCP, you can safely use USE_SEND_THREAD=false.

I am a bit confused by this, since as I mentioned, I see an improvement in throughput with send thread enabled and I don't believe I have any TCP receivers (I don't think hammer is using them and my chrome instance is not connected for most of the test). In other words, it seems like the send thread is valuable beyond just TCP receivers.

jmquigs commented 8 years ago

I should mention that our production jitsi videobridge build is rather old (about a year). We have merged in the upstream changes to April to try to get up to date, and that is when we started having these issues and had to revert to our old build in production.

bgrozev commented 8 years ago

I am not sure; it is the loss reported in the chrome internals tool. I am not sure exactly how to measure packets discarded on the bridge, is there documentation on this?

You can look for the "Dropped XXX packets" logs that you already observed. I believe we now have such logs in every place where the bridge can potentially discard packets.

Again, not sure on this, we don't have a good way of measuring this currently.

You could look at a pcap capture from a client in wireshark.

The bridge CPU is saturated.

OK. This seems to explain it.

I am a bit confused by this, since as I mentioned, I see an improvement in throughput with send thread enabled and I don't believe I have any TCP receivers (I don't think hammer is using them and my chrome instance is not connected for most of the test). In other words, it seems like the send thread is valuable beyond just TCP receivers.

This is interesting, I would expect the extra copy and overhead multiple of threads to have a negative impact. It could be that in very big conferences the bottleneck is in the single thread in RTPTranslator, which has less work to do with USE_SEND_THREAD, so it drops less.

Further, while we do get increased cpu with the send thread enabled, we also have significantly more outgoing bandwidth used, which is probably desirable (doing more work, but getting more done).

I don't understand this. Since the hammers don't adapt their rate, more bandwidth should result in less loss.

jmquigs commented 8 years ago

I don't understand this. Since the hammers don't adapt their rate, more bandwidth should result in less loss.

I meant that in the case where the CPU is not saturated (i.e. low loss) using the send thread results in more data transmitted out from the video bridge, which corresponds to the increased video smoothness of the 25 hammer bot videos when viewed by chrome. In fact, without the send thread, the video is very stuttery.

Anyway, I suppose this issue can be closed, since the higher cpu usage may in fact just be a result of blocking less, and the packet loss is just an result of saturation.

jitsi / libjitsi

Performance regression in RTPConnectorOutputStream #178

Test configuration 1:

Test configuration 2:

sip configuration