jitsi / jitsi-videobridge

Jitsi Videobridge is a WebRTC-compatible video router or SFU that lets you build highly scalable video conferencing infrastructure (i.e., up to hundreds of conferences per server).
https://jitsi.org/jitsi-videobridge
Apache License 2.0
2.91k stars 992 forks

HTTPS server performance regression #963

Open spditner opened 5 years ago

spditner commented 5 years ago

Description

An HTTPS server performance regression appears to have been introduced in the stable releases of jitsi-videobridge between 1109-1 and 1116-1, and it is still present in the latest stable, 1124. Previously, most HTTPS requests were handled in < 1s, with some taking 15s. Now most take 15s and very few complete in < 1s.

I see that there was a switch to a more recent Jetty release around that time; switching back to the older Jetty restores the previous mostly-okay behaviour.

The 15 seconds is very consistent, which suggests a timeout somewhere.

Current behavior

Response times of 15,000ms

Expected Behavior

Response times in the 100-300ms range

Possible Solution

Reverting to jitsi-videobridge=1109-1 restores the mostly-okay behaviour, as does using the previous version of Jetty (9.2.10.v20150310). Upgrading to the latest Jetty did not improve the situation.

Steps to reproduce

root@ubuntu-bionic:~# while [ 1 ]; do time curl -I -s -k https://localhost; sleep 1; done
HTTP/1.1 200 OK
Date: Mon, 04 Nov 2019 16:35:56 GMT
Last-Modified: Thu, 09 May 2019 15:32:14 GMT
Content-Type: text/html
Accept-Ranges: bytes
Content-Length: 30841
Server: Jetty(9.4.15.v20190215)

real    0m15.047s
user    0m0.012s
sys     0m0.006s

...

Response time should be < 1s, not 15s+
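As an alternative to reading the `time` output by eye, curl can report the total request time itself, which makes the 15-second stalls easy to flag. The following is a minimal sketch only: the helper names `classify_latency` and `probe_once` and the 1-second threshold are assumptions made for illustration and are not part of jitsi-videobridge or of the original report.

```shell
#!/usr/bin/env bash

# Classify one response time (in seconds) against a threshold (in seconds).
# awk does the comparison because [ ] cannot compare fractional numbers.
classify_latency() {
  local seconds="$1" threshold="${2:-1}"
  awk -v t="$seconds" -v max="$threshold" \
    'BEGIN { if (t + 0 > max + 0) print "slow"; else print "ok" }'
}

# Probe a URL once and print "<time_total> <ok|slow>".
# -k skips certificate verification, as in the original curl command;
# -w '%{time_total}' makes curl print the total request time in seconds.
probe_once() {
  local url="$1" threshold="${2:-1}" t
  t=$(curl -I -s -k -o /dev/null -w '%{time_total}' "$url") || true
  echo "$t $(classify_latency "$t" "$threshold")"
}

# Usage (hypothetical loop, mirroring the one in the report):
#   while true; do probe_once https://localhost 1; sleep 1; done
```

Each output line then carries the measured time and a verdict, so a long run can be filtered for the word slow.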

Environment details

Ubuntu Bionic

damencho commented 5 years ago

Thanks for the heads up. That is an interesting observation, we will take a look when time permits.

We are planning to remove that from the default deployment of jitsi-meet and use nginx by default for serving the web. This is on the roadmap, but there is no ETA at the moment. Anyway, the problem is interesting: do you see the same behavior when querying health or stats? It may be that filesystem I/O is somehow slower in these new versions ... just guessing here, but if stats and health are also affected, that would mean a very critical bug.

spditner commented 5 years ago

@damencho Thanks for following up; I came up with a more critical test scenario this morning. The bridge goes down from the user perspective, while the health port responds that the bridge is healthy.

Steps:

Start two jobs in separate windows, one polling the HTTPS service, and the other watching the health port:

while [ 1 ]; do time curl -I -s -k https://localhost ; sleep 0.5; done
while [ 1 ]; do curl -s -w '%{http_code}\n' http://localhost:8080/about/health; sleep 1; done

Let those run for ~1-2 minutes, and the HTTPS requests should jam. Then attempt to join the bridge from a browser. The HTTPS service will not respond to the browser, while the health port continues to report 200 OK.
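The two loops can also be folded into one check that reports when the health port and the HTTPS service disagree. This is a rough sketch, not a definitive monitor: the function names `health_mismatch` and `check_once` and the 5-second `--max-time` limit are assumptions. With that limit, a request stuck for 15 seconds is cut off and curl reports its status as 000.

```shell
#!/usr/bin/env bash

# Compare the status code from the health endpoint ($1) with the status code
# from the HTTPS front page ($2). curl reports "000" when it got no response,
# for example after hitting --max-time.
health_mismatch() {
  local health_code="$1" https_code="$2"
  if [ "$health_code" = "200" ] && [ "$https_code" != "200" ]; then
    echo "MISMATCH"      # bridge claims healthy but users cannot load the page
  else
    echo "consistent"
  fi
}

# Query both endpoints once, using the URLs from the steps above.
# The 5-second limit is an assumed value chosen to be well below 15 seconds.
check_once() {
  local health_code https_code
  health_code=$(curl -s -o /dev/null -w '%{http_code}' --max-time 5 \
    http://localhost:8080/about/health) || true
  https_code=$(curl -I -s -k -o /dev/null -w '%{http_code}' --max-time 5 \
    https://localhost) || true
  echo "health=$health_code https=$https_code $(health_mismatch "$health_code" "$https_code")"
}

# Usage (hypothetical):
#   while true; do check_once; sleep 1; done
```

A run of MISMATCH lines would correspond to the state described here, where the health port answers 200 while the HTTPS service does not respond.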

It would appear that whatever processes HTTPS requests is now blocked, and other symptoms start to show up, such as CLOSE_WAIT sockets accumulating over time, which can be seen by running lsof -i -n -P | grep jvb.
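To watch the CLOSE_WAIT build-up as a number rather than a growing list, the lsof output can be counted. The sketch below assumes a hypothetical helper named `count_close_wait` that reads lsof-style text on stdin; the default match string `jvb` follows the grep in the comment above.

```shell
#!/usr/bin/env bash

# Count input lines that mention the given name (default "jvb") and are in
# the CLOSE_WAIT state. Reads lsof-style output from stdin.
# awk always prints a number, including 0 when nothing matches.
count_close_wait() {
  local name="${1:-jvb}"
  awk -v name="$name" \
    'index($0, name) && /CLOSE_WAIT/ { n++ } END { print n + 0 }'
}

# Usage (hypothetical), sampling every 10 seconds:
#   while true; do
#     echo "$(date +%T) $(lsof -i -n -P | count_close_wait jvb)"
#     sleep 10
#   done
```

A count that keeps rising while the HTTPS loop is running would be consistent with connections being closed by the client but never released by the server.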

damencho commented 5 years ago

But if you skip curl requests to https://localhost, do you see any anomalies with just querying the health? For the https://localhost you can just switch to using nginx.

spditner commented 5 years ago

I do not see an issue in that style of deployment. In my case however, I'm using the demultiplexing feature on port 443 in jitsi-videobridge as in the default jitsi-meet deployment when nginx/apache are not installed. Some of the connecting clients are in unfriendly networks that are only allowing port 443.

Is that feature going away with v2 of the videobridge such that I might need more IP addresses to split up the services, or something like sslh to demultiplex?

damencho commented 5 years ago

TCP support in jvb is not optimal, so the recommended approach is a separate deployment of a TURN server with a valid SSL certificate listening on port 443.

The currently supported configuration is not going away; upgrading from jvb to jvb2 should not require any modifications. The multiplexing implementation in jvb2 is the same and will not change, I think.

Our idea was to make nginx the default when installing, and even to configure a TURN server on the same machine and multiplex in nginx, forwarding to TURN or serving the web, but it turned out Chrome does not set ALPN on the TURN connection (https://bugs.chromium.org/p/chromium/issues/detail?id=1014904). So we plan to maybe do it, in the beginning, as a separate virtual host in nginx, so a second DNS name will be required and this will be handled by the Let's Encrypt scripts ... But these are plans and we are still experimenting with it.