strukturag / nextcloud-spreed-signaling

Standalone signaling server for Nextcloud Talk.
GNU Affero General Public License v3.0

[Help Request] Performance Issues #199

Closed. pyte1 closed this issue 2 years ago

pyte1 commented 2 years ago

Hi,

I have already set up my second High Performance Backend. However, there seem to be some performance issues with the backend.

In a test video call with 12 people, some video streams hang for a few seconds, audio and video are delayed, and sometimes the quality of the streams gets really bad (pixelated). All of the participants are using Windows 10 clients with different browsers on modern hardware. The participants are all located in the same building.

Overview:

High Performance Server (ESX VM): Ubuntu 20.04.3 LTS, 8 CPU cores, 16 GB RAM, Gigabit link
Nextcloud instance (ESX VM): Debian 11, 8 CPU cores, 8 GB RAM, Gigabit link

nextcloud-spreed-signaling version: v0.4.1
janus version: janus 0.7.3
coturn version: 4.5.1.1-1.1ubuntu0.20.04.2

These are the configuration files of janus, coturn and signaling:

signaling config:

listen = 127.0.0.1:8080

[https]
certificate = /etc/nginx/ssl/server.crt
key = /etc/nginx/ssl/server.key

[app]
debug = false

[sessions]
hashkey = XXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXX
blockkey = XXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXX

[clients]
internalsecret = the-shared-secret-for-internal-clients

[backend]
backends = ncvsngnc
allowall = false
secret = XXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXX
timeout = 10
connectionsperhost = 8

[ncvsngnc]
url = XXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXX
secret = XXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXX

[nats]
url = nats://localhost:4222

[mcu]
type = janus
url = ws://127.0.0.1:8188

[turn]
apikey = XXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXX
secret = XXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXX
servers = turn:XXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXX:3478?transport=udp,turn:XXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXX:3478?transport=tcp

[geoip]

[geoip-overrides]

[continent-overrides]

[stats]
allowed_ips = XXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXX

coturn config:

tls-listening-port=5349
fingerprint
use-auth-secret
cli-password=XXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXX
static-auth-secret=XXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXX
realm=XXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXX
cert=/etc/letsencrypt/rsa-certs/fullchain.pem
pkey=/etc/letsencrypt/rsa-certs/privkey.pem
cipher-list="ECDHE-RSA-AES256-GCM-SHA512:DHE-RSA-AES256-GCM-SHA512:ECDHE-RSA-AES256-GCM-SHA384:ECDHE-RSA-AES256-SHA384"
no-multicast-peers
dh-file=/etc/turnserver/dhp.pem
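Editorial note: with `use-auth-secret`, coturn expects time-limited credentials derived from `static-auth-secret`, so that value has to match `secret` in the `[turn]` section of the signaling config. The following is a minimal sketch of how such credentials are commonly derived (username `expiry:user`, password as Base64 of HMAC-SHA1 over the username). The function name, the `probe` user label and the one-hour lifetime are assumptions made for illustration; they are not taken from the configuration above.

```python
import base64
import hashlib
import hmac
import time
from typing import Optional, Tuple


def turn_rest_credentials(
    shared_secret: str,
    user: str = "probe",          # hypothetical label, any string works
    ttl_seconds: int = 3600,      # assumed lifetime of one hour
    now: Optional[float] = None,  # injectable clock, mainly for testing
) -> Tuple[str, str]:
    """Return (username, password) for a TURN server using a shared auth secret."""
    issued_at = time.time() if now is None else now
    # The username carries the expiry time as a Unix timestamp.
    username = f"{int(issued_at) + ttl_seconds}:{user}"
    # The password is the Base64-encoded HMAC-SHA1 of the username,
    # keyed with the shared secret.
    digest = hmac.new(
        shared_secret.encode("utf-8"),
        username.encode("utf-8"),
        hashlib.sha1,
    ).digest()
    return username, base64.b64encode(digest).decode("ascii")
```

Credentials produced this way can be entered into a TURN test client to check that the relay accepts them. If they are rejected, a mismatch between the two secrets is one possible cause.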

relevant part of janus.jcfg:

        stun_server = "XXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXX"
        stun_port = 5349 
        nice_debug = true 
        full_trickle = true

I checked the logs for errors. The only suspicious thing I could find, and there are a ton of these messages, is: (XXXXXXXXXXXXXXXX) is reporting 12 lost packets on the uplink (Janus -> client)

Is there something wrong with my configuration? Does anyone have the same issues? I need some hints, as I am struggling to get this to work properly. If this is not the right place to ask, please direct me to a place where I can get support.
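Editorial note: to compare calls before and after a configuration change, it can help to total these messages rather than eyeball them. The sketch below sums the reported lost packets per direction. The regular expression is based only on the single message quoted above, so the exact log format is an assumption, and `summarize_packet_loss` is a hypothetical helper name.

```python
import re
from collections import Counter
from typing import Iterable

# Pattern derived from the one message quoted in this thread;
# other log formats will need a different expression.
LOSS_RE = re.compile(r"is reporting (\d+) lost packets on the (\w+)")


def summarize_packet_loss(lines: Iterable[str]) -> Counter:
    """Sum the reported lost packets per direction (for example "uplink")."""
    lost = Counter()
    for line in lines:
        match = LOSS_RE.search(line)
        if match:
            lost[match.group(2)] += int(match.group(1))
    return lost


# Example with in-memory lines; in practice the lines would come from a log file.
sample = [
    "(XXXX) is reporting 12 lost packets on the uplink (Janus -> client)",
    "some unrelated log line",
]
print(dict(summarize_packet_loss(sample)))
```

An empty result for a whole call means no such messages were logged during it.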

Thank you in advance.

fancycode commented 2 years ago

This sounds very similar to https://github.com/nextcloud/spreed/issues/6804

Janus 0.7.3 is rather old (July 2019), so the problems might be resolved by updating it (see https://github.com/nextcloud/spreed/issues/6804#issuecomment-1017332470).

pyte1 commented 2 years ago

> Janus 0.7.3 is rather old (July 2019), so the problems might be resolved by updating it (see nextcloud/spreed#6804 (comment)).

I will update to a newer version and report back on whether the issues are gone or there is any difference in performance. Thank you!

pyte1 commented 2 years ago

I've upgraded janus to version 0.11.6. The first test call with 2 users seems fine. We have planned a test call with 10+ users for next week. I will report back if the issue persists.

pyte1 commented 2 years ago

After the upgrade to version 0.11.6 of janus, we made a video call with 7 participants today with no issues at all. The log entries regarding packet loss are gone; not a single one occurred during the 45-minute call. The upgrade seems to have fixed the issue for good. The next call, with 15 participants, will be next Wednesday. Should the situation change with more participants, I will report back.

Closing for now. Thank you again, @fancycode! 👍