m1k1o / neko

A self hosted virtual browser that runs in docker and uses WebRTC.
https://neko.m1k1o.net/
Apache License 2.0

NVidia XFCE client reconnect loop due to peer data channel closed #386

Closed: XHawk87 closed this issue 2 months ago

XHawk87 commented 2 months ago

I've been running a self-built XFCE server on NVidia for many months now, and today, after updating to the latest master branch commit, I have a new error occurring.

It appears to connect fine: I get sound and video, the desktop loads, and I can interact with it. However, every second or two it reconnects with the message "peer data channel closed".

I get this error on the client console:

[NEKO] DBG disconnected: Error: peer data channel closed
    at pe.createPeer (app.d8e473a4.js:1:13271)
    at pe.onMessage (app.d8e473a4.js:1:14364)

After turning on debug mode for the server, I get these logs:

Apr 11 02:35:47: 1:35AM DBG session connected id=QxENmdqzmH23DQnRLVMyq-Omoz9vtROm module=websocket
Apr 11 02:35:49: 2024-04-11 01:35:49,632 DEBG 'neko' stdout output:
Apr 11 02:35:49: 1:35AM DBG read message error error="websocket: close 1006 (abnormal closure): unexpected EOF" module=websocket
Apr 11 02:35:49: 1:35AM DBG handle socket ending address=172.21.0.2:56680 module=websocket
Apr 11 02:35:49: 1:35AM DBG session ended address=172.21.0.2:56680 module=websocket session=QxENmdqzmH23DQnRLVMyq-Omoz9vtROm
Apr 11 02:35:49: 1:35AM DBG request complete (0) module=http req={"agent":"Mozilla/5.0 (X11; Linux x86_64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/122.0.0.0 Safari/537.36","id":"1b5bc6995897/decw0YE1SS-000165","method":"GET","proto":"HTTP/1.1","remote":"172.21.0.2:56680","scheme":"http","uri":"http://neko.local/ws?password=REDACTED"} res={"bytes":0,"elapsed":2138.268534,"status":0,"time":"Thu, 11 Apr 2024 01:35:49 UTC"}
Apr 11 02:35:49: 1:35AM WRN Failed to accept RTP stream is already closed module=webrtc submodule=pion subsystem=pc
Apr 11 02:35:49: 1:35AM WRN Failed to accept RTCP stream is already closed module=webrtc submodule=pion subsystem=pc
Apr 11 02:35:49: 1:35AM WRN Failed to discover mDNS candidate ee8005f0-70e7-4ac8-9e0b-a050b291d8de.local: mDNS: connection is closed module=webrtc submodule=pion subsystem=ice
Apr 11 02:35:49: 1:35AM INF Setting new connection state: Closed module=webrtc submodule=pion subsystem=ice
Apr 11 02:35:49: 1:35AM INF peer connection state changed: closed module=webrtc submodule=pion subsystem=pc
Apr 11 02:35:49: 2024-04-11 01:35:49,632 DEBG 'neko' stdout output:
Apr 11 02:35:49: 1:35AM INF ICE connection state changed: closed module=webrtc submodule=pion subsystem=pc
Apr 11 02:35:49: 1:35AM INF connection state has changed connection_state=closed module=webrtc
Apr 11 02:35:49: 1:35AM INF peer closed id=QxENmdqzmH23DQnRLVMyq-Omoz9vtROm module=webrtc
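
To make the order of events in the log above easier to read, here is a minimal sketch that parses a few of those lines and extracts the level, module, and WebSocket close code. The regular expressions and the `parse` helper are illustrative assumptions based only on the lines quoted in this issue, not a documented neko log format. Close code 1006 is the reserved "abnormal closure" code from RFC 6455, meaning the connection dropped without a close frame.

```python
import re

# Sample lines copied from the server log quoted in this issue.
SAMPLE = '''\
Apr 11 02:35:47: 1:35AM DBG session connected id=QxENmdqzmH23DQnRLVMyq-Omoz9vtROm module=websocket
Apr 11 02:35:49: 2024-04-11 01:35:49,632 DEBG 'neko' stdout output:
Apr 11 02:35:49: 1:35AM DBG read message error error="websocket: close 1006 (abnormal closure): unexpected EOF" module=websocket
Apr 11 02:35:49: 1:35AM WRN Failed to accept RTP stream is already closed module=webrtc submodule=pion subsystem=pc
Apr 11 02:35:49: 1:35AM INF peer closed id=QxENmdqzmH23DQnRLVMyq-Omoz9vtROm module=webrtc
'''.splitlines()

# Assumed layout: "<Mon D HH:MM:SS>: <h:mmAM> <LEVEL> <message and key=value pairs>"
LINE = re.compile(
    r"^(?P<stamp>\w{3} +\d+ \d{2}:\d{2}:\d{2}): "
    r"(?P<clock>\d{1,2}:\d{2}[AP]M) "
    r"(?P<level>DBG|INF|WRN|ERR) "
    r"(?P<rest>.*)$"
)
MODULE = re.compile(r"\bmodule=(\S+)")          # \b skips "submodule="
CLOSE_CODE = re.compile(r"websocket: close (\d+)")


def parse(line):
    """Return a dict for a neko log line, or None for wrapper lines (e.g. supervisor output)."""
    m = LINE.match(line)
    if m is None:
        return None
    rest = m["rest"]
    module = MODULE.search(rest)
    code = CLOSE_CODE.search(rest)
    return {
        "stamp": m["stamp"],
        "level": m["level"],
        "module": module.group(1) if module else None,
        "close_code": int(code.group(1)) if code else None,
        "text": rest,
    }


events = [e for e in map(parse, SAMPLE) if e is not None]
for e in events:
    print(e["stamp"], e["level"], e["module"], e["close_code"])
```

In these lines the WebSocket read error (close 1006) is logged before the pion warnings and the "peer closed" message. Everything happens within the same second, so this relies on log order rather than timestamps, but it is consistent with the signaling WebSocket dropping first and the WebRTC peer being torn down as a result.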

I've tried this on the latest releases of Firefox and Chromium, with the same result.

Any idea why this is happening?

XHawk87 commented 2 months ago

After a lot more experimenting, it appears that the issue is caused by the Traefik reverse-proxy. It works fine if I remove all of the Traefik labels and the neko docker network from startup and publish a port instead.

I use the following labels on the neko docker service:

traefik.http.routers.neko-local.entryPoints=https
traefik.http.routers.neko-local.rule=Host(`neko.local`) && PathPrefix(`/`)
traefik.http.routers.neko-local.service=neko

traefik.http.routers.neko-public.entryPoints=neko
traefik.http.routers.neko-public.rule=PathPrefix(`/`)
traefik.http.routers.neko-public.service=neko

traefik.http.services.neko.loadbalancer.server.port=8080

With this in the traefik.toml:

...
entrypoints.https.address=':443'
entrypoints.https.http.tls=true

entrypoints.neko.address=':8444'
entrypoints.neko.http.tls=true
...

I'm trying to figure out how to configure Traefik so that it doesn't disrupt the peer data channel, though I'm not entirely sure how it's doing that, or why it only started now.

XHawk87 commented 2 months ago

The interesting thing is that Traefik isn't preventing WSS access. That works fine, and I get messages back and forth. The problem is with the peer data connection over the EPR (neko's ephemeral UDP port range). It opens the connection, I see a ton of data going through on the EPR ports to the server in Wireshark, and then the connection just closes itself after roughly 2 seconds. The UDP ports don't even go through Traefik; they're published by docker and open in the router. So how is using Traefik making the difference?
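
One way to narrow this down, given that the server log reported "websocket: close 1006 (abnormal closure)", would be to time how long the signaling WebSocket stays open through Traefik versus through a directly published port. The following is a rough sketch using only the Python standard library. The function name `websocket_lifetime` and the example URLs are hypothetical. It performs the HTTP upgrade handshake, sends nothing further, and reports how long it took until the peer closed the TCP connection. It does not speak the neko protocol or parse WebSocket frames, and the default TLS context will reject a self-signed certificate.

```python
import base64
import os
import socket
import ssl
import time
from urllib.parse import urlsplit


def websocket_lifetime(url, max_wait=10.0):
    """Return (status_line, seconds_open, closed_by_peer) for a ws:// or wss:// URL."""
    parts = urlsplit(url)
    secure = parts.scheme == "wss"
    port = parts.port or (443 if secure else 80)
    path = (parts.path or "/") + (f"?{parts.query}" if parts.query else "")

    sock = socket.create_connection((parts.hostname, port), timeout=max_wait)
    try:
        if secure:
            context = ssl.create_default_context()
            sock = context.wrap_socket(sock, server_hostname=parts.hostname)

        # Minimal RFC 6455 opening handshake.
        key = base64.b64encode(os.urandom(16)).decode()
        request = (
            f"GET {path} HTTP/1.1\r\n"
            f"Host: {parts.netloc}\r\n"
            "Upgrade: websocket\r\n"
            "Connection: Upgrade\r\n"
            f"Sec-WebSocket-Key: {key}\r\n"
            "Sec-WebSocket-Version: 13\r\n\r\n"
        )
        sock.sendall(request.encode())

        # Read the response headers; any frame bytes that follow are ignored.
        buf = b""
        while b"\r\n\r\n" not in buf:
            chunk = sock.recv(4096)
            if not chunk:
                break
            buf += chunk
        status_line = buf.split(b"\r\n", 1)[0].decode("latin-1")

        # Discard incoming data until the peer closes or max_wait elapses.
        opened = time.monotonic()
        closed_by_peer = False
        while True:
            remaining = max_wait - (time.monotonic() - opened)
            if remaining <= 0:
                break
            sock.settimeout(remaining)
            try:
                if not sock.recv(4096):
                    closed_by_peer = True
                    break
            except socket.timeout:
                break  # still open after max_wait
            except OSError:
                closed_by_peer = True  # reset or abrupt TLS shutdown
                break
        return status_line, time.monotonic() - opened, closed_by_peer
    finally:
        sock.close()


# Hypothetical usage; substitute the real hosts, ports and password:
# print(websocket_lifetime("wss://neko.local/ws?password=..."))     # via Traefik
# print(websocket_lifetime("ws://192.0.2.10:8080/ws?password=..."))  # direct port
```

If the connection through Traefik is closed by the peer after about 2 seconds while the direct one stays open until `max_wait`, that would point at the proxied signaling connection rather than the UDP media path. A quick close in both cases could simply mean the server dropped an idle or unauthenticated client, so the result is only a hint.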

XHawk87 commented 2 months ago

I don't think I actually changed anything since my last test, but it appears to have spontaneously fixed itself, for now. :eyes:

Glad it is working now, but I don't like not understanding why it stopped working in the first place and then started working again. I'll close the issue for now.