l7mp / stunner

A Kubernetes media gateway for WebRTC. Contact: info@l7mp.io
https://l7mp.io
MIT License

How debug problem? #99

Closed (krajcikondra closed this 9 months ago)

krajcikondra commented 1 year ago

Hello,

I have a running udp-gateway (LoadBalancer) service and a running stunner pod.

$ kubectl get service -n stunner
NAME          TYPE           CLUSTER-IP      EXTERNAL-IP      PORT(S)                         AGE
stunner       ClusterIP      10.245.242.16   <none>           3478/UDP                        2d1h
udp-gateway   LoadBalancer   10.245.125.7    138.68.119.xx   3478:32224/UDP,8086:32360/TCP   2d1h
$ kubectl get pod -n stunner
NAME                       READY   STATUS    RESTARTS   AGE
stunner-7ff4875b47-xb4z9   2/2     Running   0          102m

I created a DNS record:

A  stunner.mywebpage.com  138.68.119.xx

Now I try a connection on the page https://icetest.info/.

The result of my test is:

[screenshot: ICE test result]

It looks like my STUN/TURN server is not working.

How can I find out the reason for my problem?

krajcikondra commented 1 year ago

I tried the Testing section of the docs.

When I run the command

$ ./turncat - k8s://stunner/stunnerd-config:udp-listener udp://${PEER_IP}:9001
15:47:26.075136 turncat.go:561: turncat WARNING: relay setup failed for client /dev/stdin: could not allocate new TURN relay transport for client file:/dev/stdin: all retransmissions failed for N6d+9RWMomGaxYvZ

After 3 minutes the script is still running, but there is no other output.

rg0now commented 1 year ago

Whenever we face such a problem we usually try the simple tunnel example first: this is a simple setup that exercises the entire STUNner control plane and data plane. Could you please restart with a clear cluster, configure the simple-tunnel example, and report back your findings? If this works then your STUNner installation is at least OK, and then we can go on and try to find your specific problem.

krajcikondra commented 1 year ago

Hi,

I reinstalled STUNner. The full reinstall script is here: https://pastebin.com/eGQAwUzY

I tried the simple-tunnel example. The problematic output is marked in bold.

$ kubectl apply -f docs/examples/simple-tunnel/iperf-server.yaml
deployment.apps/iperf-server created
service/iperf-server created
$ kubectl get service iperf-server  -o wide
NAME           TYPE        CLUSTER-IP       EXTERNAL-IP   PORT(S)             AGE   SELECTOR
iperf-server   ClusterIP   10.245.254.107   <none>        5001/UDP,5001/TCP   24s   app=iperf-server

kubectl apply -f docs/examples/simple-tunnel/iperf-stunner.yaml

$ kubectl get gatewayconfigs,gateways,udproutes -n stunner 
NAME                                                  REALM             AUTH        AGE
gatewayconfig.stunner.l7mp.io/stunner-gatewayconfig   stunner.l7mp.io   plaintext   6m16s

NAME                                            CLASS                  ADDRESS   PROGRAMMED   AGE
gateway.gateway.networking.k8s.io/tcp-gateway   stunner-gatewayclass             True         18s
gateway.gateway.networking.k8s.io/udp-gateway   stunner-gatewayclass             True         6m15s

NAME                                              AGE
udproute.gateway.networking.k8s.io/iperf-server   17s
udproute.gateway.networking.k8s.io/media-plane    6m14s
$ cmd/stunnerctl/stunnerctl running-config stunner/stunnerd-config
STUN/TURN authentication type:  static
STUN/TURN username:     user-1
STUN/TURN password:     pass-1
Listener 1
    Name:   stunner/udp-gateway/udp-listener
    Listener:   stunner/udp-gateway/udp-listener
    Protocol:   UDP
    Public address: 159.89.251.24
    Public port:    3478
Listener 2
    Name:   stunner/tcp-gateway/tcp-listener
    Listener:   stunner/tcp-gateway/tcp-listener
    Protocol:   TCP
    Public address: 159.223.246.6
    Public port:    3478
$ kubectl get -n stunner services
NAME          TYPE           CLUSTER-IP       EXTERNAL-IP     PORT(S)          AGE
stunner       ClusterIP      10.245.230.162   <none>          3478/UDP         17m
tcp-gateway   LoadBalancer   10.245.11.14     159.223.246.6   3478:31369/TCP   11m
udp-gateway   LoadBalancer   10.245.153.98    159.89.251.24   3478:32009/UDP   17m

Run the benchmark

export IPERF_ADDR=$(kubectl get svc iperf-server -o jsonpath="{.spec.clusterIP}")
./turncat --log=all:INFO udp://127.0.0.1:5000 k8s://stunner/stunnerd-config:udp-listener \
     udp://$IPERF_ADDR:5001

08:24:10.792921 turncat.go:176: turncat INFO: Turncat client listening on udp://127.0.0.1:5000, TURN server: udp://159.89.251.24:3478, peer: udp://10.245.254.107:5001
08:24:27.461703 turncat.go:453: turncat **WARNING**: relay setup failed for client udp:127.0.0.1:41777, dropping client connection
08:24:35.269909 turncat.go:453: turncat **WARNING**: relay setup failed for client udp:127.0.0.1:41777, dropping client connection

There are some warnings in the turncat output.

$ iperf -c localhost -p 5000 -u -i 1 -l 100 -b 800000 -t 10
------------------------------------------------------------
Client connecting to localhost, UDP port 5000
UDP buffer size:  208 KByte (default)
------------------------------------------------------------
[  3] local 127.0.0.1 port 41777 connected with 127.0.0.1 port 5000
[ ID] Interval       Transfer     Bandwidth
[  3] 0.0000-1.0000 sec  97.7 KBytes   801 Kbits/sec
[  3] 1.0000-2.0000 sec  97.7 KBytes   800 Kbits/sec
[  3] 2.0000-3.0000 sec  97.7 KBytes   800 Kbits/sec
[  3] 3.0000-4.0000 sec  97.7 KBytes   800 Kbits/sec
[  3] 4.0000-5.0000 sec  97.7 KBytes   800 Kbits/sec
[  3] 5.0000-6.0000 sec  97.7 KBytes   800 Kbits/sec
[  3] 6.0000-7.0000 sec  97.7 KBytes   800 Kbits/sec
[  3] 7.0000-8.0000 sec  97.7 KBytes   800 Kbits/sec
[  3] 8.0000-9.0000 sec  97.7 KBytes   800 Kbits/sec
[  3] 9.0000-10.0000 sec  97.7 KBytes   800 Kbits/sec
[  3] 0.0000-10.0021 sec   977 KBytes   800 Kbits/sec
[  3] Sent 10005 datagrams
[  3] WARNING: did not receive ack of last datagram after 10 tries.

There is a warning here as well.

$ kubectl logs $(kubectl get pods -l app=iperf-server -o jsonpath='{.items[0].metadata.name}')
------------------------------------------------------------
Server listening on UDP port 5001 with pid 1
Read buffer size: 1.44 KByte (Dist bin width= 183 Byte)
UDP buffer size:  208 KByte (default)

(The table with Interval, Transfer, and Bandwidth is missing here.)

krajcikondra commented 1 year ago

I have now checked the administration panel in DigitalOcean and I see that one of my load balancers is down and there is some new load balancer. I am not sure whether this is correct.

[screenshot: DigitalOcean load balancer status]

rg0now commented 1 year ago

Thanks! Can you please also post the logs from stunnerd? We need to see whether the iperf packets have made it to stunnerd: if yes, then the problem is on the cluster-side, if not, then this is a DO load-balancer issue.

krajcikondra commented 1 year ago

I restarted stunnerd to clear the old logs and reran the benchmark. I found the following in the stunnerd logs.

config-watcher {"time": "2023-08-25T06:10:33.599569+00:00", "msg": "Starting collector", "level": "INFO"}
config-watcher {"time": "2023-08-25T06:10:33.600055+00:00", "msg": "No folder annotation was provided, defaulting to k8s-sidecar-target-directory", "level": "WARNING"}
config-watcher {"time": "2023-08-25T06:10:33.600737+00:00", "msg": "Loading incluster config ...", "level": "INFO"}
config-watcher {"time": "2023-08-25T06:10:33.602720+00:00", "msg": "Config for cluster api at 'https://10.245.0.1:443' loaded...", "level": "INFO"}
config-watcher {"time": "2023-08-25T06:10:33.603056+00:00", "msg": "Unique filenames will not be enforced.", "level": "INFO"}
config-watcher {"time": "2023-08-25T06:10:33.603360+00:00", "msg": "5xx response content will not be enabled.", "level": "INFO"}
config-watcher {"time": "2023-08-25T06:10:38.697053+00:00", "msg": "Writing /etc/stunnerd/stunnerd.conf (ascii)", "level": "INFO"}
stunnerd 06:10:23.888600 main.go:82: stunnerd INFO: watching configuration file at "/etc/stunnerd/stunnerd.conf"
stunnerd 06:10:23.889170 reconcile.go:113: stunner INFO: setting loglevel to "all:INFO"
stunnerd 06:10:23.889995 reconcile.go:141: stunner WARNING: running with no listeners
stunnerd 06:10:23.890028 reconcile.go:157: stunner WARNING: running with no clusters: all traffic will be dropped
stunnerd 06:10:23.890039 reconcile.go:177: stunner INFO: reconciliation ready: new objects: 2, changed objects: 0, deleted objects: 0, started objects: 0, restarted objects: 0
stunnerd 06:10:23.890086 reconcile.go:181: stunner INFO: status: READY, realm: stunner.l7mp.io, authentication: plaintext, listeners: NONE, active allocations: 0
stunnerd 06:10:30.890478 config.go:283: watch-config WARNING: waiting for config file "/etc/stunnerd/stunnerd.conf"
stunnerd 06:10:38.893431 reconcile.go:113: stunner INFO: setting loglevel to "all:INFO"
stunnerd 06:10:38.894083 server.go:19: stunner INFO: listener stunner/udp-gateway/udp-listener: [udp://10.244.4.74:3478<32768:65535>] (re)starting
stunnerd 06:10:38.894121 server.go:42: stunner INFO: setting up UDP listener socket pool at 10.244.4.74:3478 with 16 readloop threads
stunnerd 06:10:38.895090 server.go:161: stunner INFO: listener stunner/udp-gateway/udp-listener: TURN server running
stunnerd 06:10:38.895124 server.go:19: stunner INFO: listener stunner/tcp-gateway/tcp-listener: [tcp://10.244.4.74:3478<32768:65535>] (re)starting
stunnerd 06:10:38.895277 server.go:161: stunner INFO: listener stunner/tcp-gateway/tcp-listener: TURN server running
stunnerd 06:10:38.895339 reconcile.go:177: stunner INFO: reconciliation ready: new objects: 4, changed objects: 2, deleted objects: 0, started objects: 2, restarted objects: 0
stunnerd 06:10:38.895541 reconcile.go:181: stunner INFO: status: READY, realm: stunner.l7mp.io, authentication: plaintext, listeners: stunner/tcp-gateway/tcp-listener: [tcp://10.244.4.74:3478<32

This warning looks interesting: `stunner WARNING: running with no clusters: all traffic will be dropped`
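As an aside, since the config-watcher sidecar output is interleaved with stunnerd's, a tiny filter helps pull out just the stunnerd warnings (a hypothetical stdlib-only sketch, not part of STUNner's tooling; the sample lines are taken from the log excerpt above):

```python
def stunnerd_warnings(log: str) -> list[str]:
    """Return the stunnerd WARNING lines from interleaved sidecar output."""
    return [
        line.strip(" │")  # drop any terminal border characters
        for line in log.splitlines()
        if line.lstrip(" │").startswith("stunnerd") and "WARNING" in line
    ]

# Sample taken from the log excerpt above:
sample = """\
│ config-watcher {"msg": "Starting collector", "level": "INFO"} │
│ stunnerd 06:10:23.890028 reconcile.go:157: stunner WARNING: running with no clusters: all traffic will be dropped │
│ stunnerd 06:10:23.890086 reconcile.go:181: stunner INFO: status: READY │
"""

for warning in stunnerd_warnings(sample):
    print(warning)
```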

rg0now commented 1 year ago

Thanks. The warning is fine: stunnerd starts with an empty config and so there are no listeners and clusters at startup. Once the operator renders a valid config (after the line stunnerd 06:10:38.893431 reconcile.go:113: stunner INFO: setting loglevel to "all:INFO") the warning goes away. The problem seems to be that no packet ever reaches STUNner, so this must be a DO/Kubernetes LoadBalancer issue.
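As a quick external check (a hypothetical stdlib-only sketch, not part of STUNner's tooling), you can see whether a plain STUN Binding request gets any answer back from the gateway's public address; if this times out from outside the cluster, the packets are being dropped before they ever reach stunnerd:

```python
import os
import socket
import struct

def stun_binding_request() -> bytes:
    """Build a STUN Binding request (RFC 5389): message type 0x0001,
    zero-length body, magic cookie 0x2112A442, random transaction ID."""
    return struct.pack("!HHI12s", 0x0001, 0, 0x2112A442, os.urandom(12))

def check_stun(host: str, port: int = 3478, timeout: float = 3.0) -> bool:
    """Return True if the server sends a STUN Binding success response."""
    with socket.socket(socket.AF_INET, socket.SOCK_DGRAM) as s:
        s.settimeout(timeout)
        s.sendto(stun_binding_request(), (host, port))
        try:
            data, _ = s.recvfrom(1024)
        except socket.timeout:
            return False
    # Binding success responses start with message type 0x0101
    return len(data) >= 20 and data[0:2] == b"\x01\x01"

# e.g. check_stun("<your-gateway-public-ip>")
```

If this returns False from outside the cluster but a client inside the cluster gets an answer, the load balancer is the likely culprit.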

Can you please re-check your LoadBalancer status? STUNner should have created one LB Service for the udp-gateway Gateway and another one for the tcp-gateway Gateway; what status does the DO dashboard report for these LBs? My guess is that the TCP LB should work fine, so can you please test over TCP first (see the second part of this doc)? The UDP LB is usually problematic: DO requires a working TCP health-check even for UDP LBs. Can it be the case that when you installed the simple-tunnel example you accidentally removed the health-check annotations from the UDP Gateway? Can you please re-add them? We need to make sure all the LB statuses are reported green by DO: from that point on, things should work fine.
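For reference, a UDP Gateway with DO health-check annotations would look roughly like this (a sketch only: the DO annotation names and the 8086 health-check port are assumptions based on the DO cloud-controller docs and the `8086:32360/TCP` mapping in the earlier service output; please double-check them against your setup):

```yaml
apiVersion: gateway.networking.k8s.io/v1beta1
kind: Gateway
metadata:
  name: udp-gateway
  namespace: stunner
  annotations:
    # DO needs a TCP/HTTP health-check even for UDP LBs; stunnerd
    # exposes one, shown earlier as the 8086:32360/TCP port mapping
    service.beta.kubernetes.io/do-loadbalancer-healthcheck-protocol: "http"
    service.beta.kubernetes.io/do-loadbalancer-healthcheck-port: "8086"
    # expose the UDP listener and the TCP health-check on the same LB
    stunner.l7mp.io/enable-mixed-protocol-lb: "true"
spec:
  gatewayClassName: stunner-gatewayclass
  listeners:
    - name: udp-listener
      port: 3478
      protocol: UDP
```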

rg0now commented 9 months ago

Closing this for now. Feel free to reopen if new input becomes available.