Closed drawnwren closed 5 months ago
Just when you think you've seen all STUNner weirdness, you encounter this error:
14:24:09.140609 turncat.go:415: turncat DEBUG: cannot write to TURN relay connection for client /dev/stdin (likely harmless): invalid length of IP value
Anyway, there seem to be two semi-related issues here.
First, the peer address ($PEER_IP
) does not seem to be defined in the invocation of turncat
:
turncat - k8s://xoul/udp-gateway:udp-listener udp://${PEER_IP}:9001 -v
...
14:24:05.951221 turncat.go:186: turncat INFO: Turncat client listening on file://stdin, TURN server: turn://k8s-xoul-udpgatew-77492a577e-37ff1616c9d9d844.elb.us-east-2.amazonaws.com:3478?transport=udp, peer: udp::9001
...
Notice that the IP address is empty in the peer URI udp::9001
, which causes the above weird error. What I don't quite understand here is that turncat
tries to resolve the peer address before spawning the TURN client, but this somehow fails to catch the "empty IP address" case. I could reproduce this on my setup, so this is clearly a turncat
bug.
The other issue is that for the subsequent invocations turncat
cannot even create a TURN allocation:
14:24:38.834366 turncat.go:570: turncat WARNING: relay setup failed for client /dev/stdin: could not allocate new TURN relay transport for client file:/dev/stdin: all retransmissions failed for Yy8GqOLQqJk6g0Nb
Note that the PEER_IP
is still empty, but it is beyond me why we get a working allocation for the first time and why we fail the second time. This will need further investigation. Can you please specify a valid peer IP and retry? How often you get a TURN allocation? Successful allocations are indicated by the below turncat
log line:
14:24:06.027372 turncat.go:322: turncat INFO: new connection: client-address=/dev/stdin, relayed-address=10.0.139.119:46719
With the latest master 512ce555 we at least warn if the peer IP is empty:
bin/turncat -l all:TRACE - k8s://stunner/udp-gateway:udp-listener udp://:5001
...
14:30:13.879997 turncat.go:100: turncat TRACE: Resolving peer address: udp://:5001
14:30:13.880026 main.go:122: turncat-cli ERROR: Could not init turncat: empty IP address in peer URL "udp://:5001"
Strangely, setting PEER_IP
doesn't seem to do anything different.
(xoul-backend-py3.11) anubis :: Mono/services/xoul_backend » export PEER_IP=$(kubectl get svc media-plane -o jsonpath='{.spec.clusterIP}')
(xoul-backend-py3.11) anubis :: Mono/services/xoul_backend » turncat - k8s://xoul/udp-gateway:udp-listener udp://${PEER_IP}:9001 -v
11:13:46.490794 main.go:81: turncat-cli DEBUG: Reading STUNner config from URI "k8s://xoul/udp-gateway:udp-listener"
11:13:46.490861 main.go:163: turncat-cli DEBUG: Searching for CDS server
11:13:46.490868 k8s_client.go:154: cds-fwd DEBUG: Obtaining kubeconfig
11:13:46.494105 k8s_client.go:161: cds-fwd DEBUG: Creating a Kubernetes client
11:13:46.499706 k8s_client.go:196: cds-fwd DEBUG: Querying CDS server pods in namespace "<all>" using label-selector "stunner.l7mp.io/config-discovery-service=enabled"
11:13:47.257476 k8s_client.go:367: cds-fwd DEBUG: Found pod: xoul/stunner-gateway-operator-controller-manager-5d6c5cd8f8-c5vgr
11:13:47.257522 k8s_client.go:376: cds-fwd DEBUG: Creating a SPDY stream to API server using URL "https://CF330C06A766A5450BA821D9355D14DF.sk1.us-east-2.eks.amazonaws.com/api/v1/namespaces/xoul/pods/stunner-gateway-operator-controller-manager-5d6c5cd8f8-c5vgr/portforward"
11:13:47.257589 k8s_client.go:384: cds-fwd DEBUG: Creating a port-forwarder to pod
11:13:47.257602 k8s_client.go:400: cds-fwd DEBUG: Waiting for port-forwarder...
11:13:47.475529 k8s_client.go:419: cds-fwd DEBUG: Port-forwarder connected to pod xoul/stunner-gateway-operator-controller-manager-5d6c5cd8f8-c5vgr at 127.0.0.1:38535
11:13:47.475607 cds_api.go:215: cds-client DEBUG: GET: loading config for gateway xoul/udp-gateway from CDS server 127.0.0.1:38535
11:13:47.612799 main.go:88: turncat-cli DEBUG: Generating STUNner authentication client
11:13:47.612842 main.go:95: turncat-cli DEBUG: Generating STUNner URI
11:13:47.612867 main.go:102: turncat-cli DEBUG: Starting turncat with STUNner URI: turn://k8s-xoul-udpgatew-c8acc27fea-79a15cd11c4064f3.elb.us-east-2.amazonaws.com:3478?transport=udp
11:13:47.640750 turncat.go:186: turncat INFO: Turncat client listening on file://stdin, TURN server: turn://k8s-xoul-udpgatew-c8acc27fea-79a15cd11c4064f3.elb.us-east-2.amazonaws.com:3478?transport=udp, peer: udp:172.20.55.170:9001
11:13:47.640793 main.go:118: turncat-cli DEBUG: Entering main loop
11:13:47.640872 turncat.go:227: turncat DEBUG: new connection from client /dev/stdin
11:13:47.641525 client.go:110: turnc DEBUG: Resolved STUN server 10.0.155.225:3478 to 10.0.155.225:3478
11:13:47.641922 client.go:119: turnc DEBUG: Resolved TURN server 10.0.155.225:3478 to 10.0.155.225:3478
hello
11:13:55.449634 turncat.go:570: turncat WARNING: relay setup failed for client /dev/stdin: could not allocate new TURN relay transport for client file:/dev/stdin: all retransmissions failed for xm/656rhP17CPf/d
oh no
rip
dead
(xoul-backend-py3.11) anubis :: Mono/services/xoul_backend » turncat - k8s://xoul/udp-gateway:udp-listener udp://${PEER_IP}:9001 -v
11:14:41.997448 main.go:81: turncat-cli DEBUG: Reading STUNner config from URI "k8s://xoul/udp-gateway:udp-listener"
11:14:41.997523 main.go:163: turncat-cli DEBUG: Searching for CDS server
11:14:41.997533 k8s_client.go:154: cds-fwd DEBUG: Obtaining kubeconfig
11:14:41.998315 k8s_client.go:161: cds-fwd DEBUG: Creating a Kubernetes client
11:14:41.998546 k8s_client.go:196: cds-fwd DEBUG: Querying CDS server pods in namespace "<all>" using label-selector "stunner.l7mp.io/config-discovery-service=enabled"
11:14:42.641221 k8s_client.go:367: cds-fwd DEBUG: Found pod: xoul/stunner-gateway-operator-controller-manager-5d6c5cd8f8-c5vgr
11:14:42.641253 k8s_client.go:376: cds-fwd DEBUG: Creating a SPDY stream to API server using URL "https://CF330C06A766A5450BA821D9355D14DF.sk1.us-east-2.eks.amazonaws.com/api/v1/namespaces/xoul/pods/stunner-gateway-operator-controller-manager-5d6c5cd8f8-c5vgr/portforward"
11:14:42.641335 k8s_client.go:384: cds-fwd DEBUG: Creating a port-forwarder to pod
11:14:42.641349 k8s_client.go:400: cds-fwd DEBUG: Waiting for port-forwarder...
11:14:42.828647 k8s_client.go:419: cds-fwd DEBUG: Port-forwarder connected to pod xoul/stunner-gateway-operator-controller-manager-5d6c5cd8f8-c5vgr at 127.0.0.1:33303
11:14:42.828685 cds_api.go:215: cds-client DEBUG: GET: loading config for gateway xoul/udp-gateway from CDS server 127.0.0.1:33303
11:14:43.328980 main.go:88: turncat-cli DEBUG: Generating STUNner authentication client
11:14:43.328999 main.go:95: turncat-cli DEBUG: Generating STUNner URI
11:14:43.329007 main.go:102: turncat-cli DEBUG: Starting turncat with STUNner URI: turn://k8s-xoul-udpgatew-c8acc27fea-79a15cd11c4064f3.elb.us-east-2.amazonaws.com:3478?transport=udp
11:14:43.338316 turncat.go:186: turncat INFO: Turncat client listening on file://stdin, TURN server: turn://k8s-xoul-udpgatew-c8acc27fea-79a15cd11c4064f3.elb.us-east-2.amazonaws.com:3478?transport=udp, peer: udp:172.20.55.170:9001
11:14:43.338337 main.go:118: turncat-cli DEBUG: Entering main loop
11:14:43.338392 turncat.go:227: turncat DEBUG: new connection from client /dev/stdin
11:14:43.339075 client.go:110: turnc DEBUG: Resolved STUN server 10.0.25.18:3478 to 10.0.25.18:3478
11:14:43.339096 client.go:119: turnc DEBUG: Resolved TURN server 10.0.25.18:3478 to 10.0.25.18:3478
11:14:51.145965 turncat.go:570: turncat WARNING: relay setup failed for client /dev/stdin: could not allocate new TURN relay transport for client file:/dev/stdin: all retransmissions failed for ulxp0chTi0WIhmUh
I get the turncat listening log (main.go:186), but I don't think I've seen 322 yet.
Actually one thing definitely changed: we no longer see the weird invalid length
errors.
Anyway, the all retransmissions failed
error is TURN's way of reporting that turncat
cannot reach the TURN server (stunnerd
) via the UDP load balancer. This often happens with EKS and it is usually completely independent from STUNner.
To verify this, can you please rerun the experiment, but this time with a higher stunnerd
loglevel so that we see whether the TURN request hits pod?
TRACE
:
kubectl -n xoul patch gatewayconfigs.stunner.l7mp.io stunner-gatewayconfig --type='merge' -p '{"spec":{"logLevel":"all:TRACE"}}'
If you the log does not report any TURN requests then it is (again) an EKS issue. If this is the case then most probably you're missing a crucial annotation from the Gateway. Unfortunately, we cannot check this: we try to stay away from EKS these days.
So, that's something odd about this. This is with log level set to TRACE
.
(xoul-backend-py3.11) anubis :: Mono/services/xoul_backend » kubectl -n xoul patch gatewayconfigs.stunner.l7mp.io stunner-gatewayconfig --type='merge' -p '{"spec":{"logLevel":"all:TRACE"}}'
gatewayconfig.stunner.l7mp.io/stunner-gatewayconfig patched (no change)
(xoul-backend-py3.11) anubis :: Mono/services/xoul_backend 130 » export PEER_IP=$(kubectl get svc media-plane -o jsonpath='{.spec.clusterIP}')
(xoul-backend-py3.11) anubis :: Mono/services/xoul_backend » turncat - k8s://xoul/udp-gateway:udp-listener udp://${PEER_IP}:9001 -v
12:24:53.525998 main.go:81: turncat-cli DEBUG: Reading STUNner config from URI "k8s://xoul/udp-gateway:udp-listener"
12:24:53.526127 main.go:163: turncat-cli DEBUG: Searching for CDS server
12:24:53.526138 k8s_client.go:154: cds-fwd DEBUG: Obtaining kubeconfig
12:24:53.527331 k8s_client.go:161: cds-fwd DEBUG: Creating a Kubernetes client
12:24:53.527583 k8s_client.go:196: cds-fwd DEBUG: Querying CDS server pods in namespace "<all>" using label-selector "stunner.l7mp.io/config-discovery-service=enabled"
12:24:54.204296 k8s_client.go:367: cds-fwd DEBUG: Found pod: xoul/stunner-gateway-operator-controller-manager-5d6c5cd8f8-7666h
12:24:54.204320 k8s_client.go:376: cds-fwd DEBUG: Creating a SPDY stream to API server using URL "https://CF330C06A766A5450BA821D9355D14DF.sk1.us-east-2.eks.amazonaws.com/api/v1/namespaces/xoul/pods/stunner-gateway-operator-controller-manager-5d6c5cd8f8-7666h/portforward"
12:24:54.204377 k8s_client.go:384: cds-fwd DEBUG: Creating a port-forwarder to pod
12:24:54.204392 k8s_client.go:400: cds-fwd DEBUG: Waiting for port-forwarder...
12:24:54.366990 k8s_client.go:419: cds-fwd DEBUG: Port-forwarder connected to pod xoul/stunner-gateway-operator-controller-manager-5d6c5cd8f8-7666h at 127.0.0.1:46113
12:24:54.367040 cds_api.go:215: cds-client DEBUG: GET: loading config for gateway xoul/udp-gateway from CDS server 127.0.0.1:46113
12:24:54.476585 main.go:88: turncat-cli DEBUG: Generating STUNner authentication client
12:24:54.476607 main.go:95: turncat-cli DEBUG: Generating STUNner URI
12:24:54.476614 main.go:102: turncat-cli DEBUG: Starting turncat with STUNner URI: turn://k8s-xoul-udpgatew-c8acc27fea-79a15cd11c4064f3.elb.us-east-2.amazonaws.com:3478?transport=udp
12:24:54.507352 turncat.go:227: turncat DEBUG: new connection from client /dev/stdin
12:24:54.507366 turncat.go:186: turncat INFO: Turncat client listening on file://stdin, TURN server: turn://k8s-xoul-udpgatew-c8acc27fea-79a15cd11c4064f3.elb.us-east-2.amazonaws.com:3478?transport=udp, peer: udp:172.20.55.170:9001
12:24:54.507425 main.go:118: turncat-cli DEBUG: Entering main loop
12:24:54.508126 client.go:110: turnc DEBUG: Resolved STUN server 10.0.155.225:3478 to 10.0.155.225:3478
12:24:54.508138 client.go:119: turnc DEBUG: Resolved TURN server 10.0.155.225:3478 to 10.0.155.225:3478
12:25:02.314866 turncat.go:570: turncat WARNING: relay setup failed for client /dev/stdin: could not allocate new TURN relay transport for client file:/dev/stdin: all retransmissions failed for 38jLwNy6T8qDPXOi
No, this is the output of turncat
(playing the role of a TURN client here), I'm curious of the logs from the other side, stunnerd
(playing the role of the TURN server). The kubectl patch
will update the log level on the server side, and then you can use kubectl -n xoul logs udp-gateway-XXXXX
to see the logs from the stunnerd
pod deployed by the operator automatically for your Gateway.
anubis :: basis/flux-charts/monitoring » k logs pod/udp-gateway-d6fd84b9-9b4q4 -n xoul
19:34:34.131328 main.go:83: stunnerd INFO: starting stunnerd id "xoul/udp-gateway", STUNner version v0.19.0-0-ge4fc45b0 (e4fc45b) built on 2024-05-28T17:19:01+0000
19:34:34.131392 main.go:126: stunnerd INFO: bootstrapping with minimal config
19:34:34.131399 main.go:145: stunnerd INFO: watching configuration at origin "http://10.0.133.122:13478"
19:34:34.131819 main.go:191: stunnerd INFO: new configuration available: "{version=\"v1\",admin:{name=\"xoul/udp-gateway\"},static-auth:{realm=\"stunner.l7mp.io\",username=\"<SECRET>\",password=\"<SECRET>\"},listeners=[],clusters=[]}"
19:34:34.132169 reconcile.go:112: stunner INFO: setting loglevel to "all:INFO"
19:34:34.132204 reconcile.go:140: stunner WARNING: running with no listeners
19:34:34.132212 reconcile.go:156: stunner WARNING: running with no clusters: all traffic will be dropped
19:34:34.132223 reconcile.go:176: stunner INFO: reconciliation ready: new objects: 2, changed objects: 0, deleted objects: 0, started objects: 0, restarted objects: 0
19:34:34.132238 reconcile.go:180: stunner INFO: new dataplane status: xoul/udp-gateway:{logLevel="all:INFO",health-check="http://:8086"}/static-auth:{realm="stunner.l7mp.io",username="<SECRET>",password="<SECRET>"}/[]/[]/allocs:0/status=READY
19:34:34.134169 cds_api.go:277: cds-client INFO: connection successfully opened to config discovery server at ws://10.0.133.122:13478/api/v1/configs/xoul/udp-gateway?watch=true
19:34:34.134863 main.go:191: stunnerd INFO: new configuration available: "{version=\"v1\",admin:{name=\"xoul/udp-gateway\",logLevel=\"all:TRACE\",health-check=\"http://:8086\"},static-auth:{realm=\"stunner.l7mp.io\",username=\"<SECRET>\",password=\"<SECRET>\"},listeners=[\"xoul/udp-gateway/udp-listener\":{turn://10.0.140.212:3478,public=k8s-xoul-udpgatew-f063ed2bd7-4733c622aa9a34b5.elb.us-east-2.amazonaws.com:3478,cert/key=-/-,routes=[xoul/media-plane]}],clusters=[\"xoul/media-plane\":{type=\"STATIC\",protocol=\"UDP\",endpoints=[10.0.137.251,172.20.55.170]}]}"
19:34:34.134934 reconcile.go:112: stunner INFO: setting loglevel to "all:TRACE"
19:34:34.134948 reconcile.go:93: auth-manager TRACE: finishing reconciliation
19:34:34.134953 reconcile.go:95: auth-manager TRACE: running the new-object job queue
19:34:34.134957 reconcile.go:109: auth-manager TRACE: running the deletion job queue
19:34:34.134960 reconcile.go:118: auth-manager TRACE: running the reconciliation job queue
19:34:34.134968 reconcile.go:121: auth-manager TRACE: reconciling object "default-auth-config": static-auth:{realm="stunner.l7mp.io",username="<SECRET>",password="<SECRET>"} -> static-auth:{realm="stunner.l7mp.io",username="<SECRET>",password="<SECRET>"}
19:34:34.134977 auth.go:56: stunner-auth DEBUG: using authentication: static
19:34:34.134983 reconcile.go:131: auth-manager DEBUG: reconciliation ready: to-be-created: 0, changed: 1, deleted: 0
19:34:34.134987 reconcile.go:93: listener-manager TRACE: finishing reconciliation
19:34:34.135013 reconcile.go:95: listener-manager TRACE: running the new-object job queue
19:34:34.135026 listener.go:61: stunner-listener-xoul/udp-gateway/udp-listener TRACE: NewListener: "xoul/udp-gateway/udp-listener":{turn://10.0.140.212:3478,public=k8s-xoul-udpgatew-f063ed2bd7-4733c622aa9a34b5.elb.us-east-2.amazonaws.com:3478,cert/key=-/-,routes=[xoul/media-plane]}
19:34:34.135033 listener.go:123: stunner-listener-xoul/udp-gateway/udp-listener TRACE: Reconcile: "xoul/udp-gateway/udp-listener":{turn://10.0.140.212:3478,public=k8s-xoul-udpgatew-f063ed2bd7-4733c622aa9a34b5.elb.us-east-2.amazonaws.com:3478,cert/key=-/-,routes=[xoul/media-plane]}
19:34:34.135039 manager.go:51: listener-manager TRACE: upsert object "xoul/udp-gateway/udp-listener"
19:34:34.135053 reconcile.go:109: listener-manager TRACE: running the deletion job queue
19:34:34.135056 reconcile.go:118: listener-manager TRACE: running the reconciliation job queue
19:34:34.135067 reconcile.go:131: listener-manager DEBUG: reconciliation ready: to-be-created: 1, changed: 0, deleted: 0
19:34:34.135071 reconcile.go:93: cluster-manager TRACE: finishing reconciliation
19:34:34.135074 reconcile.go:95: cluster-manager TRACE: running the new-object job queue
19:34:34.135089 cluster.go:50: stunner-cluster-xoul/media-plane TRACE: NewCluster: "xoul/media-plane":{type="STATIC",protocol="UDP",endpoints=[10.0.137.251,172.20.55.170]}
19:34:34.135103 cluster.go:76: stunner-cluster-xoul/media-plane TRACE: Reconcile: "xoul/media-plane":{type="STATIC",protocol="UDP",endpoints=[10.0.137.251,172.20.55.170]}
19:34:34.135117 manager.go:51: cluster-manager TRACE: upsert object "xoul/media-plane"
19:34:34.135123 reconcile.go:109: cluster-manager TRACE: running the deletion job queue
19:34:34.135130 reconcile.go:118: cluster-manager TRACE: running the reconciliation job queue
19:34:34.135136 reconcile.go:131: cluster-manager DEBUG: reconciliation ready: to-be-created: 1, changed: 0, deleted: 0
19:34:34.135154 server.go:28: stunner INFO: listener xoul/udp-gateway/udp-listener: [turn-udp://10.0.140.212:3478<0:0>] (re)starting
19:34:34.135164 handlers.go:66: stunner TRACE: NewPermissionHandler
19:34:34.135170 server.go:45: stunner INFO: setting up UDP listener socket pool at 0.0.0.0:3478 with 16 readloop threads
19:34:34.135291 handlers.go:17: stunner TRACE: NewAuthHandler
19:34:34.135365 server.go:166: stunner INFO: listener xoul/udp-gateway/udp-listener: TURN server running
19:34:34.135374 reconcile.go:176: stunner INFO: reconciliation ready: new objects: 2, changed objects: 2, deleted objects: 0, started objects: 1, restarted objects: 0
19:34:34.135378 auth.go:85: stunner-auth TRACE: GetConfig
19:34:34.135409 reconcile.go:180: stunner INFO: new dataplane status: xoul/udp-gateway:{logLevel="all:TRACE",health-check="http://:8086"}/static-auth:{realm="stunner.l7mp.io",username="<SECRET>",password="<SECRET>"}/["xoul/udp-gateway/udp-listener":{turn://10.0.140.212:3478,public=k8s-xoul-udpgatew-f063ed2bd7-4733c622aa9a34b5.elb.us-east-2.amazonaws.com:3478,cert/key=-/-,routes=[xoul/media-plane]}]/["xoul/media-plane":{type="STATIC",protocol="UDP",endpoints=[10.0.137.251,172.20.55.170]}]/allocs:0/status=READY
19:34:34.135416 main.go:201: stunnerd TRACE: reconciliation ready
This settles it: the EKS LoadBalancer fails to forward turncat
traffic to STUNner (otherwise, the logs would show some actual TURN activity). Please double-check that all the necessary EKS components are running, see all the necessary background here. Make sure all the necessary annotations are added to the Gateway, and all the necessary EKS components are running. Once you get a sample UDP application working, STUNner will also start to work.
Sorry, I accidentally closed this. Reopening now.
Closing this for good on preparing for v1, feel free to reopen if any new info comes up.
Description
Possibly a clone of #133, I'm not sure.
Steps to Reproduce
Follow the turncat example with an eks cluster. Expected behavior: Turncat works as expected.
Actual behavior: First turncat invocation:
And subsequent turncat invocations:
Gateway API status
Operator logs
operator.log