RobertFischer closed this issue 8 months ago.
Weirdly, the cloud says that I am connected to the [default]
namespace. Or is that the context?
From the looks of it, your k3s cluster is fully exposed to your workstation, i.e. you can already access all services and pods directly. Telepresence therefore tries to create a very small subnet that will be responsible for DNS only, and for some reason it's not able to and gets:
failed to add subnet 192.168.0.1/30: netlink.RouteAdd: invalid argument
which means that the DNS to the cluster won't work as expected. It would be interesting to try and reproduce. Not sure why that is happening. I assume that you're on a Linux box? What type of Linux is it? And how do you install k3s? Any specific configs involved?
Ubuntu Linux. I posted the uname -a in the original post. I did the install through the K3s install script, so nothing fancy there. There's no fancy DNS set-up on the host and I am not sure how the services would already be exposed...we certainly aren't doing that intentionally.
Is there something I can check to help debug this issue?
I posted the uname -a in the original post.
You sure did. Sorry.
I found that the `failed to add subnet` message was a red herring; it's due to a bug causing the route to be added twice, and it is harmless.
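If anyone wants to confirm the duplicate-route theory on their own machine, standard iproute2 tooling shows it (the tel0 device name is the usual Telepresence TUN interface on Linux; adjust if yours differs):
$ ip route show | grep '192.168.0'   # the /30 subnet from the error message
$ ip route show dev tel0             # routes attached to the Telepresence TUN device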
I installed k3s on my workstation (Fedora-38). It turns the workstation itself into a node (a `kubectl get node -o wide` reveals that the node has the same IP as the workstation on my intranet). It also installs rules into my iptables (for flannel) that give me direct access to service IPs. If I create a service in the cluster, I can then curl its IP (from `kubectl get svc -o wide`) and get a response, without having telepresence connected. This is why Telepresence considers the cluster to be connected. But as I mentioned, this is not the problem.
I'm able to curl the service using its name through the DNS server that Telepresence installs when it is connected. I would assume that you are too? But I can also intercept and successfully curl it again (this time reaching my intercept handler) which means that this is very likely a problem with the workload that you're intercepting.
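Spelled out as commands (the service name and namespace here are placeholders, not from this thread):
$ kubectl get svc <name> -n <namespace> -o wide   # note the CLUSTER-IP
$ curl -I http://<cluster-ip>                     # reachable even without telepresence, via the flannel rules
$ telepresence connect
$ curl -I http://<name>.<namespace>               # the name now resolves through Telepresence's DNS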
Things to try out:
1. Use a symbolic name for the `targetPort` in the service instead of using a number. This will remove the need for Telepresence to inject an init-container (a sketch of what this looks like follows this list).
2. Check the logs of the intercepted pod's containers, the `traffic-agent` container especially.
3. In the `kubectl get events` list during intercept, do you see anything else besides that the container is terminated (which is normal, since the pod restarts when the agent is injected)?
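For illustration, a symbolic `targetPort` wires the service to a named container port; a minimal sketch (the names `api` and `http` are illustrative, not taken from this thread):
apiVersion: v1
kind: Service
metadata:
  name: api
spec:
  selector:
    app: api
  ports:
    - port: 80
      targetPort: http   # symbolic name instead of a number
---
# ...and the workload's pod template names the matching container port:
#   containers:
#     - name: api
#       ports:
#         - name: http          # the name that targetPort refers to
#           containerPort: 8080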
Yes, I can confirm that the workstation is a node.
$ kubectl get node -o wide
NAME STATUS ROLES AGE VERSION INTERNAL-IP EXTERNAL-IP OS-IMAGE KERNEL-VERSION CONTAINER-RUNTIME
dev-robert Ready control-plane,master 2d18h v1.26.1+k3s1 10.128.0.30 <none> Ubuntu 22.04.4 LTS 6.5.0-1014-gcp containerd://1.6.15-k3s1
I can also confirm that I can reach the services directly via IP without Telepresence running:
robert@dev-robert:~$ curl -I http://10.43.42.116
HTTP/1.1 403 Forbidden
X-Powered-By: Express
Access-Control-Allow-Origin: https://dev-robert.randomclient.com
Access-Control-Allow-Methods: GET,PUT,POST,DELETE,OPTIONS
Access-Control-Allow-Headers: Content-Type, Authorization, Content-Length, X-Requested-With
Access-Control-Allow-Credentials: true
set-cookie: connect.sid=s%3A7h1dKiO3yAZdLKX4eb5YLHB_0iMPbBCy.%2Bwp8kP1ekhUKmWNHoDoaqfSGcViR2DJilGztMlxbxiA; Path=/; HttpOnly
Date: Mon, 26 Feb 2024 15:12:57 GMT
Connection: keep-alive
Keep-Alive: timeout=5
robert@dev-robert:~$ ps aux | grep telepresence
robert 69889 0.0 0.0 7008 2304 pts/0 S+ 15:13 0:00 grep --color=auto telepresence
Although I can't do that through DNS before connecting:
robert@dev-robert:~$ curl -I http://api.dev-robert-explore
curl: (6) Could not resolve host: api.dev-robert-explore
robert@dev-robert:~$ curl -I http://api.dev-robert-explore.cluster.local
curl: (6) Could not resolve host: api.dev-robert-explore.cluster.local
Once I do a `telepresence connect`, I can reach the services:
robert@dev-robert:~$ curl -I http://api.dev-robert-explore
HTTP/1.1 403 Forbidden
X-Powered-By: Express
Access-Control-Allow-Origin: https://dev-robert.randomclient.com
Access-Control-Allow-Methods: GET,PUT,POST,DELETE,OPTIONS
Access-Control-Allow-Headers: Content-Type, Authorization, Content-Length, X-Requested-With
Access-Control-Allow-Credentials: true
set-cookie: connect.sid=s%3AfjaHzC9XiZ1tGeZgOjqez8iYvr-s77Ur.M5TB0HgKTHnTZYOmDuA6NEk555j%2FF793SDHcyUl0GqY; Path=/; HttpOnly
Date: Mon, 26 Feb 2024 15:15:32 GMT
Connection: keep-alive
Keep-Alive: timeout=5
Interesting thing happening with the events: I'm getting failed health checks.
31s Normal Created pod/api-58dc9559fc-r9wqw Created container api
31s Normal Started pod/api-58dc9559fc-r9wqw Started container api
31s Normal Pulled pod/api-58dc9559fc-r9wqw Container image "docker.io/datawire/ambassador-telepresence-agent:1.14.4" already present on machine
31s Normal Created pod/api-58dc9559fc-r9wqw Created container traffic-agent
31s Normal Started pod/api-58dc9559fc-r9wqw Started container traffic-agent
26s Warning Unhealthy pod/api-58dc9559fc-r9wqw Startup probe failed: Get "http://10.42.0.113:8080/public/health": read tcp 10.42.0.1:58628->10.42.0.113:8080: read: connection reset by peer
21s Warning Unhealthy pod/api-58dc9559fc-r9wqw Startup probe failed: Get "http://10.42.0.113:8080/public/health": read tcp 10.42.0.1:40374->10.42.0.113:8080: read: connection reset by peer
16s Warning Unhealthy pod/api-58dc9559fc-r9wqw Startup probe failed: HTTP probe failed with statuscode: 404
Now, I think that error is coming after the timeout fail hits.
I'm disabling the health check and seeing if that solves the problem.
Disabling the health check didn't do it. I'm now spinning up a new service which does nothing but `sleep infinity` on an `ubuntu:latest` image, and then trying to intercept that.
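A minimal sketch of such a throwaway deployment (assuming nothing beyond stock Kubernetes; the names mirror the `dev` workload described below but are illustrative):
apiVersion: apps/v1
kind: Deployment
metadata:
  name: dev
  labels:
    app: dev
spec:
  replicas: 1
  selector:
    matchLabels:
      app: dev
  template:
    metadata:
      labels:
        app: dev
    spec:
      containers:
        - name: dev
          image: ubuntu:latest
          # keeps the pod alive doing nothing, purely for intercept testing
          command: ["sleep", "infinity"]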
No dice.
robert@dev-robert:~$ kubectl describe deployment/dev -n dev-robert-explore
Name: dev
Namespace: dev-robert-explore
CreationTimestamp: Mon, 26 Feb 2024 16:43:18 +0000
Labels: app=dev
pienso_version=dev-latest
Annotations: deployment.kubernetes.io/revision: 1
reloader.stakater.com/auto: true
Selector: app=api
Replicas: 1 desired | 1 updated | 1 total | 1 available | 0 unavailable
StrategyType: Recreate
MinReadySeconds: 90
Pod Template:
Labels: app=api
pienso_version=dev-latest
Containers:
dev:
Image: ubuntu:latest
Port: <none>
Host Port: <none>
Command:
sleep
infinity
Environment Variables from:
api-env ConfigMap Optional: false
api-secret-env Secret Optional: false
Environment: <none>
Mounts: <none>
Volumes: <none>
Conditions:
Type Status Reason
---- ------ ------
Available True MinimumReplicasAvailable
Progressing True NewReplicaSetAvailable
OldReplicaSets: <none>
NewReplicaSet: dev-7f86655994 (1/1 replicas created)
Events:
Type Reason Age From Message
---- ------ ---- ---- -------
Normal ScalingReplicaSet 7m48s deployment-controller Scaled up replica set dev-7f86655994 to 1
robert@dev-robert:~$ telepresence quit -s
Telepresence Daemons quitting...done
robert@dev-robert:~$ telepresence connect --namespace dev-robert-explore
Launching Telepresence User Daemon
Launching Telepresence Root Daemon
Connected to context default, namespace dev-robert-explore (https://127.0.0.1:6443)
robert@dev-robert:~$ telepresence intercept dev -- env
telepresence intercept: error: request timed out while waiting for agent dev.dev-robert-explore to arrive
Nothing interesting in the events.
42s Normal ScalingReplicaSet deployment/dev Scaled down replica set dev-574f44fb69 to 0 from 1
42s Normal SuccessfulDelete replicaset/dev-574f44fb69 Deleted pod: dev-574f44fb69-zpcd2
42s Normal Killing pod/dev-574f44fb69-zpcd2 Stopping container dev
11s Normal ScalingReplicaSet deployment/dev Scaled up replica set dev-8bc6866bd to 1
11s Normal SuccessfulCreate replicaset/dev-8bc6866bd Created pod: dev-8bc6866bd-4wclz
10s Normal Scheduled pod/dev-8bc6866bd-4wclz Successfully assigned dev-robert-explore/dev-8bc6866bd-4wclz to dev-robert
10s Normal Pulled pod/dev-8bc6866bd-4wclz Container image "ubuntu:latest" already present on machine
10s Normal Created pod/dev-8bc6866bd-4wclz Created container dev
10s Normal Started pod/dev-8bc6866bd-4wclz Started container dev
I tried to capture the container logs from the `traffic-agent`, but that's just not working. This isn't giving me anything even as I start and stop it repeatedly during the intercept.
robert@dev-robert:~$ kubectl logs -f --all-containers -n dev-robert-explore deployment/dev
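One thing worth noting: if the agent container ever does get injected, asking for it by name is more targeted than --all-containers (the -c and --previous flags are standard kubectl; the names follow the thread):
$ kubectl logs -n dev-robert-explore deploy/dev -c traffic-agent
$ # if the pod restarted, logs from the previous instance may still be readable:
$ kubectl logs -n dev-robert-explore deploy/dev -c traffic-agent --previous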
Anything else to try? Is there someplace else that I can ask for help? I'm getting to the point where I'm giving up on this PoC of Telepresence, but I feel like I'm so close and I want it to work so badly...
Here's the logs again from these most recent attempts. telepresence_logs.zip traces.gz
I'm going to try spinning up an entirely new dev environment and see if there's just something borked with my current dev environment.
Yeah, I tried `watch 'kubectl logs --all-containers -n dev-robert-explore deployment/dev | tee -a ./intercept.log'` and got nothing.
telepresence_logs.zip traces.gz
Spinning up a brand new dev environment didn't change anything that I can see. I've attached the logs.
robert@dev-robert2:~$ telepresence connect --namespace=dev-robert2-explore
Launching Telepresence User Daemon
Launching Telepresence Root Daemon
Connected to context default, namespace dev-robert2-explore (https://127.0.0.1:6443)
robert@dev-robert2:~$ telepresence intercept dev -- /bin/bash
telepresence intercept: error: request timed out while waiting for agent dev.dev-robert2-explore to arrive
Another weird issue. I tried bumping up the timeout via the config.yml in order to maybe capture more logs or be able to poke around a bit more, but the increased timeout doesn't seem to be taking.
+ exec telepresence intercept dev -- /bin/bash
telepresence intercept: error: request timed out while waiting for agent dev.dev-robert-explore to arrive
real 0m39.974s
user 0m4.259s
sys 0m0.217s
$ cat ~/.config/telepresence/config.yml
---
client:
  timeouts:
    intercept: 5m
    agentInstall: 5m
    trafficManagerConnect: 5m
timeouts:
  helm: 5m
  intercept: 5m
  agentInstall: 5m
  trafficManagerConnect: 5m
Using a symbolic port name didn't help.
robert@dev-robert$ kubectl describe pod dev -n dev-robert-explore
Name: dev-66cfb695f6-c6w7c
Namespace: dev-robert-explore
Priority: 1000
Priority Class Name: default-priority
Service Account: default
Node: dev-robert/10.128.0.30
Start Time: Mon, 26 Feb 2024 20:11:37 +0000
Labels: app=dev
pienso_version=dev-latest
pod-template-hash=66cfb695f6
Annotations: telepresence.getambassador.io/restartedAt: 2024-02-26T20:11:05Z
Status: Running
IP: 10.42.0.157
IPs:
IP: 10.42.0.157
Controlled By: ReplicaSet/dev-66cfb695f6
Containers:
dev:
Container ID: containerd://675eabd347de5f1883b37e97bcab052776b7806faf29baefcb0fef079fed4eba
Image: ubuntu:latest
Image ID: docker.io/library/ubuntu@sha256:f9d633ff6640178c2d0525017174a688e2c1aef28f0a0130b26bd5554491f0da
Port: 80/TCP
Host Port: 0/TCP
Command:
sleep
infinity
State: Running
Started: Mon, 26 Feb 2024 20:11:38 +0000
Ready: True
Restart Count: 0
Environment Variables from:
api-env ConfigMap Optional: false
api-secret-env Secret Optional: false
Environment: <none>
Mounts:
/var/run/secrets/kubernetes.io/serviceaccount from kube-api-access-5bt49 (ro)
Conditions:
Type Status
Initialized True
Ready True
ContainersReady True
PodScheduled True
Volumes:
kube-api-access-5bt49:
Type: Projected (a volume that contains injected data from multiple sources)
TokenExpirationSeconds: 3607
ConfigMapName: kube-root-ca.crt
ConfigMapOptional: <nil>
DownwardAPI: true
QoS Class: BestEffort
Node-Selectors: <none>
Tolerations: node.kubernetes.io/not-ready:NoExecute op=Exists for 300s
node.kubernetes.io/unreachable:NoExecute op=Exists for 300s
Events:
Type Reason Age From Message
---- ------ ---- ---- -------
Normal Scheduled 3m16s default-scheduler Successfully assigned dev-robert-explore/dev-66cfb695f6-c6w7c to dev-robert
Normal Pulled 3m17s kubelet Container image "ubuntu:latest" already present on machine
Normal Created 3m16s kubelet Created container dev
Normal Started 3m16s kubelet Started container dev
robert@dev-robert$ kubectl describe service dev -n dev-robert-explore
Name: dev
Namespace: dev-robert-explore
Labels: <none>
Annotations: <none>
Selector: app=dev
Type: ClusterIP
IP Family Policy: SingleStack
IP Families: IPv4
IP: 10.43.54.120
IPs: 10.43.54.120
Port: <unset> 80/TCP
TargetPort: http/TCP
Endpoints: 10.42.0.157:80
Session Affinity: ClientIP
Events: <none>
Same timeout when I try to intercept.
Is there a port that needs to be open on the host for telepresence to work?
You might try this annotation. Essentially this makes one step (where usually Telepresence starts the intercept) into two steps, where instead you run the intercept after. We've found this sometimes helps where the `telepresence intercept` causes a "waiting for agent to arrive" type error.
To your question above, I don't think so. On that same page you'll see "Telepresence will automatically find all services and all ports that will connect to a workload and make them available for an intercept, but you can explicitly define that only one service and/or port can be intercepted." If you search on 'port' on that page you'll see some other notes there, including numeric vs. symbolic ports (a symbolic port is generally preferred, as Thomas noted).
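Presumably that's the agent-injection annotation from the Telepresence docs, set on the workload's pod template so the agent is injected at rollout time rather than at intercept time; a sketch:
spec:
  template:
    metadata:
      annotations:
        telepresence.getambassador.io/inject-traffic-agent: enabled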
I'm confused: if I use that annotation to prevent injecting the traffic agent, how will I get the traffic? I guess I'm unclear about what you mean by "you run the intercept after". How do I do that?
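Reading the suggestion as a two-step flow, something like this (the patch payload is illustrative; the annotation is the one sketched above):
$ # step 1: annotate the pod template so the agent is injected on rollout
$ kubectl patch deployment dev -n dev-robert-explore --type merge \
    -p '{"spec":{"template":{"metadata":{"annotations":{"telepresence.getambassador.io/inject-traffic-agent":"enabled"}}}}}'
$ kubectl rollout status deployment/dev -n dev-robert-explore
$ # step 2: with the agent already present, the intercept no longer has to wait for it
$ telepresence intercept dev -- /bin/bash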
Didn't seem to work.
robert@dev-robert$ telepresence quit
Disconnected
robert@dev-robert$ telepresence connect -n dev-robert-explore
Connected to context default, namespace dev-robert-explore (https://127.0.0.1:6443)
robert@dev-robert$ telepresence intercept dev -- /bin/bash
telepresence intercept: error: request timed out while waiting for agent dev.dev-robert-explore to arrive
robert@dev-robert$ kubectl describe deployment -n dev-robert-explore dev
Name: dev
Namespace: dev-robert-explore
CreationTimestamp: Mon, 26 Feb 2024 20:08:28 +0000
Labels: app=dev
pienso_version=dev-latest
Annotations: deployment.kubernetes.io/revision: 4
reloader.stakater.com/auto: true
telepresence.getambassador.io/inject-traffic-agent: disabled
Selector: app=dev
Replicas: 1 desired | 1 updated | 1 total | 0 available | 1 unavailable
StrategyType: Recreate
MinReadySeconds: 90
Pod Template:
Labels: app=dev
pienso_version=dev-latest
Annotations: telepresence.getambassador.io/restartedAt: 2024-02-27T14:29:29Z
Containers:
dev:
Image: ubuntu:latest
Port: 80/TCP
Host Port: 0/TCP
Command:
sleep
infinity
Environment Variables from:
api-env ConfigMap Optional: false
api-secret-env Secret Optional: false
Environment: <none>
Mounts: <none>
Volumes: <none>
Conditions:
Type Status Reason
---- ------ ------
Available False MinimumReplicasUnavailable
Progressing True ReplicaSetUpdated
OldReplicaSets: <none>
NewReplicaSet: dev-567dbd4dfc (1/1 replicas created)
Events:
Type Reason Age From Message
---- ------ ---- ---- -------
Normal ScalingReplicaSet 2m2s deployment-controller Scaled down replica set dev-66cfb695f6 to 0 from 1
Normal ScalingReplicaSet 91s deployment-controller Scaled up replica set dev-846f55797 to 1
Normal ScalingReplicaSet 55s deployment-controller Scaled down replica set dev-846f55797 to 0 from 1
Normal ScalingReplicaSet 24s deployment-controller Scaled up replica set dev-567dbd4dfc to 1
robert@dev-robert$ kubectl describe pod -n dev-robert-explore -l app=dev
Name: dev-567dbd4dfc-j5sqn
Namespace: dev-robert-explore
Priority: 1000
Priority Class Name: default-priority
Service Account: default
Node: dev-robert/10.128.0.30
Start Time: Tue, 27 Feb 2024 14:30:00 +0000
Labels: app=dev
pienso_version=dev-latest
pod-template-hash=567dbd4dfc
Annotations: telepresence.getambassador.io/restartedAt: 2024-02-27T14:29:29Z
Status: Running
IP: 10.42.0.223
IPs:
IP: 10.42.0.223
Controlled By: ReplicaSet/dev-567dbd4dfc
Containers:
dev:
Container ID: containerd://ec51292000a9c7ed14c530dd407f7002ae4a35ca053229b8fb3120560040b4c7
Image: ubuntu:latest
Image ID: docker.io/library/ubuntu@sha256:f9d633ff6640178c2d0525017174a688e2c1aef28f0a0130b26bd5554491f0da
Port: 80/TCP
Host Port: 0/TCP
Command:
sleep
infinity
State: Running
Started: Tue, 27 Feb 2024 14:30:01 +0000
Ready: True
Restart Count: 0
Environment Variables from:
api-env ConfigMap Optional: false
api-secret-env Secret Optional: false
Environment: <none>
Mounts:
/var/run/secrets/kubernetes.io/serviceaccount from kube-api-access-n6z7v (ro)
Conditions:
Type Status
Initialized True
Ready True
ContainersReady True
PodScheduled True
Volumes:
kube-api-access-n6z7v:
Type: Projected (a volume that contains injected data from multiple sources)
TokenExpirationSeconds: 3607
ConfigMapName: kube-root-ca.crt
ConfigMapOptional: <nil>
DownwardAPI: true
QoS Class: BestEffort
Node-Selectors: <none>
Tolerations: node.kubernetes.io/not-ready:NoExecute op=Exists for 300s
node.kubernetes.io/unreachable:NoExecute op=Exists for 300s
Events:
Type Reason Age From Message
---- ------ ---- ---- -------
Normal Scheduled 46s default-scheduler Successfully assigned dev-robert-explore/dev-567dbd4dfc-j5sqn to dev-robert
Normal Pulled 46s kubelet Container image "ubuntu:latest" already present on machine
Normal Created 46s kubelet Created container dev
Normal Started 46s kubelet Started container dev
Is there a step that I'm missing?
Is there any other information that I can give you to help debug what's going on? Any idea why the timeouts aren't working? (Maybe that has something to do with it...?) https://github.com/telepresenceio/telepresence/issues/3531#issuecomment-1964993327
I'm seeing these errors in the logs. Do they mean anything?
2024-02-26 15:15:25.7892 error failed to review capability: rpc error: code = InvalidArgument desc = capabilities should not be empty
2024-02-26 15:54:19.9945 error Unable to read cloudd-address.json: failed to parse JSON from file /home/robert/.cache/telepresence/cloudd-address.json: json: cannot unmarshal string into Go value of type cloudd.Address
Also, a lot of this stuff....
connector-20240227T141100.log:2024-02-26 15:39:03.1604 error connector/session/cloud-daemon-watcher : goroutine "/connector/session/cloud-daemon-watcher" exited with error: rpc error: code = Canceled desc = context canceled
connector-20240227T141100.log:2024-02-26 15:39:03.3861 info connector/session:shutdown_status : /connector/session/cloud-daemon-watcher : exited with error
connector-20240227T141100.log:2024-02-26 16:51:30.8955 error connector/session/cloud-daemon-watcher : goroutine "/connector/session/cloud-daemon-watcher" exited with error: rpc error: code = Canceled desc = context canceled
connector-20240227T141100.log:2024-02-26 16:51:32.1114 info connector/session:shutdown_status : /connector/session/cloud-daemon-watcher : exited with error
connector-20240227T141100.log:2024-02-26 16:57:52.1297 error connector/session/cloud-daemon-watcher : goroutine "/connector/session/cloud-daemon-watcher" exited with error: rpc error: code = Canceled desc = context canceled
connector-20240227T141100.log:2024-02-26 16:57:53.4980 info connector/session:shutdown_status : /connector/session/cloud-daemon-watcher : exited with error
connector-20240227T141100.log:2024-02-26 17:06:54.5015 error connector/session/cloud-daemon-watcher : goroutine "/connector/session/cloud-daemon-watcher" exited with error: rpc error: code = Canceled desc = context canceled
connector-20240227T141100.log:2024-02-26 17:06:56.3886 info connector/session:shutdown_status : /connector/session/cloud-daemon-watcher : exited with error
connector-20240227T141100.log:2024-02-26 17:44:26.4920 error connector/session/cloud-daemon-watcher/client : goroutine "/connector/session/cloud-daemon-watcher/client" exited with error: rpc error: code = Unavailable desc = error reading from server: EOF
connector-20240227T141100.log:2024-02-26 17:44:26.4922 info connector/session/cloud-daemon-watcher:shutdown_status : /connector/session/cloud-daemon-watcher/client: exited with error
connector-20240227T141100.log:2024-02-26 18:51:36.5127 error connector/session/cloud-daemon-watcher : goroutine "/connector/session/cloud-daemon-watcher" exited with error: rpc error: code = Canceled desc = context canceled
connector-20240227T141100.log:2024-02-26 18:51:38.1858 info connector/session:shutdown_status : /connector/session/cloud-daemon-watcher : exited with error
connector-20240227T141100.log:2024-02-26 18:51:46.6389 error connector/session/cloud-daemon-watcher : goroutine "/connector/session/cloud-daemon-watcher" exited with error: rpc error: code = Canceled desc = context canceled
connector-20240227T141100.log:2024-02-26 18:51:47.9200 info connector/session:shutdown_status : /connector/session/cloud-daemon-watcher : exited with error
connector-20240227T141100.log:2024-02-26 18:55:23.1854 error connector/session/cloud-daemon-watcher : goroutine "/connector/session/cloud-daemon-watcher" exited with error: unable to get cloud daemon config from traffic-manager: rpc error: code = Canceled desc = context canceled
connector-20240227T141100.log:2024-02-26 18:55:24.3231 info connector/session:shutdown_status : /connector/session/cloud-daemon-watcher : exited with error
connector-20240227T141100.log:2024-02-26 19:00:58.1099 error connector/session/cloud-daemon-watcher : goroutine "/connector/session/cloud-daemon-watcher" exited with error: rpc error: code = Canceled desc = context canceled
connector-20240227T141100.log:2024-02-26 19:00:59.3149 info connector/session:shutdown_status : /connector/session/cloud-daemon-watcher : exited with error
connector-20240227T141100.log:2024-02-26 19:02:28.8410 error connector/session/cloud-daemon-watcher : goroutine "/connector/session/cloud-daemon-watcher" exited with error: rpc error: code = Canceled desc = context canceled
connector-20240227T141100.log:2024-02-26 19:02:30.5878 info connector/session:shutdown_status : /connector/session/cloud-daemon-watcher : exited with error
connector-20240227T141100.log:2024-02-26 19:03:58.6638 error connector/session/cloud-daemon-watcher : goroutine "/connector/session/cloud-daemon-watcher" exited with error: rpc error: code = Canceled desc = context canceled
connector-20240227T141100.log:2024-02-26 19:04:01.0881 info connector/session:shutdown_status : /connector/session/cloud-daemon-watcher : exited with error
connector-20240227T141100.log:2024-02-26 19:06:40.6446 error connector/session/cloud-daemon-watcher : goroutine "/connector/session/cloud-daemon-watcher" exited with error: rpc error: code = Canceled desc = context canceled
connector-20240227T141100.log:2024-02-26 19:06:41.8224 info connector/session:shutdown_status : /connector/session/cloud-daemon-watcher : exited with error
connector-20240227T141100.log:2024-02-26 19:19:36.3613 error connector/session/cloud-daemon-watcher : goroutine "/connector/session/cloud-daemon-watcher" exited with error: rpc error: code = Canceled desc = context canceled
connector-20240227T141100.log:2024-02-26 19:19:37.5870 info connector/session:shutdown_status : /connector/session/cloud-daemon-watcher : exited with error
connector-20240227T141100.log:2024-02-26 20:11:00.4146 error connector/session/cloud-daemon-watcher : goroutine "/connector/session/cloud-daemon-watcher" exited with error: rpc error: code = Canceled desc = context canceled
connector-20240227T141100.log:2024-02-26 20:11:01.5802 info connector/session:shutdown_status : /connector/session/cloud-daemon-watcher : exited with error
connector-20240227T141100.log:2024-02-26 20:24:19.9414 error connector/session/cloud-daemon-watcher : goroutine "/connector/session/cloud-daemon-watcher" exited with error: rpc error: code = Canceled desc = context canceled
connector-20240227T141100.log:2024-02-26 20:24:21.1844 info connector/session:shutdown_status : /connector/session/cloud-daemon-watcher : exited with error
connector-20240227T141100.log:2024-02-26 20:24:26.1306 error connector/session/cloud-daemon-watcher : goroutine "/connector/session/cloud-daemon-watcher" exited with error: rpc error: code = Canceled desc = context canceled
connector-20240227T141100.log:2024-02-26 20:24:27.2314 info connector/session:shutdown_status : /connector/session/cloud-daemon-watcher : exited with error
connector-20240227T141100.log:2024-02-26 20:25:27.9587 error connector/session/cloud-daemon-watcher : goroutine "/connector/session/cloud-daemon-watcher" exited with error: rpc error: code = Canceled desc = context canceled
connector-20240227T141100.log:2024-02-26 20:25:29.4896 info connector/session:shutdown_status : /connector/session/cloud-daemon-watcher : exited with error
connector-20240227T141100.log:2024-02-26 20:27:23.8335 error connector/session/cloud-daemon-watcher : goroutine "/connector/session/cloud-daemon-watcher" exited with error: rpc error: code = Canceled desc = context canceled
connector-20240227T141100.log:2024-02-26 20:27:24.9331 info connector/session:shutdown_status : /connector/session/cloud-daemon-watcher : exited with error
connector-20240227T141100.log:2024-02-26 20:28:35.0380 error connector/session/cloud-daemon-watcher : goroutine "/connector/session/cloud-daemon-watcher" exited with error: rpc error: code = Canceled desc = context canceled
connector-20240227T141100.log:2024-02-26 20:28:36.1327 info connector/session:shutdown_status : /connector/session/cloud-daemon-watcher : exited with error
connector-20240227T141100.log:2024-02-26 20:30:22.9083 error connector/session/cloud-daemon-watcher : goroutine "/connector/session/cloud-daemon-watcher" exited with error: rpc error: code = Canceled desc = context canceled
connector-20240227T141100.log:2024-02-26 20:30:23.9482 info connector/session:shutdown_status : /connector/session/cloud-daemon-watcher : exited with error
connector-20240227T141100.log:2024-02-26 20:46:43.9808 error connector/session/cloud-daemon-watcher : goroutine "/connector/session/cloud-daemon-watcher" exited with error: rpc error: code = Canceled desc = context canceled
connector-20240227T141100.log:2024-02-26 20:46:45.4811 info connector/session:shutdown_status : /connector/session/cloud-daemon-watcher : exited with error
connector-20240227T141100.log:2024-02-26 20:47:01.9032 error connector/session/cloud-daemon-watcher : goroutine "/connector/session/cloud-daemon-watcher" exited with error: rpc error: code = Canceled desc = context canceled
connector-20240227T141100.log:2024-02-26 20:47:03.2011 info connector/session:shutdown_status : /connector/session/cloud-daemon-watcher : exited with error
connector-20240227T141100.log:2024-02-26 20:47:54.3185 error connector/session/cloud-daemon-watcher : goroutine "/connector/session/cloud-daemon-watcher" exited with error: rpc error: code = Canceled desc = context canceled
connector-20240227T141100.log:2024-02-26 20:47:55.5828 info connector/session:shutdown_status : /connector/session/cloud-daemon-watcher : exited with error
connector-20240227T141100.log:2024-02-26 20:48:45.5813 error connector/session/cloud-daemon-watcher : goroutine "/connector/session/cloud-daemon-watcher" exited with error: rpc error: code = Canceled desc = context canceled
connector-20240227T141100.log:2024-02-26 20:48:46.6411 info connector/session:shutdown_status : /connector/session/cloud-daemon-watcher : exited with error
connector-20240227T141100.log:2024-02-26 20:49:20.3477 error connector/session/cloud-daemon-watcher : goroutine "/connector/session/cloud-daemon-watcher" exited with error: rpc error: code = Canceled desc = context canceled
connector-20240227T141100.log:2024-02-26 20:49:21.6120 info connector/session:shutdown_status : /connector/session/cloud-daemon-watcher : exited with error
connector-20240227T141100.log:2024-02-26 20:52:57.5604 error connector/session/cloud-daemon-watcher : goroutine "/connector/session/cloud-daemon-watcher" exited with error: rpc error: code = Canceled desc = context canceled
connector-20240227T141100.log:2024-02-26 20:52:59.1184 info connector/session:shutdown_status : /connector/session/cloud-daemon-watcher : exited with error
connector-20240227T141100.log:2024-02-26 20:53:50.6075 error connector/session/cloud-daemon-watcher : goroutine "/connector/session/cloud-daemon-watcher" exited with error: rpc error: code = Canceled desc = context canceled
connector-20240227T141100.log:2024-02-26 20:53:51.9987 info connector/session:shutdown_status : /connector/session/cloud-daemon-watcher : exited with error
connector-20240227T141100.log:2024-02-26 21:01:08.9204 error connector/session/cloud-daemon-watcher : goroutine "/connector/session/cloud-daemon-watcher" exited with error: rpc error: code = Canceled desc = context canceled
connector-20240227T141100.log:2024-02-26 21:01:08.9204 error connector/session/cloud-daemon-watcher/client : goroutine "/connector/session/cloud-daemon-watcher/client" exited with error: unable to proxy: rpc error: code = Canceled desc = context canceled
connector-20240227T141100.log:2024-02-26 21:01:08.9206 info connector/session/cloud-daemon-watcher:shutdown_status : /connector/session/cloud-daemon-watcher/client: exited with error
connector-20240227T141100.log:2024-02-26 21:01:08.9206 error connector/session/cloud-daemon-watcher : unable to proxy: rpc error: code = Canceled desc = context canceled
connector-20240227T141100.log:2024-02-26 21:01:10.0055 info connector/session:shutdown_status : /connector/session/cloud-daemon-watcher : exited with error
connector.log:2024-02-27 14:16:16.4685 debug connector/session/cloud-daemon-watcher : Recv failed: rpc error: code = Canceled desc = context canceled
connector.log:2024-02-27 14:16:16.4686 error connector/session/cloud-daemon-watcher : goroutine "/connector/session/cloud-daemon-watcher" exited with error: rpc error: code = Canceled desc = context canceled
connector.log:2024-02-27 14:16:17.5464 info connector/session:shutdown_status : /connector/session/cloud-daemon-watcher : exited with error
connector.log:2024-02-27 14:16:32.7749 error connector/session/cloud-daemon-watcher : goroutine "/connector/session/cloud-daemon-watcher" exited with error: rpc error: code = Canceled desc = context canceled
connector.log:2024-02-27 14:16:34.4843 info connector/session:shutdown_status : /connector/session/cloud-daemon-watcher : exited with error
connector.log:2024-02-27 14:17:49.1179 error connector/session/cloud-daemon-watcher : goroutine "/connector/session/cloud-daemon-watcher" exited with error: rpc error: code = Canceled desc = context canceled
connector.log:2024-02-27 14:17:50.3135 info connector/session:shutdown_status : /connector/session/cloud-daemon-watcher : exited with error
connector.log:2024-02-27 14:18:01.8491 error connector/session/cloud-daemon-watcher : goroutine "/connector/session/cloud-daemon-watcher" exited with error: rpc error: code = Canceled desc = context canceled
connector.log:2024-02-27 14:18:03.7291 info connector/session:shutdown_status : /connector/session/cloud-daemon-watcher : exited with error
connector.log:2024-02-27 14:29:09.5893 error connector/session/cloud-daemon-watcher : goroutine "/connector/session/cloud-daemon-watcher" exited with error: rpc error: code = Canceled desc = context canceled
connector.log:2024-02-27 14:29:11.0015 info connector/session:shutdown_status : /connector/session/cloud-daemon-watcher : exited with error
connector.log:2024-02-27 14:38:49.9878 debug connector/session/cloud-daemon-watcher : Recv failed: rpc error: code = Canceled desc = context canceled
connector.log:2024-02-27 14:38:49.9878 error connector/session/cloud-daemon-watcher : goroutine "/connector/session/cloud-daemon-watcher" exited with error: rpc error: code = Canceled desc = context canceled
connector.log:2024-02-27 14:38:51.1792 info connector/session:shutdown_status : /connector/session/cloud-daemon-watcher : exited with error
Looking at your logs (from the traffic-manager), I can verify that an attempt to install the traffic-agent happens here:
2024-02-27 14:28:22.0679 debug httpd/conn=127.0.0.1:8081 : PrepareIntercept called : session_id="e4dd080c-a51c-44c0-a848-3d5af5085bdf"
2024-02-27 14:28:22.0848 debug agent-configs : MODIFIED telepresence-agents.dev-robert-explore
2024-02-27 14:28:22.0849 debug agent-configs : add dev.dev-robert-explore
2024-02-27 14:28:22.0938 debug agent-configs : Rollout of dev.dev-robert-explore is necessary. An agent is desired but the pod dev-66cfb695f6-c6w7c doesn't have one
2024-02-27 14:28:22.0996 info agent-configs : Successfully rolled out dev.dev-robert-explore
but from the looks of it, the agent-injector doesn't kick in here. That's very odd. It should definitely not say "skipping" here.
2024-02-27 14:28:53.4644 debug agent-injector : Handling admission request CREATE dev-846f55797-.dev-robert-explore
2024-02-27 14:28:53.4722 debug agent-injector : The dev-846f55797-.dev-robert-explore pod has not enabled traffic-agent container injection through "telepresence-agents" configmap or through "telepresence.getambassador.io/inject-traffic-agent" annotation; skipping
2024-02-27 14:28:53.4724 debug agent-injector : Webhook request handled successfully
What does the `telepresence-agents` configmap contain at this point? Was this log printed at a time when you had a `telepresence.getambassador.io/inject-traffic-agent: disabled` annotation in the deployment?
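For the record, the first question can be answered with (the configmap name is the one the agent-injector logs mention):
$ kubectl get configmap telepresence-agents -n dev-robert-explore -o yaml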
The errors regarding capability and cloudd-address.json aren't relevant to this problem.
Oh, wait. There's a timeout here. The agent-injector CREATE happens after another MODIFIED event, following a 30-second timeout. That'd explain it:
2024-02-27 14:28:22.1200 debug agent-injector : Handling admission request DELETE dev-66cfb695f6-c6w7c.dev-robert-explore
2024-02-27 14:28:22.1256 debug agent-injector : Webhook request handled successfully
2024-02-27 14:28:52.0763 debug agent-configs : MODIFIED telepresence-agents.dev-robert-explore
2024-02-27 14:28:52.0764 debug agent-configs : del dev.dev-robert-explore
2024-02-27 14:28:52.0859 debug agent-configs : Rollout of dev.dev-robert-explore is not necessary. All pods have the desired agent state
So this is probably caused by the fact that your `agentInstall` timeout doesn't take.
@thallgren -- Oh, that sounds like great news. Any idea why the timeout doesn't take? Can you give me a sample `config.yaml` which should set it?
The reason that timeout doesn't take is that it's a client-only setting. This is a traffic-manager timeout. Try this:
$ telepresence helm upgrade --set timeouts.agentArrival=120s
We should really remove that timeout from the config.yml. It was added in a very early version of Telepresence where the client did the agent install. It's meaningless now.
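If you'd rather keep that setting somewhere durable, the same key can presumably go in a values file (assuming `telepresence helm upgrade` accepts a helm-style --values flag, as `telepresence helm install` does):
$ cat values.yaml
timeouts:
  agentArrival: 120s
$ telepresence helm upgrade --values values.yaml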
I'm leaving the issue open b/c I don't know the protocol for closing it properly, but this is resolved as far as I'm concerned. Thank you very much for all your help.
Thanks for confirming that it worked.
I don't think we have a protocol for closing tickets, but if the problem is solved, then the ticket should be closed.
Describe the bug
I can connect but can't intercept anything, and I always get `request timed out while waiting for agent ... to arrive` errors. I can see in the `kubectl get events` output that the `api` service I am attempting to intercept has a container terminated after running the `telepresence intercept` command, and it's spun back up after the intercept times out.
telepresence_logs.zip traces.gz
Repeatedly running `telepresence helm uninstall` and `telepresence helm install` does not help. Rebooting does not help. I installed `sshfs` after seeing it mentioned in some of the error messages; that hasn't helped. I tried using `--replace` and specifying `--namespace`. Neither of those helped.
To Reproduce
Expected behavior
Intercepting should work.
Versions (please complete the following information):
- Output of `telepresence version`
- Operating system of workstation running `telepresence` commands
- Kubernetes environment and Version [e.g. Minikube, bare metal, Google Kubernetes Engine]