Can you run telepresence gather-logs and post the resulting zip file here?
I'm seeing a problem with v2.6.8:
# telepresence status
Root Daemon: Running
Version : v2.6.8 (api 3)
DNS :
Remote IP : 127.0.0.1
Exclude suffixes: [.arpa .com .io .net .org .ru]
Include suffixes: []
Timeout : 8s
Also Proxy : (0 subnets)
Never Proxy: (1 subnets)
- 127.0.0.1/32
User Daemon: Running
Version : v2.6.8 (api 3)
Executable : /Users/macbook/Library/Application Support/telepresence/telepresence-pro
Install ID : bd001aca-95b9-4661-8b2d-cc8a85c2f529
Ambassador Cloud:
Status : Logged in
User ID :
Account ID:
Email :
Status : Connected
Kubernetes server : https://kubernetes.docker.internal:6443
Kubernetes context: docker-desktop
Intercepts : 0 total
# telepresence list
config-watcher: ready to intercept (traffic-agent not yet installed)
# telepresence intercept config-watcher --port 8080
telepresence: error: rpc error: code = DeadlineExceeded desc = request timed out while waiting for agent config-watcher.default to arrive
See logs for details (1 error found): "/Users/macbook/Library/Logs/telepresence/connector.log"
If you think you have encountered a bug, please run `telepresence gather-logs` and attach the telepresence_logs.zip to your github issue or create a new one: https://github.com/telepresenceio/telepresence/issues/new?template=Bug_report.md .
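(Not from the original report, but a hedged first check when this timeout appears: verify that the traffic-manager actually wrote an entry for the workload into the telepresence-agents configmap and that the re-rolled pod carries a traffic-agent container. The namespace and workload name below are taken from the output above.)
$ kubectl -n default get configmap telepresence-agents -o yaml   # should contain a "config-watcher" entry
$ kubectl -n default get pods                                    # look for a freshly rolled config-watcher pod
$ kubectl -n default describe pod <config-watcher-pod> | grep -A3 traffic-agent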
I'm having the exact same problem.
I am evaluating Telepresence for our team and am also running into the same issue, on both a local cluster (Docker Desktop on macOS) and an AWS EKS cluster. To investigate further on the local cluster, I tried downgrading to version 2.5.8 (both client and TM), and with that I WAS able to start an intercept successfully. I also tried several older 2.6.* versions, and all of them failed with the timeout error. (Unfortunately, I don't have permission to downgrade the TM on our EKS cluster myself, so I won't be able to try that right away; trying to connect to the 2.6.8 TM with the 2.5.8 client resulted in an expected error.) Hope this helps.
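(For anyone else wanting to try the same downgrade, a minimal sketch of pinning the cluster-side traffic-manager to 2.5.8 with Helm, assuming the standard datawire chart repo from the Telepresence install docs; the client binary has to be downgraded separately.)
$ helm repo add datawire https://app.getambassador.io
$ helm repo update
$ helm upgrade --install traffic-manager datawire/telepresence --namespace ambassador --create-namespace --version 2.5.8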
I can confirm I also came across this issue; upgrading to 2.6.8 fixed it for me.
FWIW I'm seeing the same issue and downgraded to 2.5.8 from 2.6.8.
With 2.6.8, two logs that may be pertinent:
Doing a describe on the intercepted pod, I see the traffic-agent sidecar fails to start due to this:
Readiness probe failed: stat: can't stat '/tmp/agent/ready': No such file or directory
It was hard to tell whether this was an issue with the upstream Docker image or not.
Secondly, the traffic-manager pod is spamming its logs with the warning below, which also seems to be called out in issue https://github.com/telepresenceio/telepresence/issues/2669. This doesn't seem to be a blocker, though.
2022-06-28 08:19:12.4948 warning agent-configs : Issuing a systema request without ApiKey or InstallID may result in an error
I'm still seeing this error with version 2.7.3. Has anyone found a solution?
Me too. How can this be resolved?
Still the same problem here on a Windows machine with v2.8.5, even though I connect to the k8s cluster successfully.
Still the same problem here on a Mac M1 with v2.9.2; I connect to the k8s cluster successfully, and the client and TM are the same version.
Same problem as @zbdba with telepresence 2.9.5
EKS v1.21
telepresence version
Enhanced Client: v2.9.5
Root Daemon : v2.9.5
User Daemon : v2.9.5
Traffic Manager: v2.9.5
telepresence intercept echo-server --port 8000:80 --namespace dev
telepresence intercept: error: connector.CreateIntercept: request timed out while waiting for agent echo-server.dev to arrive
When I describe the intercepted pod, I can see the traffic-agent container is failing its healthcheck.
Readiness probe failed: stat: can't stat '/tmp/agent/ready': No such file or directory
That failure is normal and usually fixes itself. What do you get from kubectl get events
when this happens?
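For example, something like this (namespace taken from the intercept command above) lists the recent events in order:
$ kubectl -n dev get events --sort-by=.lastTimestamp | tail -n 20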
@thallgren
kubectl describe pod echo-server-1 -ndev | grep Events -A10
Events:
Type Reason Age From Message
---- ------ ---- ---- -------
Warning Unhealthy 46m (x193 over 3h) kubelet Readiness probe failed: Get "http://100.69.10.43:8000/health": dial tcp 100.69.10.43:8000: connect: connection refused
Warning BackOff 11m (x501 over 170m) kubelet Back-off restarting failed container
Warning Unhealthy 6m7s (x96 over 3h) kubelet Liveness probe failed: Get "http://100.69.10.43:8000/health": dial tcp 100.69.10.43:8000: connect: connection refused
Warning Unhealthy 67s (x1081 over 3h1m) kubelet Readiness probe failed: stat: can't stat '/tmp/agent/ready': No such file or directory
What is this?
Warning Unhealthy 46m (x193 over 3h) kubelet Readiness probe failed: Get "http://100.69.10.43:8000/health": dial tcp 100.69.10.43:8000: connect: connection refused
Warning BackOff 11m (x501 over 170m) kubelet Back-off restarting failed container
Warning Unhealthy 6m7s (x96 over 3h) kubelet Liveness probe failed: Get "http://100.69.10.43:8000/health": dial tcp 100.69.10.43:8000: connect: connection refused
@thallgren before I start an intercept, my pod (with one container) passes healthchecks and is running. See the manual test below, run from another pod in the cluster hitting the pod's service endpoints, to confirm.
root@testpod:/# curl echo-server.dev.svc.cluster.local/health
{"status":"ok"}
root@testpod:/# curl echo-server.dev.svc.cluster.local
echo-server is up!
After I run telepresence connect, I can reach the same endpoints from my MacBook with no issues.
~ curl echo-server.dev.svc.cluster.local/health
{"status":"ok"}
However, after I run telepresence intercept echo-server --port 8000:80 --namespace dev, the above healthchecks immediately start failing:
curl: (7) Failed to connect to echo-server.dev.svc.cluster.local port 80 after 1040 ms: Connection refused
And the telepresence intercept fails:
telepresence intercept: error: connector.CreateIntercept: request timed out while waiting for agent echo-server.dev to arrive
In light of the above, I created a much simpler pod, without healthchecks in its container, and attempted to intercept its service. This time I get a slightly different error, but it's essentially the same thing.
telepresence intercept web --namespace dev
telepresence intercept: error: connector.CreateIntercept: request timed out while waiting for agent web.dev to arrive: Events that may be relevant:
AGE TYPE REASON OBJECT MESSAGE
21s Warning Unhealthy pod/web-78ffbccc6c-t7f8b Readiness probe failed: stat: can't stat '/tmp/agent/ready': No such file or directory
21s Warning Unhealthy pod/web-78ffbccc6c-t7f8b Readiness probe failed: stat: can't stat '/tmp/agent/ready': No such file or directory
21s Warning Unhealthy pod/web-78ffbccc6c-t7f8b Readiness probe failed: stat: can't stat '/tmp/agent/ready': No such file or directory
Here are the traffic-manager logs:
traffic-manager-579cf968df-mbb9k traffic-manager 2022-12-19 18:34:38.3054 debug httpd/conn=127.0.0.1:8081 : PrepareIntercept called : session_id="1eda010a-ad3b-4d61-b577-5c6fc5f3b916" (from cmd/traffic/cmd/manager/service.go:461)
traffic-manager-579cf968df-mbb9k traffic-manager 2022-12-19 18:34:38.3325 debug agent-configs : MODIFIED telepresence-agents.dev (from cmd/traffic/cmd/manager/internal/mutator/watcher.go:549)
traffic-manager-579cf968df-mbb9k traffic-manager 2022-12-19 18:34:38.3326 debug agent-configs : add web.dev (from cmd/traffic/cmd/manager/internal/mutator/watcher.go:266)
traffic-manager-579cf968df-mbb9k traffic-manager 2022-12-19 18:34:38.3411 info agent-configs : Successfully rolled out web.dev (from cmd/traffic/cmd/manager/internal/mutator/watcher.go:93)
traffic-manager-579cf968df-mbb9k traffic-manager 2022-12-19 18:34:38.3645 debug agent-injector/conn=100.69.103.43:443 : Received webhook request... (from cmd/traffic/cmd/manager/internal/mutator/service.go:78)
traffic-manager-579cf968df-mbb9k traffic-manager 2022-12-19 18:34:38.3656 debug agent-injector/conn=100.69.103.43:443 : Handling admission request CREATE web-cb5b749df-.dev (from cmd/traffic/cmd/manager/internal/mutator/agent_injector.go:92)
traffic-manager-579cf968df-mbb9k traffic-manager 2022-12-19 18:34:38.3844 info agent-injector/conn=100.69.103.43:443 : Injecting traffic-agent into pod web-cb5b749df-.dev (from cmd/traffic/cmd/manager/internal/mutator/agent_injector.go:173)
traffic-manager-579cf968df-mbb9k traffic-manager 2022-12-19 18:34:38.3850 info agent-injector/conn=100.69.103.43:443 : Injecting 7 patches into pod web-cb5b749df-.dev (from cmd/traffic/cmd/manager/internal/mutator/agent_injector.go:190)
traffic-manager-579cf968df-mbb9k traffic-manager 2022-12-19 18:34:38.3867 debug agent-injector/conn=100.69.103.43:443 : Webhook request handled successfully (from cmd/traffic/cmd/manager/internal/mutator/service.go:85)
traffic-manager-579cf968df-mbb9k traffic-manager 2022-12-19 18:34:43.0901 error httpd/conn=127.0.0.1:8081 : Readiness probe failed: stat: can't stat '/tmp/agent/ready': No such file or directory
traffic-manager-579cf968df-mbb9k traffic-manager : session_id="1eda010a-ad3b-4d61-b577-5c6fc5f3b916" (from cmd/traffic/cmd/manager/internal/state/intercept.go:375)
traffic-manager-579cf968df-mbb9k traffic-manager 2022-12-19 18:34:48.5142 error httpd/conn=127.0.0.1:8081 : Readiness probe failed: stat: can't stat '/tmp/agent/ready': No such file or directory
traffic-manager-579cf968df-mbb9k traffic-manager : session_id="1eda010a-ad3b-4d61-b577-5c6fc5f3b916" (from cmd/traffic/cmd/manager/internal/state/intercept.go:375)
traffic-manager-579cf968df-mbb9k traffic-manager 2022-12-19 18:34:58.0837 debug agent-injector/conn=100.69.103.43:443 : Received webhook request... (from cmd/traffic/cmd/manager/internal/mutator/service.go:78)
traffic-manager-579cf968df-mbb9k traffic-manager 2022-12-19 18:34:58.0843 debug agent-injector/conn=100.69.103.43:443 : Handling admission request CREATE cron-argo-security-group-manager-27857910-.tools (from cmd/traffic/cmd/manager/internal/mutator/agent_injector.go:92)
traffic-manager-579cf968df-mbb9k traffic-manager 2022-12-19 18:34:58.0844 debug agent-injector/conn=100.69.103.43:443 : The cron-argo-security-group-manager-27857910-.tools pod has not enabled traffic-agent container injection through "telepresence-agents" configmap or through "telepresence.getambassador.io/inject-traffic-agent" annotation; skipping (from cmd/traffic/cmd/manager/internal/mutator/agent_injector.go:133)
traffic-manager-579cf968df-mbb9k traffic-manager 2022-12-19 18:34:58.0845 debug agent-injector/conn=100.69.103.43:443 : Webhook request handled successfully (from cmd/traffic/cmd/manager/internal/mutator/service.go:85)
traffic-manager-579cf968df-mbb9k traffic-manager 2022-12-19 18:34:58.5375 error httpd/conn=127.0.0.1:8081 : Readiness probe failed: stat: can't stat '/tmp/agent/ready': No such file or directory
traffic-manager-579cf968df-mbb9k traffic-manager : session_id="1eda010a-ad3b-4d61-b577-5c6fc5f3b916" (from cmd/traffic/cmd/manager/internal/state/intercept.go:375)
traffic-manager-579cf968df-mbb9k traffic-manager 2022-12-19 18:35:00.1262 debug agent-injector/conn=100.69.103.43:443 : Received webhook request... (from cmd/traffic/cmd/manager/internal/mutator/service.go:78)
traffic-manager-579cf968df-mbb9k traffic-manager 2022-12-19 18:35:00.1290 debug agent-injector/conn=100.69.103.43:443 : Handling admission request CREATE cron-argo-security-group-manager-27857915-.tools (from cmd/traffic/cmd/manager/internal/mutator/agent_injector.go:92)
traffic-manager-579cf968df-mbb9k traffic-manager 2022-12-19 18:35:00.1291 debug agent-injector/conn=100.69.103.43:443 : The cron-argo-security-group-manager-27857915-.tools pod has not enabled traffic-agent container injection through "telepresence-agents" configmap or through "telepresence.getambassador.io/inject-traffic-agent" annotation; skipping (from cmd/traffic/cmd/manager/internal/mutator/agent_injector.go:133)
traffic-manager-579cf968df-mbb9k traffic-manager 2022-12-19 18:35:00.1292 debug agent-injector/conn=100.69.103.43:443 : Webhook request handled successfully (from cmd/traffic/cmd/manager/internal/mutator/service.go:85)
traffic-manager-579cf968df-mbb9k traffic-manager 2022-12-19 18:35:10.8603 debug podWatcher calling updateSubnets with [100.69.0.0/17 172.18.0.0/24] (from cmd/traffic/cmd/manager/internal/cluster/podwatcher.go:92)
traffic-manager-579cf968df-mbb9k traffic-manager 2022-12-19 18:35:14.2924 debug agent-injector/conn=100.69.103.43:443 : Received webhook request... (from cmd/traffic/cmd/manager/internal/mutator/service.go:78)
traffic-manager-579cf968df-mbb9k traffic-manager 2022-12-19 18:35:14.2936 debug agent-injector/conn=100.69.103.43:443 : Handling admission request CREATE cron-argo-security-group-manager-27857915-.tools (from cmd/traffic/cmd/manager/internal/mutator/agent_injector.go:92)
traffic-manager-579cf968df-mbb9k traffic-manager 2022-12-19 18:35:14.2940 debug agent-injector/conn=100.69.103.43:443 : The cron-argo-security-group-manager-27857915-.tools pod has not enabled traffic-agent container injection through "telepresence-agents" configmap or through "telepresence.getambassador.io/inject-traffic-agent" annotation; skipping (from cmd/traffic/cmd/manager/internal/mutator/agent_injector.go:133)
traffic-manager-579cf968df-mbb9k traffic-manager 2022-12-19 18:35:14.2941 debug agent-injector/conn=100.69.103.43:443 : Webhook request handled successfully (from cmd/traffic/cmd/manager/internal/mutator/service.go:85)
traffic-manager-579cf968df-mbb9k traffic-manager 2022-12-19 18:35:23.0515 debug agent-injector/conn=100.69.103.43:443 : Received webhook request... (from cmd/traffic/cmd/manager/internal/mutator/service.go:78)
traffic-manager-579cf968df-mbb9k traffic-manager 2022-12-19 18:35:23.0522 debug agent-injector/conn=100.69.103.43:443 : Handling admission request CREATE cron-argo-security-group-manager-27857905-.tools (from cmd/traffic/cmd/manager/internal/mutator/agent_injector.go:92)
traffic-manager-579cf968df-mbb9k traffic-manager 2022-12-19 18:35:23.0523 debug agent-injector/conn=100.69.103.43:443 : The cron-argo-security-group-manager-27857905-.tools pod has not enabled traffic-agent container injection through "telepresence-agents" configmap or through "telepresence.getambassador.io/inject-traffic-agent" annotation; skipping (from cmd/traffic/cmd/manager/internal/mutator/agent_injector.go:133)
traffic-manager-579cf968df-mbb9k traffic-manager 2022-12-19 18:35:23.0524 debug agent-injector/conn=100.69.103.43:443 : Webhook request handled successfully (from cmd/traffic/cmd/manager/internal/mutator/service.go:85)
traffic-manager-579cf968df-mbb9k traffic-manager 2022-12-19 18:35:38.5020 debug agent-injector/conn=100.69.103.43:443 : Received webhook request... (from cmd/traffic/cmd/manager/internal/mutator/service.go:78)
traffic-manager-579cf968df-mbb9k traffic-manager 2022-12-19 18:35:38.5032 debug agent-injector/conn=100.69.103.43:443 : Handling admission request CREATE cron-argo-security-group-manager-27857915-.tools (from cmd/traffic/cmd/manager/internal/mutator/agent_injector.go:92)
traffic-manager-579cf968df-mbb9k traffic-manager 2022-12-19 18:35:38.5034 debug agent-injector/conn=100.69.103.43:443 : The cron-argo-security-group-manager-27857915-.tools pod has not enabled traffic-agent container injection through "telepresence-agents" configmap or through "telepresence.getambassador.io/inject-traffic-agent" annotation; skipping (from cmd/traffic/cmd/manager/internal/mutator/agent_injector.go:133)
traffic-manager-579cf968df-mbb9k traffic-manager 2022-12-19 18:35:38.5035 debug agent-injector/conn=100.69.103.43:443 : Webhook request handled successfully (from cmd/traffic/cmd/manager/internal/mutator/service.go:85)
After this, trying to run telepresence leave web results in no output, and the pod keeps the injected container indefinitely.
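(A hedged workaround, not from this thread: when the agent is stuck like this, removing the workload's entry from the telepresence-agents configmap and restarting the deployment should roll the pod back to its original, agent-free spec.)
$ kubectl -n dev edit configmap telepresence-agents    # delete the "web" key under data
$ kubectl -n dev rollout restart deployment/web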
Here's the result of curling my pod's exposed port (it should respond with the pod name), followed by trying the same against the new pod that gets created with the injected container after I attempted to intercept:
root@test:/# curl 100.69.102.162:8000
web-5fbfc5b9fd-2clgk
root@gurps:/# curl 100.69.90.18:8000
curl: (7) Failed to connect to 100.69.90.18 port 8000 after 0 ms: Connection refused
Events from the new pod:
Events:
Type Reason Age From Message
---- ------ ---- ---- -------
Normal Scheduled 76s default-scheduler Successfully assigned dev/web-cb5b749df-58xqr to ip-172-18-0-152.eu-west-1.compute.internal
Normal Pulled 75s kubelet Container image "docker.io/datawire/ambassador-telepresence-agent:1.13.5" already present on machine
Normal Created 75s kubelet Created container tel-agent-init
Normal Started 75s kubelet Started container tel-agent-init
Normal Pulling 74s kubelet Pulling image "x.dkr.ecr.eu-west-1.amazonaws.com/httpserver:latest"
Normal Pulled 74s kubelet Successfully pulled image "xx.dkr.ecr.eu-west-1.amazonaws.com/httpserver:latest" in 112.250343ms
Normal Created 74s kubelet Created container web-server
Normal Started 73s kubelet Started container web-server
Normal Pulled 73s kubelet Container image "docker.io/datawire/ambassador-telepresence-agent:1.13.5" already present on machine
Normal Created 73s kubelet Created container traffic-agent
Normal Started 73s kubelet Started container traffic-agent
Warning Unhealthy 6s (x8 over 71s) kubelet Readiness probe failed: stat: can't stat '/tmp/agent/ready': No such file or directory
Does the log from the traffic-agent show anything? E.g. kubectl -n dev logs web-78ffbccc6c-t7f8b traffic-agent.
I see that a tel-agent-init container is used. That's often unnecessary and will cause problems in some setups. Try using symbolic targetPorts in your services if possible. You can read more about that here.
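For reference, a "symbolic" targetPort just means the Service refers to the container port by name rather than by number, roughly like this (a generic sketch, not taken from the manifests in this thread):
# In the Deployment's container spec:
ports:
- name: http
  containerPort: 8000
# In the Service:
ports:
- port: 8000
  targetPort: http   # symbolic: the port name above, not a number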
@thallgren thanks for your reply.
For further clarity, here is my YAML:
apiVersion: apps/v1
kind: Deployment
metadata:
  name: web
  namespace: dev
spec:
  replicas: 1
  selector:
    matchLabels:
      server: web-server
  template:
    metadata:
      labels:
        server: web-server
    spec:
      containers:
      - name: web-server
        image: xxxx
        imagePullPolicy: Always
        ports:
        - name: http
          containerPort: 8000
---
apiVersion: v1
kind: Service
metadata:
  name: web
  namespace: dev
spec:
  selector:
    server: web-server
  ports:
  - port: 8000
    targetPort: http
  type: ClusterIP
telepresence intercept web --port 8000 --namespace=dev
telepresence intercept: error: connector.CreateIntercept: request timed out while waiting for agent web.dev to arrive: Events that may be relevant:
AGE TYPE REASON OBJECT MESSAGE
28s Warning Unhealthy pod/web-68d75fbcdf-dh8cj Readiness probe failed: stat: can't stat '/tmp/agent/ready': No such file or directory
28s Warning Unhealthy pod/web-68d75fbcdf-dh8cj Readiness probe failed: stat: can't stat '/tmp/agent/ready': No such file or directory
28s Warning Unhealthy pod/web-68d75fbcdf-dh8cj Readiness probe failed: stat: can't stat '/tmp/agent/ready': No such file or directory
traffic-manager logs
traffic-manager-579cf968df-mbb9k traffic-manager 2022-12-20 11:09:30.4315 info agent-configs : Successfully rolled out web.dev
traffic-manager-579cf968df-mbb9k traffic-manager 2022-12-20 11:09:30.4693 info agent-injector/conn=100.69.103.43:443 : Injecting traffic-agent into pod web-68d75fbcdf-.dev
traffic-manager-579cf968df-mbb9k traffic-manager 2022-12-20 11:09:30.4694 info agent-injector/conn=100.69.103.43:443 : Injecting 7 patches into pod web-68d75fbcdf-.dev
traffic-manager-579cf968df-mbb9k traffic-manager 2022-12-20 11:09:32.2441 error httpd/conn=127.0.0.1:8081 : Readiness probe failed: stat: can't stat '/tmp/agent/ready': No such file or directory
traffic-manager-579cf968df-mbb9k traffic-manager : session_id="1xxxxx"
traffic-manager-579cf968df-mbb9k traffic-manager 2022-12-20 11:09:40.5740 error httpd/conn=127.0.0.1:8081 : Readiness probe failed: stat: can't stat '/tmp/agent/ready': No such file or directory
traffic-manager-579cf968df-mbb9k traffic-manager : session_id="1xxxxx"
traffic-manager-579cf968df-mbb9k traffic-manager 2022-12-20 11:09:50.5823 error httpd/conn=127.0.0.1:8081 : Readiness probe failed: stat: can't stat '/tmp/agent/ready': No such file or directory
traffic-manager-579cf968df-mbb9k traffic-manager : session_id="1xxxxx"
traffic-agent logs
stern web-68d75fbcdf-dh8cj
+ web-68d75fbcdf-dh8cj › web-server
+ web-68d75fbcdf-dh8cj › traffic-agent
web-68d75fbcdf-dh8cj traffic-agent 2022-12-20 11:09:31.5328 info Enhanced Traffic Agent v1.13.5
web-68d75fbcdf-dh8cj traffic-agent 2022-12-20 11:09:31.5331 info Adding exported mounts for container web-server
web-68d75fbcdf-dh8cj traffic-agent 2022-12-20 11:09:31.5332 info new agent secrets mount path: /var/run/secrets/kubernetes.io
web-68d75fbcdf-dh8cj traffic-agent 2022-12-20 11:09:31.5333 info LoadConfig: processing environment...
web-68d75fbcdf-dh8cj traffic-agent 2022-12-20 11:09:31.5333 info LoadConfig: processing annotations...
web-68d75fbcdf-dh8cj traffic-agent 2022-12-20 11:09:31.5333 info LoadConfig: done
web-68d75fbcdf-dh8cj traffic-agent 2022-12-20 11:09:31.5366 info LoadConfig: asking traffic-manager for license information...
@gurpalw here are two things to try:
1. Bump the traffic-agent's log level so it logs more detail:
$ telepresence helm install --upgrade --set agent.logLevel=debug
2. You shouldn't be getting a tel-agent-init container when you're using a symbolic targetPort. You should only get that when the service is headless (it isn't) or when the targetPort is numeric (it isn't). Could it be that you have another service that also maps to this deployment? You can check this by viewing the content of the telepresence-agents configmap in the same namespace as the deployment:
$ kubectl describe configmap telepresence-agents
@thallgren here are the traffic-agent logs with debug enabled:
telepresence intercept web --port 8000 --namespace=dev
telepresence intercept: error: connector.CreateIntercept: request timed out while waiting for agent web.dev to arrive: Events that may be relevant:
AGE TYPE REASON OBJECT MESSAGE
13s Warning Unhealthy pod/web-65656f565-6qn2f Readiness probe failed: stat: can't stat '/tmp/agent/ready': No such file or directory
13s Warning Unhealthy pod/web-65656f565-6qn2f Readiness probe failed: stat: can't stat '/tmp/agent/ready': No such file or directory
+ web-65656f565-6qn2f › traffic-agent
web-65656f565-6qn2f traffic-agent 2022-12-20 12:53:26.0056 info Enhanced Traffic Agent v1.13.5
web-65656f565-6qn2f traffic-agent 2022-12-20 12:53:26.0060 debug Logging at this level "debug"
web-65656f565-6qn2f traffic-agent 2022-12-20 12:53:26.0060 info Adding exported mounts for container web-server
web-65656f565-6qn2f traffic-agent 2022-12-20 12:53:26.0060 debug checking agent secrets mount path: /var/run/secrets/kubernetes.io
web-65656f565-6qn2f traffic-agent 2022-12-20 12:53:26.0061 debug checking appmounts directory: /var/run/secrets/kubernetes.io
web-65656f565-6qn2f traffic-agent 2022-12-20 12:53:26.0061 debug create appmounts directory: /tel_app_exports/web-server/var/run/secrets/kubernetes.io
web-65656f565-6qn2f traffic-agent 2022-12-20 12:53:26.0061 debug create appmounts symlink: /var/run/secrets/kubernetes.io /tel_app_exports/web-server/var/run/secrets/kubernetes.io
web-65656f565-6qn2f traffic-agent 2022-12-20 12:53:26.0061 info new agent secrets mount path: /var/run/secrets/kubernetes.io
web-65656f565-6qn2f traffic-agent 2022-12-20 12:53:26.0061 info LoadConfig: processing environment...
web-65656f565-6qn2f traffic-agent 2022-12-20 12:53:26.0062 info LoadConfig: processing annotations...
web-65656f565-6qn2f traffic-agent 2022-12-20 12:53:26.0062 info LoadConfig: done
web-65656f565-6qn2f traffic-agent 2022-12-20 12:53:26.0096 info LoadConfig: asking traffic-manager for license information...
The logs stop there.
Here is the output of the configmap:
kubectl describe configmap telepresence-agents -ndev
Name: telepresence-agents
Namespace: dev
Labels: app.kubernetes.io/created-by=traffic-manager
app.kubernetes.io/name=telepresence-agents
app.kubernetes.io/version=2.9.5
Annotations: <none>
Data
====
web:
----
EnvoyAdminPort: 19000
EnvoyLogLevel: warning
EnvoyServerPort: 18000
agentImage: docker.io/datawire/ambassador-telepresence-agent:1.13.5
agentName: web
containers:
- Mounts: null
  envPrefix: A_
  intercepts:
  - agentPort: 9900
    containerPort: 8000
    containerPortName: http
    protocol: TCP
    serviceName: web
    servicePort: 8000
    serviceUID: 4023d7fc-4dcb-466f-b0bf-e26523ddd964
  mountPoint: /tel_app_mounts/web-server
  name: web-server
logLevel: debug
managerHost: traffic-manager.ambassador
managerPort: 8081
namespace: dev
tracingPort: 15766
workloadKind: Deployment
workloadName: web
BinaryData
====
Events: <none>
@gurpalw is that the configmap you're using when you get a tel-init-container injected?
The log proves that the traffic-agent is unable to contact the traffic-manager at traffic-manager.ambassador:8081. The connection attempt is hanging indefinitely. I know this because any error from that attempt would be logged, and a success would result in the next log message, which is "LoadConfig: asking traffic-manager for Ambassador Cloud configuration...". It never shows up.
Here is the service YAML:
Name: traffic-manager
Namespace: ambassador
Labels: app=traffic-manager
app.kubernetes.io/created-by=Helm
app.kubernetes.io/managed-by=Helm
app.kubernetes.io/version=2.9.5
helm.sh/chart=telepresence-2.9.5
telepresence=manager
Annotations: meta.helm.sh/release-name: traffic-manager
meta.helm.sh/release-namespace: ambassador
Selector: app=traffic-manager,telepresence=manager
Type: ClusterIP
IP Family Policy: SingleStack
IP Families: IPv4
IP: None
IPs: None
Port: api 8081/TCP
TargetPort: api/TCP
Endpoints: 100.69.107.175:8081
Port: grpc-trace 15766/TCP
TargetPort: grpc-trace/TCP
Endpoints: 100.69.107.175:15766
Session Affinity: None
Events: <none>
I have launched a test Ubuntu pod in the target namespace (dev), and I can reach services via DNS in other namespaces:
root@gurps:/# curl loki-distributed-querier.logging.svc.cluster.local:3100
404 page not found
But I cannot reach the traffic-manager service when curling its DNS name:
curl traffic-manager.ambassador.svc.cluster.local:8081
curl: (6) Could not resolve host: traffic-manager.ambassador.svc.cluster.local
However, I can reach the endpoint listed in the service description above when curling the IP directly:
root@gurps:/# curl 100.69.107.175:8081
Hello World from: /
Here I dig another service to prove DNS is working in the cluster:
root@gurps:/# dig loki-distributed-querier.logging.svc.cluster.local
; <<>> DiG 9.18.1-1ubuntu1.2-Ubuntu <<>> loki-distributed-querier.logging.svc.cluster.local
;; global options: +cmd
;; Got answer:
;; WARNING: .local is reserved for Multicast DNS
;; You are currently testing what happens when an mDNS query is leaked to DNS
;; ->>HEADER<<- opcode: QUERY, status: NOERROR, id: 16581
;; flags: qr aa rd; QUERY: 1, ANSWER: 1, AUTHORITY: 0, ADDITIONAL: 1
;; WARNING: recursion requested but not available
;; OPT PSEUDOSECTION:
; EDNS: version: 0, flags:; udp: 1232
; COOKIE: 20e9c56df32eca8f (echoed)
;; QUESTION SECTION:
;loki-distributed-querier.logging.svc.cluster.local. IN A
;; ANSWER SECTION:
loki-distributed-querier.logging.svc.cluster.local. 5 IN A 10.100.16.31
;; Query time: 0 msec
;; SERVER: 10.100.0.10#53(10.100.0.10) (UDP)
;; WHEN: Tue Dec 20 14:42:42 UTC 2022
;; MSG SIZE rcvd: 157
And here I do the same thing but with the traffic-manager. You can see that it doesn't respond with an answer section:
root@gurps:/# dig traffic-manager.ambassador.svc.cluster.local
; <<>> DiG 9.18.1-1ubuntu1.2-Ubuntu <<>> traffic-manager.ambassador.svc.cluster.local
;; global options: +cmd
;; Got answer:
;; WARNING: .local is reserved for Multicast DNS
;; You are currently testing what happens when an mDNS query is leaked to DNS
;; ->>HEADER<<- opcode: QUERY, status: SERVFAIL, id: 8464
;; flags: qr aa rd; QUERY: 1, ANSWER: 0, AUTHORITY: 1, ADDITIONAL: 1
;; WARNING: recursion requested but not available
;; OPT PSEUDOSECTION:
; EDNS: version: 0, flags:; udp: 1232
; COOKIE: c85dfbb0a6cb029e (echoed)
;; QUESTION SECTION:
;traffic-manager.ambassador.svc.cluster.local. IN A
;; AUTHORITY SECTION:
cluster.local. 5 IN SOA ns.dns.cluster.local. hostmaster.cluster.local. 1671541000 7200 1800 86400 5
;; Query time: 0 msec
;; SERVER: 10.100.0.10#53(10.100.0.10) (UDP)
;; WHEN: Tue Dec 20 14:42:28 UTC 2022
;; MSG SIZE rcvd: 178
@gurpalw the answer section is provided by the DNS service, not the actual service that you dig for. So the question to ask is: why doesn't the cluster's DNS service serve up the traffic-manager's IP? The traffic-manager itself plays no part in this.
@thallgren problem solved: my CoreDNS was broken! Fixed now. Thanks, and sorry for taking up so much of your time!
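(For anyone landing here with the same symptom, a hedged sketch of how to check CoreDNS health; the label and deployment name assume a standard CoreDNS install such as EKS's.)
$ kubectl -n kube-system get pods -l k8s-app=kube-dns
$ kubectl -n kube-system logs -l k8s-app=kube-dns --tail=50
$ kubectl -n kube-system rollout restart deployment/coredns   # only if the pods look wedged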
That's a lot of reading, but it gave me nothing :)
I'm trying to set up v2.11.1 and get the same error; I didn't try older versions.
telepresence intercept dev-device-api --port 8080:8080 --env-file tele.env
results in:
connector.CreateIntercept: request timed out while waiting for agent dev-device-api.dev to arrive
logs with loglevel trace: telepresence_logs.zip
Same here
This is what I see in the traffic-agent:
2023-03-10 16:08:13.1345 info LoadConfig: processing environment...
2023-03-10 16:08:13.1345 info LoadConfig: done
2023-03-10 16:08:13.1375 info LoadConfig: asking traffic-manager for license information...
2023-03-10 16:08:15.5402 error error(s) getting license: license not found
2023-03-10 16:08:15.5402 info LoadConfig: asking traffic-manager for Ambassador Cloud configuration...
2023-03-10 16:08:15.5480 info trying to connect to ambassador cloud at app.getambassador.io:443
2023-03-10 16:08:15.5520 info connected to ambassador cloud at app.getambassador.io:443
2023-03-10 16:08:15.5529 info ftp-server : FTP server listening on :40463
2023-03-10 16:08:15.5532 info envoy-server : Management server listening on 18000
2023-03-10 16:08:15.5532 info envoy : started command ["envoy" "-l" "warning" "-c" "bootstrap-ads.pb" "--base-id" "1"] : dexec.pid="24"
2023-03-10 16:08:15.5536 info envoy : : dexec.pid="24" dexec.stream="stdin" dexec.err="EOF"
2023-03-10 16:08:15.6045 info envoy : : dexec.pid="24" dexec.stream="stdout+stderr" dexec.data="[2023-03-10 16:08:15.604][24][critical][backtrace] [./source/server/backtrace.h:104] Caught Segmentation fault, suspect faulting address 0x7ffe22b37fe8
2023-03-10 16:08:15.6045 info envoy : : dexec.pid="24" dexec.stream="stdout+stderr" dexec.data="[2023-03-10 16:08:15.604][24][critical][backtrace] [./source/server/backtrace.h:91] Backtrace (use tools/stack_decode.py to get line numbers):\n"
2023-03-10 16:08:15.6045 info envoy : : dexec.pid="24" dexec.stream="stdout+stderr" dexec.data="[2023-03-10 16:08:15.604][24][critical][backtrace] [./source/server/backtrace.h:92] Envoy version: c5b7729c6ca304d1d8961790e8379358b6f1e233/1.23.4/Clea
2023-03-10 16:08:15.6046 info envoy : : dexec.pid="24" dexec.stream="stdout+stderr" dexec.data="[2023-03-10 16:08:15.604][24][critical][backtrace] [./source/server/backtrace.h:98] #0: [0x7f92eafb5acd]\n"
2023-03-10 16:08:17.4586 info envoy : finished with error: signal: segmentation fault (core dumped) : dexec.pid="24"
2023-03-10 16:08:17.4586 error envoy : goroutine "/envoy" exited with error: signal: segmentation fault (core dumped)
Hi @mwilniewiec, while we wait for a fix, I worked around the issue for now by downgrading the traffic-agent image. To do this, set these values for the traffic-manager:
File: traffic-manager.yaml
agent:
  image:
    name: ambassador-telepresence-agent
    tag: 1.13.9
Apply via telepresence helm upgrade -f traffic-manager.yaml
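(To verify the downgrade took effect, one hedged check is to re-create the intercept and look at the image of the injected container; the pod name is a placeholder and the namespace is taken from the report above.)
$ kubectl -n dev describe pod <intercepted-pod> | grep ambassador-telepresence-agent
# should now show ambassador-telepresence-agent:1.13.9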
Opened an issue for that: https://github.com/telepresenceio/telepresence/issues/3069
IDK if this will help but it seems upgrading to Telepresence 2.16.1 (Kubernetes-side) helped for me. After upgrade, I ran telepresence quit && telepresence connect
before running my intercept, and it worked fine.
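(Roughly that sequence, as a hedged sketch: upgrade the local client first, then bring the cluster side up to match and reconnect; the workload name is a placeholder.)
$ telepresence helm upgrade     # upgrades the in-cluster traffic-manager to match the client
$ telepresence quit
$ telepresence connect
$ telepresence intercept <workload> --port 8080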
I'm on 2.16.1 and seeing this issue. I created an intercept to a shell (i.e. passed -- /bin/bash) and then rebooted. Now I get this error when attempting to repeat the intercept. Quitting and connecting doesn't change anything. Logs attached.
telepresence_logs.zip
I'm seeing similar issues. Not much information provided as to why it is crashing.
I'm closing this issue because it's all over the place, including a "problem solved, my coredns was broken". @vcellu, if you experience a crash, I suggest you open a new ticket.
Some hints for you before you do that:
- Run telepresence loglevel debug before you provoke the crash.
- Attach the logs that telepresence gather-logs will provide.
Hi, I followed this page: https://www.getambassador.io/resources/easy-debugging-of-spring-boot-microservices/ but got the timed-out error.