telepresenceio / telepresence

Local development against a remote Kubernetes or OpenShift cluster
https://www.telepresence.io

[v2] Telepresence2: Unable to SSH #1680

Closed indrasvat closed 3 years ago

indrasvat commented 3 years ago

I gave telepresence2 a spin, but I can't get it to work against my local minikube (virtualbox driver).

Steps followed: https://github.com/telepresenceio/telepresence/tree/release/v2#walkthrough

$ sw_vers
ProductName:    Mac OS X
ProductVersion: 10.14.5
BuildVersion:   18F132

$ go version
go version go1.15.2 darwin/amd64

$ telepresence version
Client v2.2.0 (api v3)

$ kubectl version
Client Version: version.Info{Major:"1", Minor:"18", GitVersion:"v1.18.18", GitCommit:"6f6ce59dc8fefde25a3ba0ef0047f4ec6662ef24", GitTreeState:"clean", BuildDate:"2021-04-15T03:31:30Z", GoVersion:"go1.13.15", Compiler:"gc", Platform:"darwin/amd64"}

Server Version: version.Info{Major:"1", Minor:"18", GitVersion:"v1.18.3", GitCommit:"2e7996e3e2712684bc73f0dec0200d64eec7fe40", GitTreeState:"clean", BuildDate:"2020-05-20T12:43:34Z", GoVersion:"go1.13.9", Compiler:"gc", Platform:"linux/amd64"}

# minikube start --driver=virtualbox --cpus=2 --memory=8192
$ minikube version
minikube version: v1.12.1
commit: 5664228288552de9f3a446ea4f51c6f29bbdd0e0

$ minikube status
minikube
type: Control Plane
host: Running
kubelet: Running
apiserver: Running
kubeconfig: Configured

$ kubectl get po -ndefault
NAME                     READY   STATUS    RESTARTS   AGE
hello-78745876ff-n4t54   1/1     Running   0          81m
$ telepresence connect
Launching Telepresence Daemon v2.2.0 (api v3)
Need root privileges to run "/usr/local/bin/telepresence daemon-foreground ~/Library/Logs/telepresence '' ''"
Password:
Connecting to traffic manager...
Connected to context minikube (https://192.168.99.103:8443)

$ kubectl describe pod -nambassador
Name:         traffic-manager-64858f494f-txhzw
Namespace:    ambassador
Priority:     0
Node:         minikube/192.168.99.103
Start Time:   Tue, 20 Apr 2021 14:53:39 -0700
Labels:       app=traffic-manager
              pod-template-hash=64858f494f
              telepresence=manager
Annotations:  <none>
Status:       Running
IP:           172.17.0.3
IPs:
  IP:           172.17.0.3
Controlled By:  ReplicaSet/traffic-manager-64858f494f
Containers:
  traffic-manager:
    Container ID:   docker://b5aee262e7f1e2c68633f93a78f22a82cfd82c65506ec9b77c8583f3b9bbce0e
    Image:          docker.io/datawire/tel2:2.2.0
    Image ID:       docker-pullable://datawire/tel2@sha256:5a8ff39d298816bfa6242bb0085c1aebbbe70ac1bbc88e89ac7a398b0d7f9d5f
    Ports:          8022/TCP, 8081/TCP
    Host Ports:     0/TCP, 0/TCP
    State:          Running
      Started:      Tue, 20 Apr 2021 15:07:45 -0700
    Last State:     Terminated
      Reason:       Error
      Exit Code:    137
      Started:      Tue, 20 Apr 2021 14:53:49 -0700
      Finished:     Tue, 20 Apr 2021 15:06:44 -0700
    Ready:          True
    Restart Count:  1
    Environment:
      LOG_LEVEL:     debug
      SYSTEMA_HOST:  app.getambassador.io
      SYSTEMA_PORT:  443
    Mounts:
      /var/run/secrets/kubernetes.io/serviceaccount from default-token-c95sv (ro)
Conditions:
  Type              Status
  Initialized       True 
  Ready             True 
  ContainersReady   True 
  PodScheduled      True 
Volumes:
  default-token-c95sv:
    Type:        Secret (a volume populated by a Secret)
    SecretName:  default-token-c95sv
    Optional:    false
QoS Class:       BestEffort
Node-Selectors:  <none>
Tolerations:     node.kubernetes.io/not-ready:NoExecute for 300s
                 node.kubernetes.io/unreachable:NoExecute for 300s
Events:          <none>

Telepresence Logs:

connector.log daemon.log

After running telepresence connect, I'm no longer able to run kubectl commands against minikube; they just hang. I have to run telepresence quit for things to start working again.

$ kubectl get po
^C

$ telepresence quit
Telepresence Daemon quitting...done

$ kubectl get po
NAME                     READY   STATUS    RESTARTS   AGE
hello-78745876ff-n4t54   1/1     Running   0          75m
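
For the record, a quick way to see whether the hang comes from routing rather than from the API server itself would be to inspect things while telepresence is still connected. This is only a diagnostic sketch: macOS route syntax, the API-server IP taken from the connect output above, and the status subcommand of the v2 CLI.

$ # Does the route to the API server now point at the Telepresence tunnel/interface?
$ route -n get 192.168.99.103

$ # What does the daemon itself report while kubectl is hanging?
$ telepresence status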

I manually ran the ssh command from connector.log in verbose mode and got this:

$ ssh -vvv -D 127.0.0.1:50866 -F none -C \
  -oConnectTimeout=10 -oStrictHostKeyChecking=no -oUserKnownHostsFile=/dev/null \
  -N -oExitOnForwardFailure=yes -p 50853 telepresence@localhost

OpenSSH_7.9p1, LibreSSL 2.7.3
debug2: resolving "localhost" port 50853
debug2: ssh_connect_direct
debug1: Connecting to localhost [::1] port 50853.
debug2: fd 5 setting O_NONBLOCK
debug1: connect to address ::1 port 50853: Operation timed out
debug1: Connecting to localhost [127.0.0.1] port 50853.
debug1: Connection established.
debug1: identity file ~/.ssh/id_rsa type 0
debug1: identity file ~/.ssh/id_rsa-cert type -1
debug1: identity file ~/.ssh/id_dsa type -1
debug1: identity file ~/.ssh/id_dsa-cert type -1
debug1: identity file ~/.ssh/id_ecdsa type -1
debug1: identity file ~/.ssh/id_ecdsa-cert type -1
debug1: identity file ~/.ssh/id_ed25519 type -1
debug1: identity file ~/.ssh/id_ed25519-cert type -1
debug1: identity file ~/.ssh/id_xmss type -1
debug1: identity file ~/.ssh/id_xmss-cert type -1
debug1: Local version string SSH-2.0-OpenSSH_7.9
ssh_exchange_identification: read: Connection reset by peer

Not sure how to debug this further.
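
One way to tell whether the reset comes from the connector's port-forward or from the sshd inside the traffic-manager would be to forward port 8022 (the first port in the pod spec above) by hand and probe it. A sketch, assuming plain kubectl and the BSD nc that ships with macOS:

$ # Forward the traffic-manager's sshd port directly, bypassing the connector
$ kubectl -n ambassador port-forward deploy/traffic-manager 8022:8022 &

$ # A successful TCP handshake here means the sshd is reachable and the reset happens elsewhere
$ nc -vz 127.0.0.1 8022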

indrasvat commented 3 years ago

Any insights on this?

indrasvat commented 3 years ago

Per the suggestion from @cindy, I tried the --mapped-namespaces flag. The SSH command now seems to work and I can list pods etc., but I'm still not able to hit the hello service via curl.

The only difference from the tutorial is that the hello service is deployed in the hello namespace, not default.

$ telepresence connect --mapped-namespaces default,hello
Launching Telepresence Daemon v2.2.0 (api v3)
Need root privileges to run "/usr/local/bin/telepresence daemon-foreground ~/Library/Logs/telepresence '' ''"
Password:
Connecting to traffic manager...
Connected to context minikube (https://192.168.99.103:8443)
$ kubectl get ns
NAME              STATUS   AGE
ambassador        Active   14d
argo              Active   3d16h
default           Active   14d
hello             Active   14d
kube-node-lease   Active   14d
kube-public       Active   14d
kube-system       Active   14d

$ kubectl get po -nhello
NAME                     READY   STATUS    RESTARTS   AGE
hello-78745876ff-8xw78   1/1     Running   2          14d

$ kubectl get svc -nhello
NAME    TYPE        CLUSTER-IP      EXTERNAL-IP   PORT(S)   AGE
hello   ClusterIP   10.97.229.172   <none>        80/TCP    14d
$ curl -v --connect-timeout 10 --max-time 10 'http://hello.hello'
* Rebuilt URL to: http://hello.hello/
*   Trying 10.97.229.172...
* TCP_NODELAY set
* Connected to hello.hello (10.97.229.172) port 80 (#0)
> GET / HTTP/1.1
> Host: hello.hello
> User-Agent: curl/7.54.0
> Accept: */*
> 
* Operation timed out after 10001 milliseconds with 0 bytes received
* stopped the pause stream!
* Closing connection 0
curl: (28) Operation timed out after 10001 milliseconds with 0 bytes received
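
Since the name already resolves to the ClusterIP and curl reports a connection before stalling, a follow-up check could be to hit the ClusterIP and the pod IP directly, to separate Service routing from the tunnel itself. A sketch, where <pod-ip> is a placeholder for the address printed by kubectl get po -o wide and 8080 is the port the echoserver container listens on:

$ # Service by ClusterIP, skipping DNS
$ curl -v --connect-timeout 10 http://10.97.229.172/

$ # Pod IP directly, skipping the Service as well
$ kubectl get po -nhello -o wide
$ curl -v --connect-timeout 10 http://<pod-ip>:8080/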

connector.log so far:

$ tail -f ~/Library/Logs/telepresence/connector.log 
2021/05/04 16:02:32 Patching synced Namespace f3cf8c46-f6e1-4aaa-9b59-e076c353a7a9
2021/05/04 16:02:32 connector/background-k8swatch/namespaces posting search paths to default hello
2021/05/04 16:02:32 connector/background-manager/intercept-port-forward posting search paths to default hello
2021/05/04 16:02:32 connector/background-k8swatch/namespaces Watching namespace "hello"
2021/05/04 16:02:32 connector/background-k8swatch/namespaces Watching namespace "default"
2021/05/04 16:02:32 Patching add Service 1f23fdd7-0f6e-43f1-8d13-1f3c01c412af
2021/05/04 16:02:32 Patching add Service 1f23fdd7-0f6e-43f1-8d13-1f3c01c412af
2021/05/04 16:02:32 connector/server-socks [pid:69231] started command []string{"ssh", "-D", "127.0.0.1:49975", "-F", "none", "-C", "-oConnectTimeout=10", "-oStrictHostKeyChecking=no", "-oUserKnownHostsFile=/dev/null", "-N", "-oExitOnForwardFailure=yes", "-p", "49972", "telepresence@localhost"}
2021/05/04 16:02:32 connector/server-socks [pid:69231] stdin  < EOF
2021/05/04 16:02:32 connector/server-socks [pid:69231] stdout+stderr > "Warning: Permanently added '[localhost]:49972' (ECDSA) to the list of known host"… (4 runes truncated)

2021/05/04 16:07:32 Patching synced Namespace f3cf8c46-f6e1-4aaa-9b59-e076c353a7a9
2021/05/04 16:07:32 Patching synced Service 1f23fdd7-0f6e-43f1-8d13-1f3c01c412af

Just to confirm, the hello service is running:

$ kubectl port-forward hello-78745876ff-8xw78 8082:8080 -nhello
Forwarding from 127.0.0.1:8082 -> 8080
Forwarding from [::1]:8082 -> 8080
Handling connection for 8082

...

$ curl -v --connect-timeout 10 --max-time 10 'http://localhost:8082'
*   Trying ::1...
* TCP_NODELAY set
* Connected to localhost (::1) port 8082 (#0)
> GET /hello HTTP/1.1
> Host: localhost:8082
> User-Agent: curl/7.54.0
> Accept: */*
> 
< HTTP/1.1 200 OK
< Server: nginx/1.10.0
< Date: Tue, 04 May 2021 23:35:20 GMT
< Content-Type: text/plain
< Transfer-Encoding: chunked
< Connection: keep-alive
< 
CLIENT VALUES:
client_address=127.0.0.1
command=GET
real path=/hello
query=nil
request_version=1.1
request_uri=http://localhost:8080/hello

SERVER VALUES:
server_version=nginx: 1.10.0 - lua: 10001

HEADERS RECEIVED:
accept=*/*
host=localhost:8082
user-agent=curl/7.54.0
BODY:
* Connection #0 to host localhost left intact
-no body in request-
thallgren commented 3 years ago

It's really odd that curl reports * Connected to hello.hello (10.97.229.172) port 80 (#0) and then times out. A log from the hello pod's traffic-agent container would be interesting to look at.
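
For reference, a sketch of how one might check for that sidecar and pull its log (the container Telepresence injects is named traffic-agent, and it normally only appears once an intercept has been created):

$ # List the containers in the pod
$ kubectl get pod hello-78745876ff-8xw78 -nhello -o jsonpath='{.spec.containers[*].name}{"\n"}'

$ # If the sidecar is there, fetch its log
$ kubectl logs hello-78745876ff-8xw78 -nhello -c traffic-agent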

indrasvat commented 3 years ago

@thallgren, I only see the echoserver container 🤷🏼‍♂️

apiVersion: v1
kind: Pod
metadata:
  labels:
    app: hello
    pod-template-hash: 78745876ff
  name: hello-78745876ff-8xw78
  namespace: hello
spec:
  containers:
  - image: k8s.gcr.io/echoserver:1.4
    name: echoserver
  priority: 0
  serviceAccountName: default
  tolerations:
  - effect: NoExecute
    key: node.kubernetes.io/not-ready
    operator: Exists
    tolerationSeconds: 300
  - effect: NoExecute
    key: node.kubernetes.io/unreachable
    operator: Exists
    tolerationSeconds: 300

UPDATE:

I'm not at the intercept stage yet; I can't get past https://github.com/telepresenceio/telepresence/tree/release/v2#establish-a-connection-to--the-cluster-outbound-traffic.
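
That outbound-traffic step in the walkthrough boils down to checking that in-cluster names answer from the laptop while connected, roughly like this (a sketch based on the walkthrough; any HTTP error status from the API server counts as success here, since the point is only reachability):

$ telepresence connect
$ curl -ik https://kubernetes.default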

indrasvat commented 3 years ago

@thallgren, updated minikube to latest. Still the same issue.

🚩 minikube version
minikube version: v1.20.0
commit: c61663e942ec43b20e8e70839dcca52e44cd85ae

🚩 k get ns
NAME              STATUS   AGE
ambassador        Active   23d
argo              Active   12d
default           Active   23d
hello             Active   23d
ingress-nginx     Active   14m
kube-node-lease   Active   23d
kube-public       Active   23d
kube-system       Active   23d

🚩 k get po -nhello
NAME                     READY   STATUS    RESTARTS   AGE
hello-78745876ff-8xw78   1/1     Running   5          23d

🚩 telepresence connect --mapped-namespaces=hello

🚩 curl -v 'http://hello.hello'
* Rebuilt URL to: http://hello.hello/
*   Trying 10.97.229.172...
* TCP_NODELAY set
* Connected to hello.hello (10.97.229.172) port 80 (#0)
> GET / HTTP/1.1
> Host: hello.hello
> User-Agent: curl/7.54.0
> Accept: */*
> 
^C
toheart commented 3 years ago

I ran into the same problem. Please help!

toheart commented 3 years ago

I ran kubectl run -it --rm --restart=Never busybox1 --image=busybox sh on the master node, and telepresence2 suddenly started working!

thallgren commented 3 years ago

This issue is obsolete now that Telepresence no longer uses ssh.

indrasvat commented 3 years ago

@thallgren, interesting. Should I update telepresence to latest and try the setup again?

thallgren commented 3 years ago

Yes, please do.
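
For anyone landing here later: upgrading the client on macOS looks roughly like this (a sketch; the download URL is the one documented for Telepresence v2 at the time and may have changed since):

$ # Stop any running daemon, then replace the binary and verify
$ telepresence quit
$ sudo curl -fL https://app.getambassador.io/download/tel2/darwin/amd64/latest/telepresence -o /usr/local/bin/telepresence
$ sudo chmod a+x /usr/local/bin/telepresence
$ telepresence version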