telepresenceio / telepresence

Local development against a remote Kubernetes or OpenShift cluster
https://www.telepresence.io

Tunnelling broken when telepresence is locked in its own namespace #935

Closed sokoow closed 3 years ago

sokoow commented 5 years ago

So, here's my scenario:

  1. I create a brand new namespace.
  2. Create a role and service account, and lock it to this namespace only.
  3. Try to run telepresence with a kubeconfig using that locked role and namespace, and I get:
$ KUBECONFIG=~/.kube/locked-config telepresence --docker-run -ti ubuntu:18.04 /bin/bash
T: Volumes are rooted at $TELEPRESENCE_ROOT. See https://telepresence.io/howto/volumes.html for details.
T: Starting network proxy to cluster using new Deployment telepresence-1551079420-434663-6647

T: No traffic is being forwarded from the remote Deployment to your local machine. You can use the --expose option to specify which ports you want to forward.

T: Setup complete. Launching your container.
root@telepresence-1551079420-434663-6647-59b469dfdd-nrsnc:/# apt-get update

... and it hangs. It also tends to hang on non-network commands, such as ls.

The proxy logs on the Kubernetes cluster don't show much:

$ KUBECONFIG=~/.kube/locked-config kubectl logs telepresence-1551080425-4650092-15525-6797ff8fd5-rfxwh
Listening...
2019-02-25T07:40:54+0000 [-] Loading ./forwarder.py...
2019-02-25T07:40:59+0000 [-] /etc/resolv.conf changed, reparsing
2019-02-25T07:40:59+0000 [-] Resolver added ('10.96.0.10', 53) to server list
2019-02-25T07:40:59+0000 [-] SOCKSv5Factory starting on 9050
2019-02-25T07:40:59+0000 [socks.SOCKSv5Factory#info] Starting factory <socks.SOCKSv5Factory object at 0x7f19084307f0>
2019-02-25T07:40:59+0000 [-] DNSDatagramProtocol starting on 9053
2019-02-25T07:40:59+0000 [-] Starting protocol <twisted.names.dns.DNSDatagramProtocol object at 0x7f1908430b70>
2019-02-25T07:40:59+0000 [-] Loaded.
2019-02-25T07:40:59+0000 [twisted.scripts._twistd_unix.UnixAppLogger#info] twistd 18.9.0 (/usr/bin/python3.6 3.6.5) starting up.
2019-02-25T07:40:59+0000 [twisted.scripts._twistd_unix.UnixAppLogger#info] reactor class: twisted.internet.epollreactor.EPollReactor.
2019-02-25T07:41:30+0000 [Poll#info] Checkpoint

The same goes for the local Docker proxy logs on the client:

$ docker logs dfc8b89f90d1
[INFO  tini (1)] Spawned child process 'python3' with pid '6'
   0.0 TEL | Telepresence 0+unknown launched at Mon Feb 25 07:30:29 2019
   0.0 TEL |   /usr/bin/entrypoint.py proxy '{"cidrs": ["0/0"], "expose_ports": []}'
   0.0 TEL | Platform: linux
   0.0 TEL | Python 3.6.5 (default, Aug 22 2018, 14:30:18)
   0.0 TEL | [GCC 6.3.0]
   0.0 TEL | [1] Running: uname -a
   0.0   1 | Linux telepresence-1551079810-8083134-10017-5ff98c4699-tfvqn 4.15.0-45-generic #48~16.04.1-Ubuntu SMP Tue Jan 29 18:03:48 UTC 2019 x86_64 Linux
   0.0 TEL | [2] Running: /usr/sbin/sshd -e
   0.0 TEL | [3] Running: ssh -F /dev/null -oStrictHostKeyChecking=no -oUserKnownHostsFile=/dev/null -q -p 38023 telepresence@127.0.0.1 /bin/true
   0.1 TEL | [3] exit 255 in 0.04 secs.
   0.3 TEL | [4] Running: ssh -F /dev/null -oStrictHostKeyChecking=no -oUserKnownHostsFile=/dev/null -q -p 38023 telepresence@127.0.0.1 /bin/true
   0.3 TEL | [4] exit 255 in 0.01 secs.
   0.6 TEL | [5] Running: ssh -F /dev/null -oStrictHostKeyChecking=no -oUserKnownHostsFile=/dev/null -q -p 38023 telepresence@127.0.0.1 /bin/true
   0.6 TEL | [5] exit 255 in 0.01 secs.
   0.8 TEL | [6] Running: ssh -F /dev/null -oStrictHostKeyChecking=no -oUserKnownHostsFile=/dev/null -q -p 38023 telepresence@127.0.0.1 /bin/true
   0.8 TEL | [6] exit 255 in 0.00 secs.
   1.1 TEL | [7] Running: ssh -F /dev/null -oStrictHostKeyChecking=no -oUserKnownHostsFile=/dev/null -q -p 38023 telepresence@127.0.0.1 /bin/true
   1.1 TEL | [7] exit 255 in 0.00 secs.
   1.3 TEL | [8] Running: ssh -F /dev/null -oStrictHostKeyChecking=no -oUserKnownHostsFile=/dev/null -q -p 38023 telepresence@127.0.0.1 /bin/true
   2.7 TEL | [8] ran in 1.32 secs.
   2.7 TEL | [9] Capturing: netstat -n
   2.7 TEL | Everything launched. Waiting to exit...
   2.8 TEL | BEGIN SPAN runner.py:586(wait_for_exit)
Starting sshuttle proxy.
firewall manager: Starting firewall with Python version 3.6.5
firewall manager: ready method name nat.
IPv6 enabled: False
UDP enabled: False
DNS enabled: True
TCP redirector listening on ('127.0.0.1', 12300).
DNS listening on ('127.0.0.1', 12300).
Starting client with Python version 3.6.5
c : connecting to server...
Warning: Permanently added '[127.0.0.1]:38023' (ECDSA) to the list of known hosts.
Starting server with Python version 3.6.5
 s: latency control setting = True
 s: available routes:
c : Connected.
 s:   2/10.32.0.0/12
firewall manager: setting up.
>> iptables -t nat -N sshuttle-12300
>> iptables -t nat -F sshuttle-12300
>> iptables -t nat -I OUTPUT 1 -j sshuttle-12300
>> iptables -t nat -I PREROUTING 1 -j sshuttle-12300
>> iptables -t nat -A sshuttle-12300 -j RETURN --dest 172.17.0.2/32 -p tcp
>> iptables -t nat -A sshuttle-12300 -j RETURN --dest 172.17.0.1/32 -p tcp
>> iptables -t nat -A sshuttle-12300 -j RETURN --dest 127.0.0.1/32 -p tcp
>> iptables -t nat -A sshuttle-12300 -j REDIRECT --dest 0.0.0.0/0 -p tcp --to-ports 12300 -m ttl ! --ttl 42
>> iptables -t nat -A sshuttle-12300 -j REDIRECT --dest 10.96.0.10/32 -p udp --dport 53 --to-ports 12300 -m ttl ! --ttl 42
>> iptables -t nat -A sshuttle-12300 -j REDIRECT --dest 224.0.0.252/32 -p udp --dport 5355 --to-ports 12300 -m ttl ! --ttl 42
conntrack v1.4.4 (conntrack-tools): 0 flow entries have been deleted.
c : DNS request from ('172.17.0.2', 51563) to None: 78 bytes
c : DNS request from ('172.17.0.2', 39955) to None: 72 bytes
c : DNS request from ('172.17.0.2', 48861) to None: 68 bytes
c : DNS request from ('172.17.0.2', 43649) to None: 71 bytes
c : DNS request from ('172.17.0.2', 37558) to None: 54 bytes

although this part looks worrying, since every DNS request is logged as going "to None":

c : DNS request from ('172.17.0.2', 51563) to None: 78 bytes
c : DNS request from ('172.17.0.2', 39955) to None: 72 bytes
c : DNS request from ('172.17.0.2', 48861) to None: 68 bytes
c : DNS request from ('172.17.0.2', 43649) to None: 71 bytes
c : DNS request from ('172.17.0.2', 37558) to None: 54 bytes
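Those "to None" lines suggest the DNS replies never make it back through the tunnel. A quick way to probe this from inside the local container, sketched below as an assumption rather than a known diagnosis: getent and bash's /dev/tcp are available in ubuntu:18.04 without installing anything (which matters, since apt-get itself hangs), and the hostname assumes the default cluster domain.

```shell
# Check DNS resolution through the tunnel (getent is part of glibc,
# so no extra packages are needed)
getent hosts kubernetes.default.svc.cluster.local

# Check raw TCP connectivity to the API service using bash's /dev/tcp
# pseudo-device (no netcat required); time out after 5 seconds
timeout 5 bash -c 'echo > /dev/tcp/kubernetes.default.svc.cluster.local/443' \
  && echo "TCP through tunnel OK" \
  || echo "TCP through tunnel failed"
```

If getent hangs but the raw TCP probe to a hard-coded service IP works, the problem is specifically the DNS leg of the tunnel.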

When I exec into the proxy pod on the cluster, it has internet connectivity:

$ wget google.com
Connecting to google.com (216.58.213.174:80)
Connecting to www.google.com (216.58.206.228:80)
index.html           100% |*******************************| 11242   0:00:00 ETA

$ arp -a
10-32-0-18.kube-dns.kube-system.svc.cluster.local (10.32.0.18) at 12:b0:66:86:4e:3a [ether]  on eth0
10-32-0-17.kube-dns.kube-system.svc.cluster.local (10.32.0.17) at 12:d2:8a:e6:04:a9 [ether]  on eth0
? (10.44.0.0) at fa:c7:10:8f:4a:41 [ether]  on eth0

So it must be either something with the tunnel, or the RBAC is too strict. Um... help? :D
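One way to narrow down the RBAC half of that question, offered here as a sketch rather than a known fix: ask the API server what the locked-down kubeconfig is actually allowed to do.

```shell
# List every permission the locked-down identity has in its namespace
KUBECONFIG=~/.kube/locked-config kubectl auth can-i --list

# Spot-check the verbs telepresence plausibly needs for its proxy Deployment
KUBECONFIG=~/.kube/locked-config kubectl auth can-i create deployments
KUBECONFIG=~/.kube/locked-config kubectl auth can-i get pods
KUBECONFIG=~/.kube/locked-config kubectl auth can-i create pods/portforward
```

Any "no" in that output points at RBAC; if everything comes back "yes", the tunnel itself becomes the prime suspect.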

ark3 commented 5 years ago

Could you pass along the set of commands you used to create your locked-down config? I'd like to try to reproduce this at my end. Thank you!

sokoow commented 5 years ago

Sure, sorry for the delay, just got to this:

  1. Here's how you create an isolated namespace with a role binding:
---
apiVersion: v1
kind: Namespace
metadata:
  name: user-2k946n
---
kind: Role
apiVersion: rbac.authorization.k8s.io/v1beta1
metadata:
  name: user-2k946n-full-access
  namespace: user-2k946n
rules:
- apiGroups: ["", "extensions", "apps"]
  resources: ["*"]
  verbs: ["*"]
- apiGroups: ["batch"]
  resources:
  - jobs
  - cronjobs
  verbs: ["*"]
---
kind: RoleBinding
apiVersion: rbac.authorization.k8s.io/v1beta1
metadata:
  name: user-2k946n-view
  namespace: user-2k946n
subjects:
- kind: ServiceAccount
  name: user-2k946n
  namespace: user-2k946n
roleRef:
  apiGroup: rbac.authorization.k8s.io
  kind: Role
  name: user-2k946n-full-access
---
apiVersion: v1
kind: ServiceAccount
metadata:
  name: user-2k946n
  namespace: user-2k946n
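As a sketch of how these manifests might be verified after applying them (the filename is illustrative, and the impersonation checks assume you run them from an admin kubeconfig):

```shell
# Apply the namespace, role, binding, and service account above
kubectl apply -f locked-namespace.yaml   # filename is illustrative

# Verify the binding took effect by impersonating the service account;
# the Role grants full access to apps resources in user-2k946n...
kubectl auth can-i create deployments \
  --as=system:serviceaccount:user-2k946n:user-2k946n \
  -n user-2k946n                          # should print: yes

# ...but nothing outside that namespace
kubectl auth can-i get pods \
  --as=system:serviceaccount:user-2k946n:user-2k946n \
  -n kube-system                          # should print: no
```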
  2. And here's a kubeconfig template if you want to test it:
apiVersion: v1
kind: Config
preferences: {}

# Define the cluster
clusters:
- cluster:
    certificate-authority-data: BASE64CERT
    # You'll need the API endpoint of your Cluster here:
    server: https://cluster.little:6443
  name: melittlecluster

# Define the user
users:
- name: user-2k946n
  user:
    as-user-extra: {}
    client-key-data: BASE64CERT
    token: BASE64CERT

# Define the context: linking a user to a cluster
contexts:
- context:
    cluster: melittlecluster
    namespace: user-2k946n
    user: user-2k946n
  name: user-2k946n

# Define current context
current-context: user-2k946n

This should be enough to run telepresence through it and reproduce the problems I faced.
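For completeness, here is one way the token and CA values for a template like the one above could be pulled from the ServiceAccount's token secret. This is a sketch that assumes a cluster of this era (pre-1.24), where a token secret is auto-created for each ServiceAccount; it only prints the values, it does not edit the template.

```shell
NS=user-2k946n

# Name of the auto-created token secret for the service account
SECRET=$(kubectl -n "$NS" get serviceaccount user-2k946n \
  -o jsonpath='{.secrets[0].name}')

# Bearer token (stored base64-encoded in the secret)
TOKEN=$(kubectl -n "$NS" get secret "$SECRET" \
  -o jsonpath='{.data.token}' | base64 -d)

# Cluster CA certificate, already base64-encoded as kubeconfig expects
CA=$(kubectl -n "$NS" get secret "$SECRET" \
  -o jsonpath='{.data.ca\.crt}')

echo "token: $TOKEN"
echo "certificate-authority-data: $CA"
```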

sokoow commented 5 years ago

Any update?

donnyyung commented 3 years ago

I believe this is no longer an issue in Telepresence 2. Here are the docs on RBAC with Telepresence: https://www.telepresence.io/docs/latest/reference/rbac/ and here are the docs on how to install Telepresence: https://www.telepresence.io/docs/latest/install/. Please re-open if you still see this issue in our latest version!