telepresenceio / telepresence

Local development against a remote Kubernetes or OpenShift cluster
https://www.telepresence.io

Tel proxy crashes when serviceaccount/namespace is unavailable #1059

Closed. mbarzilovich closed this issue 3 years ago.

mbarzilovich commented 5 years ago

What were you trying to do?

I want to get an HTTP response from a service in my GKE cluster:

telepresence -n tests --run curl -vv http://service//get-config

What did you expect to happen?

curl connects to the existing service in GKE.

What happened instead?

(please tell us; the traceback is automatically included, see below. Use https://gist.github.com to pass along the full telepresence.log)

Automatically included information

Command line: ['/usr/local/bin/telepresence', '-n', 'tests', '--run', 'curl', '-vv', '-d', '', 'http://rest-gateway/v10/get-config']
Version: 0.101
Python version: 3.7.3 (default, Jun 19 2019, 07:38:49) [Clang 10.0.1 (clang-1001.0.46.4)]
kubectl version: Client Version: v1.12.2 // Server Version: v1.12.8-gke.7
oc version: (error: [Errno 2] No such file or directory: 'oc': 'oc')
OS: Darwin MacBook-Pro-Apple.local 18.5.0 Darwin Kernel Version 18.5.0: Mon Mar 11 20:40:32 PDT 2019; root:xnu-4903.251.3~3/RELEASE_X86_64 x86_64


Background process (SSH port forward (socks and proxy poll)) exited with return code 255. Command was:
  ssh -N -oServerAliveInterval=1 -oServerAliveCountMax=10 -F /dev/null -oStrictHostKeyChecking=no -oUserKnownHostsFile=/dev/null -q -p 52907 telepresence@127.0.0.1 -L127.0.0.1:52919:127.0.0.1:9050 -R9055:127.0.0.1:52920

Recent output was:
  Connection to 127.0.0.1 closed by remote host.

Background process (sshuttle) exited with return code 99. Command was:
  sshuttle-telepresence -v --dns --method auto -e 'ssh -F /dev/null -oStrictHostKeyChecking=no -oUserKnownHostsFile=/dev/null' -r telepresence@127.0.0.1:52907 --to-ns 127.0.0.1:9053 10.44.12.0/24 10.44.8.0/24 10.44.19.0/24 10.44.1.0/24 10.44.10.0/24 10.44.15.0/24 10.44.4.0/24 10.44.14.0/24 10.44.9.0/24 10.44.5.0/24 10.44.18.0/24 10.44.16.0/24 10.44.20.0/24 10.44.13.0/24 10.44.2.0/24 10.130.0.0/20 10.44.0.0/24 10.44.11.0/24 10.44.7.0/24 10.44.6.0/24 10.44.3.0/24

Recent output was:
  return firewall.main(opt.method, opt.syslog)
    File "/Users/mbarzilovich/.pex/install/sshuttle_telepresence-0.78.2.dev45+gd250ccb-py2.py3-none-any.whl.73b9c6a0c49d6b6bf7533478d454b7be51b9d990/sshuttle_telepresence-0.78.2.dev45+gd250ccb-py2.py3-none-any.whl/sshuttle/firewall.py", line 207, in main
      socket.AF_INET6, subnets_v6, udp)
    File "/Users/mbarzilovich/.pex/install/sshuttle_telepresence-0.78.2.dev45+gd250ccb-py2.py3-none-any.whl.73b9c6a0c49d6b6bf7533478d454b7be51b9d990/sshuttle_telepresence-0.78.2.dev45+gd250ccb-py2.py3-none-any.whl/sshuttle/methods/pf.py", line 447, in setup_firewall
      pf.enable()
    File "/Users/mbarzilovich/.pex/install/sshuttle_telepresence-0.78.2.dev45+gd250ccb-py2.py
ark3 commented 5 years ago

Sorry about the crash. Does this happen every time? Does a similar test using the container method work?

telepresence -n tests --docker-run --rm -it pstauffer/curl -vv http://rest-gateway/v10/get-config

If this happens every time, can you please pass along a copy of telepresence.log (redacted as desired) via a GitHub Gist? Thank you.

mbarzilovich commented 5 years ago

Hi, this command fails as well. By the way, the proxy pod fails to start with the following log:

$ kubectl logs tests-5cb864cf7f-xtz6k
Unhandled Error
Traceback (most recent call last):
  File "/usr/lib/python3.6/site-packages/twisted/application/app.py", line 674, in run
    runApp(config)
  File "/usr/lib/python3.6/site-packages/twisted/scripts/twistd.py", line 25, in runApp
    runner.run()
  File "/usr/lib/python3.6/site-packages/twisted/application/app.py", line 381, in run
    self.application = self.createOrGetApplication()
  File "/usr/lib/python3.6/site-packages/twisted/application/app.py", line 453, in createOrGetApplication
    application = getApplication(self.config, passphrase)
--- <exception caught here> ---
  File "/usr/lib/python3.6/site-packages/twisted/application/app.py", line 464, in getApplication
    application = service.loadApplication(filename, style, passphrase)
  File "/usr/lib/python3.6/site-packages/twisted/application/service.py", line 416, in loadApplication
    application = sob.loadValueFromFile(filename, 'application')
  File "/usr/lib/python3.6/site-packages/twisted/persisted/sob.py", line 177, in loadValueFromFile
    eval(codeObj, d, d)
  File "./forwarder.py", line 62, in <module>
    main()
  File "./forwarder.py", line 53, in main
    with open(NAMESPACE_PATH) as f:
builtins.FileNotFoundError: [Errno 2] No such file or directory: '/var/run/secrets/kubernetes.io/serviceaccount/namespace'

Failed to load application: [Errno 2] No such file or directory: '/var/run/secrets/kubernetes.io/serviceaccount/namespace'
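
The traceback shows forwarder.py opening NAMESPACE_PATH unconditionally, so a pod without the service account mount dies at startup. Below is a minimal sketch, in Python like forwarder.py, of a more forgiving lookup. It is not Telepresence's actual code: the read_namespace helper, its fallback order, and the POD_NAMESPACE variable (which you would have to populate yourself, for example through the Downward API) are all hypothetical.

```python
import os

# In-pod location of the namespace file written by the service account token mount.
NAMESPACE_PATH = "/var/run/secrets/kubernetes.io/serviceaccount/namespace"


def read_namespace(path=NAMESPACE_PATH, env_var="POD_NAMESPACE", default=None):
    """Return the pod's namespace without crashing when the token mount is absent.

    Hypothetical lookup order, for illustration only:
    1. the service account namespace file, if it exists and is non-empty;
    2. an environment variable (POD_NAMESPACE is an assumed name);
    3. the supplied default.
    """
    try:
        with open(path) as f:
            namespace = f.read().strip()
            if namespace:
                return namespace
    except FileNotFoundError:
        pass  # token not mounted, e.g. automountServiceAccountToken is false
    namespace = os.environ.get(env_var, "").strip()
    if namespace:
        return namespace
    if default is not None:
        return default
    # Fail with a message that points at the real cause instead of a bare FileNotFoundError.
    raise RuntimeError(
        f"namespace unavailable: {path} is missing and {env_var} is not set; "
        "check that the service account token is mounted"
    )
```

Even without a fallback, an explicit error like this one would have pointed straight at the missing token mount.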
ark3 commented 5 years ago

Thanks for the additional information. It seems that you have an unusual service account setup in your cluster. I'll need to investigate further.

mbarzilovich commented 5 years ago

I investigated this issue. One finding is that, in my case, the service account token is not mounted automatically, so I created a new deployment like this:

apiVersion: extensions/v1beta1
kind: Deployment
metadata:
  name: proxy
spec:
  replicas: 1  # only one replica
  template:
    metadata:
      labels:
        app: proxy
    spec:
      automountServiceAccountToken: true
      containers:
      - name: proxy
        image: datawire/telepresence-k8s:0.101
        securityContext:
            allowPrivilegeEscalation: true
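
Whether the namespace file appears at all depends on automountServiceAccountToken, which can be set on both the ServiceAccount and the Pod spec: when both are set, the Pod spec takes precedence, and when neither is set the token is mounted. The hypothetical helper below models only that precedence rule. As an assumption of the sketch, it ignores other reasons the mount could be missing (for example, a disabled ServiceAccount admission controller).

```python
from typing import Optional


def token_automounted(pod_setting: Optional[bool],
                      service_account_setting: Optional[bool]) -> bool:
    """Model the automountServiceAccountToken precedence rule.

    pod_setting: spec.automountServiceAccountToken on the Pod template (None if unset).
    service_account_setting: automountServiceAccountToken on the ServiceAccount (None if unset).
    """
    if pod_setting is not None:
        # An explicit value in the Pod spec overrides the ServiceAccount.
        return pod_setting
    if service_account_setting is not None:
        return service_account_setting
    # Nobody opted out, so the token (and the namespace file) is mounted by default.
    return True
```

With the Deployment above, pod_setting is True, so the token should be mounted even if the ServiceAccount opts out.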

But it still does not work. Here are some logs:

Background process (sshuttle) exited with return code 99. Command was:
  sshuttle-telepresence -v --dns --method auto -e 'ssh -F /dev/null -oStrictHostKeyChecking=no -oUserKnownHostsFile=/dev/null' -r telepresence@127.0.0.1:54553 --to-ns 127.0.0.1:9053 10.44.2.0/24 10.44.3.0/24 10.44.14.0/24 10.44.0.0/24 10.130.0.0/20 10.44.13.0/24 10.44.15.0/24 10.44.8.0/24 10.44.9.0/24 10.44.7.0/24 10.44.5.0/24 10.44.12.0/24 10.44.10.0/24 10.44.19.0/24 10.44.1.0/24 10.44.6.0/24 10.44.4.0/24 10.44.11.0/24

Recent output was:
  File "/Users/mbarzilovich/.pex/install/sshuttle_telepresence-0.78.2.dev45+gd250ccb-py2.py3-none-any.whl.73b9c6a0c49d6b6bf7533478d454b7be51b9d990/sshuttle_telepresence-0.78.2.dev45+gd250ccb-py2.py3-none-any.whl/sshuttle/cmdline.py", line 26, in main
      return firewall.main(opt.method, opt.syslog)
    File "/Users/mbarzilovich/.pex/install/sshuttle_telepresence-0.78.2.dev45+gd250ccb-py2.py3-none-any.whl.73b9c6a0c49d6b6bf7533478d454b7be51b9d990/sshuttle_telepresence-0.78.2.dev45+gd250ccb-py2.py3-none-any.whl/sshuttle/firewall.py", line 207, in main
      socket.AF_INET6, subnets_v6, udp)
    File "/Users/mbarzilovich/.pex/install/sshuttle_telepresence-0.78.2.dev45+gd250ccb-py2.py3-none-any.whl.73b9c6a0c49d6b6bf7533478d454b7be51b9d990/sshuttle_telepresence-0.78.2.dev45+gd250ccb-py2.py3-none-any.whl/sshuttle/methods/pf.py", line 447, in setup_firewall
      pf.enable()
    File "/Users/mbarzilovich/.pex/install/sshuttle_telepresence-0.78.2.dev45+gd250ccb-py2.py3-none-any.whl.73b9c6a0c49d6b6bf7533478d454b7be51b9d990/sshuttle_telepresence-0.78.2.dev45+gd250ccb-py2.py3-none-any.whl/sshuttle/methods/pf.py", line 321, in enable
      _pf_context['Xtoken'].append(re.search(b'Token : (.+)', o[1]).group(1))
  AttributeError: 'NoneType' object has no attribute 'group'
  c : fatal: cleanup: ['sudo', '-p', '[local sudo] Password: ', 'PYTHONPATH=/Users/mbarzilovich/.pex/install/sshuttle_telepresence-0.78.2.dev45+gd250ccb-py2.py3-none-any.whl.73b9c6a0c49d6b6bf7533478d454b7be51b9d990/sshuttle_telepresence-0.78.2.dev45+gd250ccb-py2.py3-none-any.whl', '--', '/usr/local/opt/python/bin/python3.7', '/usr/local/Cellar/telepresence/0.101/libexec/sshuttle-telepresence', '-v', '--method', 'auto', '--firewall'] returned 1

Here are the last few lines of the logfile (see /Users/mbarzilovich/Work/repo/telepresence.log for the complete logs):

   9.6  21 |   File "/Users/mbarzilovich/.pex/install/sshuttle_telepresence-0.78.2.dev45+gd250ccb-py2.py3-none-any.whl.73b9c6a0c49d6b6bf7533478d454b7be51b9d990/sshuttle_telepresence-0.78.2.dev45+gd250ccb-py2.py3-none-any.whl/sshuttle/methods/pf.py", line 447, in setup_firewall
   9.6  21 |     pf.enable()
   9.6  21 |   File "/Users/mbarzilovich/.pex/install/sshuttle_telepresence-0.78.2.dev45+gd250ccb-py2.py3-none-any.whl.73b9c6a0c49d6b6bf7533478d454b7be51b9d990/sshuttle_telepresence-0.78.2.dev45+gd250ccb-py2.py3-none-any.whl/sshuttle/methods/pf.py", line 321, in enable
   9.6  21 |     _pf_context['Xtoken'].append(re.search(b'Token : (.+)', o[1]).group(1))
   9.6  21 | AttributeError: 'NoneType' object has no attribute 'group'
   9.6  21 | c : fatal: cleanup: ['sudo', '-p', '[local sudo] Password: ', 'PYTHONPATH=/Users/mbarzilovich/.pex/install/sshuttle_telepresence-0.78.2.dev45+gd250ccb-py2.py3-none-any.whl.73b9c6a0c49d6b6bf7533478d454b7be51b9d990/sshuttle_telepresence-0.78.2.dev45+gd250ccb-py2.py3-none-any.whl', '--', '/usr/local/opt/python/bin/python3.7', '/usr/local/Cellar/telepresence/0.101/libexec/sshuttle-telepresence', '-v', '--method', 'auto', '--firewall'] returned 1
   9.7 TEL | [21] sshuttle: exit 99
  10.5 TEL | (proxy checking local liveness)
  10.5 TEL | [42] exit 1 in 1.29 secs.
  10.5 TEL | [43] Capturing: python3 -c 'import socket; socket.gethostbyname("hellotelepresence-10.a.sanity.check.telepresence.io")'
  10.6  13 | 2019-06-28T10:00:09+0000 [Poll#info] Checkpoint
  10.9 TEL | [43] exit 1 in 0.39 secs.
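
The AttributeError comes from calling .group(1) on the result of re.search, which returns None when nothing matches: whatever pf.py received in o[1] did not contain a "Token : " line. Below is a sketch of a defensive parse, using a hypothetical parse_pf_token helper rather than sshuttle's real code, and assuming (as the surrounding enable() function suggests) that o[1] holds the output of the pfctl call that enables pf. It turns the crash into an error that shows what was actually printed.

```python
import re


def parse_pf_token(pfctl_output: bytes) -> bytes:
    """Extract the reference token from the output of the pf-enabling call.

    Same regex as the failing line in the traceback, but the match is
    checked before .group(1) is called.
    """
    match = re.search(rb"Token : (.+)", pfctl_output)
    if match is None:
        # Surface the unexpected output instead of an opaque AttributeError.
        raise RuntimeError(
            "could not find 'Token :' in pfctl output: "
            + pfctl_output.decode(errors="replace").strip()
        )
    return match.group(1).strip()
```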
mbarzilovich commented 5 years ago

I tried --method inject-tcp. It works fine for common utilities like curl, but it does not work for running Maven tests. I guess the java binary is statically linked.

ark3 commented 5 years ago

I suspect you're running afoul of System Integrity Protection (SIP) on your Mac. Somewhere between your mvn command and the eventual launch of java, some tool (such as /bin/sh) gets run that discards the inject-tcp magic due to SIP. Can you try running java directly? Within the last year or so, some users have reported success using inject-tcp with OpenJDK Java on the Mac.
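
SIP purges dynamic-linker (DYLD_*) environment variables when macOS launches a protected binary, so anything started through such a binary loses them too. Assuming, as suggested above, that this is how the inject-tcp setup gets lost, the sketch below checks whether a command, or the interpreter on its #! line, lives in a SIP-protected location. The helper names and the prefix list are illustrative assumptions, and it only inspects one level (the command and its interpreter), not every process mvn may spawn.

```python
import os
import shutil
from typing import List, Optional

# Approximate SIP-protected locations; /usr/local is exempt. An assumption for this sketch.
SIP_PROTECTED_PREFIXES = ("/System/", "/bin/", "/sbin/", "/usr/")
SIP_EXEMPT_PREFIXES = ("/usr/local/",)


def is_sip_protected(path: str) -> bool:
    # Resolve symlinks (e.g. /usr/local/bin/mvn -> a Homebrew Cellar path) first.
    path = os.path.realpath(path)
    if path.startswith(SIP_EXEMPT_PREFIXES):
        return False
    return path.startswith(SIP_PROTECTED_PREFIXES)


def shebang_interpreter(path: str) -> Optional[str]:
    """Return the interpreter named on a script's #! line, or None for non-scripts."""
    try:
        with open(path, "rb") as f:
            first_line = f.readline()
    except OSError:
        return None
    if not first_line.startswith(b"#!"):
        return None
    parts = first_line[2:].decode(errors="replace").split()
    return parts[0] if parts else None


def sip_hops(command: str) -> List[str]:
    """List executables on the launch path of `command` that SIP would protect."""
    hops: List[str] = []
    path = shutil.which(command)
    if path is None:
        return hops
    if is_sip_protected(path):
        hops.append(path)
    interpreter = shebang_interpreter(path)
    if interpreter and is_sip_protected(interpreter):
        hops.append(interpreter)
    return hops
```

A non-empty result for mvn (for example, a script whose #! line names /bin/sh) would be consistent with the suggestion to run java directly instead.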

donnyyung commented 3 years ago

I believe this is no longer an issue in Telepresence 2. The docs on how to install Telepresence are here: https://www.telepresence.io/docs/latest/install/. Please re-open this issue if you still see it in our latest version!