Closed esorey closed 4 years ago
Your Mac ends up in some sort of broken state such that DNS resolution of cluster resources with vpn-tcp fails, and you have to restart the machine to fix things. Is that right? The next time this happens, could you please link to a gist of telepresence.log? Please try curl, dig, etc., so the DNS lookups are visible in the log.
If you have to redact the logfile contents, it may be easier to extract the sshuttle and pod logs. See the trace below for an example.
Thank you for your help!
$ telepresence
Starting proxy with method 'vpn-tcp', which has the following limitations: All processes are affected, only one telepresence can run per machine, and you can't use other VPNs. You may need to add cloud hosts with --also-proxy. For a full list of method limitations see https://telepresence.io/reference/methods.html
Volumes are rooted at $TELEPRESENCE_ROOT. See https://telepresence.io/howto/volumes.html for details.
No traffic is being forwarded from the remote Deployment to your local machine. You can use the --expose option to specify which ports you want to forward.
Guessing that Services IP range is 10.3.240.0/20. Services started after this point will be inaccessible if are outside this range; restart telepresence if you can't access a new Service.
@gke_datawireio_us-central1-a_telepresence-testing|bash-4.4$ curl -sk https://kubernetes/api/
{
"kind": "APIVersions",
"versions": [
"v1"
],
"serverAddressByClientCIDRs": [
{
"clientCIDR": "0.0.0.0/0",
"serverAddress": "35.184.xx.xx"
}
]
}@gke_datawireio_us-central1-a_telepresence-testing|bash-4.4$
@gke_datawireio_us-central1-a_telepresence-testing|bash-4.4$ dig kubernetes
; <<>> DiG 9.9.7-P3 <<>> kubernetes
;; global options: +cmd
;; Got answer:
;; ->>HEADER<<- opcode: QUERY, status: NOERROR, id: 1260
;; flags: qr ra; QUERY: 1, ANSWER: 1, AUTHORITY: 0, ADDITIONAL: 0
;; QUESTION SECTION:
;kubernetes. IN A
;; ANSWER SECTION:
kubernetes. 0 IN A 10.3.240.1
;; Query time: 109 msec
;; SERVER: 2001:558:feed::1#53(2001:558:feed::1)
;; WHEN: Fri Apr 06 09:14:20 EDT 2018
;; MSG SIZE rcvd: 44
@gke_datawireio_us-central1-a_telepresence-testing|bash-4.4$ host kubernetes
kubernetes has address 10.3.240.1
kubernetes has address 10.3.240.1
Host kubernetes not found: 3(NXDOMAIN)
@gke_datawireio_us-central1-a_telepresence-testing|bash-4.4$
@gke_datawireio_us-central1-a_telepresence-testing|bash-4.4$ exit
exit
$ fgrep "Launching: sshuttle" telepresence.log
20.5 TEL | [35] Launching: sshuttle-telepresence -v --dns --method auto -e 'ssh -oStrictHostKeyChecking=no -oUserKnownHostsFile=/dev/null -F /dev/null' --to-ns 127.0.0.1:9053 -r telepresence@localhost:58044 10.0.125.0/24 10.0.126.0/24 10.3.240.0/20 10.0.127.0/24 10.0.19.0/24
$ fgrep "35 |" telepresence.log
20.8 35 | Starting sshuttle proxy.
21.0 35 | firewall manager: Starting firewall with Python version 3.6.5
[...]
$ fgrep logs telepresence.log
12.7 TEL | [25] Launching: kubectl --context gke_datawireio_us-central1-a_telepresence-testing --namespace default logs -f telepresence-1523020367-938933-52094-1060178697-lcx4v --container telepresence-1523020367-938933-52094
$ fgrep "25 |" telepresence.log
19.4 25 | Listening...
19.4 25 | 2018-04-06T13:13:09+0000 [-] Loading ./forwarder.py...
19.4 25 | 2018-04-06T13:13:10+0000 [-] /etc/resolv.conf changed, reparsing
19.4 25 | 2018-04-06T13:13:10+0000 [-] Resolver added ('10.3.240.10', 53) to server list
[...]
40.6 25 | 2018-04-06T13:13:31+0000 [stdout#info] A query: b'kubernetes'
40.6 25 | 2018-04-06T13:13:31+0000 [stdout#info] AAAA query, sending back A instead: b'kubernetes'
40.6 25 | 2018-04-06T13:13:31+0000 [stdout#info] A query: b'kubernetes'
40.6 25 | 2018-04-06T13:13:31+0000 [stdout#info] Result for b'kubernetes' is ['10.3.240.1']
40.6 25 | 2018-04-06T13:13:31+0000 [stdout#info] Result for b'kubernetes' is ['10.3.240.1']
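The fgrep steps in the trace above can be automated. The following is a minimal sketch, assuming the log layout visible in this thread (a timestamp, a process label, a `|` separator, and `[N] Launching: ...` lines that introduce process N). The function names are hypothetical and are not part of Telepresence.

```python
import re

# Matches launch lines in the layout seen in this thread, for example:
#   20.5 TEL | [35] Launching: sshuttle-telepresence -v --dns ...
#   19.1  TL | [37] Launching: ['sshuttle-telepresence', '-v', ...]...
LAUNCH_RE = re.compile(r"^\s*[\d.]+\s+\S+\s+\|\s+\[(\d+)\] Launching: (.*)$")


def find_process_ids(lines, keyword):
    """Return the bracketed indices of launched commands that mention keyword."""
    ids = []
    for line in lines:
        match = LAUNCH_RE.match(line)
        if match and keyword in match.group(2):
            ids.append(match.group(1))
    return ids


def process_output(lines, process_id):
    """Return the lines logged by one process, e.g. '20.8 35 | Starting ...'."""
    prefix = re.compile(r"^\s*[\d.]+\s+" + re.escape(process_id) + r"\s+\|")
    return [line for line in lines if prefix.match(line)]


def extract(lines, keyword):
    """Collect the output of every launched command that mentions keyword."""
    selected = []
    for process_id in find_process_ids(lines, keyword):
        selected.extend(process_output(lines, process_id))
    return selected


def extract_from_file(path, keyword):
    """Read a log file and return the matching process output as one string."""
    with open(path, encoding="utf-8", errors="replace") as handle:
        return "\n".join(extract(handle.read().splitlines(), keyword))
```

Calling `extract_from_file("telepresence.log", "sshuttle")` and `extract_from_file("telepresence.log", "logs")` should roughly correspond to the two fgrep pairs shown above. Review the result before posting, since it can still contain hostnames and IP addresses.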
> Your Mac ends up in some sort of broken state such that DNS resolution of cluster resources with vpn-tcp fails, and you have to restart the machine to fix things. Is that right?
That's correct.
Here are my results from playing around with dig/curl/host and digging through the logs. Let me know if any more info would be helpful, and thank you for looking into this!
$ ./dev/telepresence-global.sh
Starting proxy with method 'vpn-tcp', which has the following limitations: All processes are affected, only one telepresence can run per machine, and you can't use other VPNs. You may need to add cloud hosts with --also-proxy. For a full list of method limitations see https://telepresence.io/reference/methods.html
Volumes are rooted at $TELEPRESENCE_ROOT. See https://telepresence.io/howto/volumes.html for details.
No traffic is being forwarded from the remote Deployment to your local machine. You can use the --expose option to specify which ports you want to forward.
Password:
Guessing that Services IP range is 100.64.0.0/13. Services started after this point will be inaccessible if are outside this range; restart telepresence if you can't access a new Service.
##### Dig Results
$ dig kafka-kube-staging-1.us-east-1.iris.internal
; <<>> DiG 9.8.3-P1 <<>> kafka-kube-staging-1.us-east-1.iris.internal
;; global options: +cmd
;; Got answer:
;; ->>HEADER<<- opcode: QUERY, status: NOERROR, id: 50978
;; flags: qr ra; QUERY: 1, ANSWER: 1, AUTHORITY: 0, ADDITIONAL: 0
;; QUESTION SECTION:
;kafka-kube-staging-1.us-east-1.iris.internal. IN A
;; ANSWER SECTION:
kafka-kube-staging-1.us-east-1.iris.internal. 0 IN A 172.30.148.226
;; Query time: 155 msec
;; SERVER: 192.168.1.1#53(192.168.1.1)
;; WHEN: Fri Apr 6 12:55:25 2018
;; MSG SIZE rcvd: 78
##### Host Results
$ host kafka-kube-staging-1.us-east-1.iris.internal
kafka-kube-staging-1.us-east-1.iris.internal has address 172.30.148.226
kafka-kube-staging-1.us-east-1.iris.internal has address 172.30.148.226
##### curl Results
$ curl -v kafka-kube-staging-1.us-east-1.iris.internal
* Rebuilt URL to: kafka-kube-staging-1.us-east-1.iris.internal/
* Could not resolve host: kafka-kube-staging-1.us-east-1.iris.internal
* Closing connection 0
curl: (6) Could not resolve host: kafka-kube-staging-1.us-east-1.iris.internal
##### Log digging
$ fgrep "sshuttle" telepresence.log
10.5 TL | BEGIN SPAN vpn.py:200(connect_sshuttle)
19.1 TL | [37] Launching: ['sshuttle-telepresence', '-v', '--dns', '--method', 'auto', '-e', 'ssh -oStrictHostKeyChecking=no -oUserKnownHostsFile=/dev/null -F /dev/null', '--to-ns', '127.0.0.1:9053', '-r', 'telepresence@localhost:52760', '100.96.8.0/24', '172.30.150.42', '172.30.132.164', '100.96.2.0/24', '172.30.131.115', '100.96.3.0/24', '100.96.5.0/24', '172.30.32.205', '172.30.132.100', '172.30.148.64', '100.96.6.0/24', '54.211.179.120', '10.51.206.104', '172.30.132.185', '172.30.131.235', '172.30.130.60', '172.30.149.60', '100.96.0.0/24', '100.96.10.0/24', '100.96.9.0/24', '100.64.0.0/13', '172.30.150.180', '172.30.148.226', '52.7.76.172', '172.30.131.64', '172.30.132.168', '172.30.131.98', '54.158.43.211', '172.30.149.71', '100.96.4.0/24', '172.30.32.180', '172.30.130.168']...
19.1 TL | BEGIN SPAN vpn.py:242(connect_sshuttle,sshuttle-wait)
19.2 37 | Starting sshuttle proxy.
21.2 37 | >> pfctl -a sshuttle6-12300 -f /dev/stdin
21.2 37 | >> pfctl -a sshuttle-12300 -f /dev/stdin
22.5 TL | END SPAN vpn.py:242(connect_sshuttle,sshuttle-wait) 3.4s
22.5 TL | END SPAN vpn.py:200(connect_sshuttle) 12.0s
750.0 37 | >> pfctl -a sshuttle6-12300 -F all
750.0 37 | >> pfctl -a sshuttle-12300 -F all
754.8 TL | 12.0s vpn.py:200(connect_sshuttle)
754.8 TL | 3.4s vpn.py:242(connect_sshuttle,sshuttle-wait)
$ fgrep "37 |" telepresence.log
19.2 37 | Starting sshuttle proxy.
19.5 37 | firewall manager: Starting firewall with Python version 3.6.4
19.5 37 | firewall manager: ready method name pf.
19.5 37 | IPv6 enabled: True
19.5 37 | UDP enabled: False
19.5 37 | DNS enabled: True
19.5 37 | TCP redirector listening on ('::1', 12300, 0, 0).
19.5 37 | TCP redirector listening on ('127.0.0.1', 12300).
19.5 37 | DNS listening on ('::1', 12300, 0, 0).
19.5 37 | DNS listening on ('127.0.0.1', 12300).
19.5 37 | Starting client with Python version 3.6.4
19.5 37 | c : connecting to server...
20.1 37 | Warning: Permanently added '[localhost]:52760' (ECDSA) to the list of known hosts.
21.2 37 | Starting server with Python version 3.6.1
21.2 37 | s: latency control setting = True
21.2 37 | s: available routes:
21.2 37 | c : Connected.
21.2 37 | firewall manager: setting up.
21.2 37 | >> pfctl -s Interfaces -i lo -v
21.2 37 | >> pfctl -s all
21.2 37 | >> pfctl -a sshuttle6-12300 -f /dev/stdin
21.2 37 | >> pfctl -E
21.2 37 | >> pfctl -s Interfaces -i lo -v
21.2 37 | >> pfctl -s all
21.2 37 | >> pfctl -a sshuttle-12300 -f /dev/stdin
21.2 37 | >> pfctl -E
21.3 37 | c : DNS request from ('192.168.1.33', 49634) to None: 37 bytes
22.4 37 | c : DNS request from ('192.168.1.33', 59263) to None: 37 bytes
70.4 37 | c : DNS request from ('192.168.1.33', 29883) to None: 38 bytes
70.4 37 | c : DNS request from ('192.168.1.33', 11938) to None: 33 bytes
70.4 37 | c : DNS request from ('192.168.1.33', 19856) to None: 37 bytes
70.4 37 | c : DNS request from ('192.168.1.33', 38563) to None: 35 bytes
70.4 37 | c : DNS request from ('192.168.1.33', 57887) to None: 43 bytes
70.4 37 | c : DNS request from ('192.168.1.33', 40002) to None: 42 bytes
70.6 37 | c : DNS request from ('192.168.1.33', 23155) to None: 32 bytes
70.6 37 | c : DNS request from ('192.168.1.33', 18296) to None: 32 bytes
[...]
749.9 37 | Connection to localhost closed by remote host.
750.0 37 | >> pfctl -a sshuttle6-12300 -F all
750.0 37 | >> pfctl -X 15307307430862952187
750.0 37 | >> pfctl -a sshuttle-12300 -F all
750.0 37 | >> pfctl -X 15307307430862955067
$ fgrep logs telepresence.log
3.9 TL | [11] Launching: ['kubectl', '--context', 'kube.us-east-1.iris.tv', '--namespace', 'dev', 'logs', '-f', 'telepresence-1523044023-793812-2163-846fb6bc95-tkcvx', '--container', 'telepresence-1523044023-793812-2163']...
$ fgrep "11 |" telepresence.log
8.7 11 | Listening...
8.7 11 | 2018-04-06T19:47:12+0000 [-] Loading ./forwarder.py...
8.7 11 | 2018-04-06T19:47:13+0000 [-] /etc/resolv.conf changed, reparsing
8.7 11 | 2018-04-06T19:47:13+0000 [-] Resolver added ('100.64.0.10', 53) to server list
[...]
70.5 11 | 2018-04-06T19:48:15+0000 [stdout#info] A query: b'adservice.google.com'
70.6 11 | 2018-04-06T19:48:15+0000 [stdout#info] A query: b'apis.google.com'
70.6 11 | 2018-04-06T19:48:15+0000 [stdout#info] A query: b'clients5.google.com'
70.6 11 | 2018-04-06T19:48:15+0000 [stdout#info] A query: b'fonts.gstatic.com'
70.6 11 | 2018-04-06T19:48:15+0000 [stdout#info] A query: b'lh3.googleusercontent.com'
70.6 11 | 2018-04-06T19:48:15+0000 [stdout#info] A query: b'notifications.google.com'
70.6 11 | 2018-04-06T19:48:15+0000 [stdout#info] Result for b'apis.google.com' is ['172.217.7.206']
70.6 11 | 2018-04-06T19:48:15+0000 [stdout#info] Result for b'adservice.google.com' is ['172.217.12.226']
70.6 11 | 2018-04-06T19:48:15+0000 [stdout#info] Result for b'fonts.gstatic.com' is ['172.217.15.99']
70.6 11 | 2018-04-06T19:48:15+0000 [stdout#info] Result for b'notifications.google.com' is ['172.217.7.206']
70.7 11 | 2018-04-06T19:48:15+0000 [stdout#info] Result for b'clients5.google.com' is ['172.217.3.46']
70.7 11 | 2018-04-06T19:48:15+0000 [stdout#info] A query: b'www.google.com'
70.7 11 | 2018-04-06T19:48:15+0000 [stdout#info] A query: b'ogs.google.com'
70.7 11 | 2018-04-06T19:48:15+0000 [stdout#info] A query: b'ssl.gstatic.com'
70.7 11 | 2018-04-06T19:48:15+0000 [stdout#info] Result for b'lh3.googleusercontent.com' is ['172.217.13.225']
70.7 11 | 2018-04-06T19:48:15+0000 [stdout#info] Result for b'www.google.com' is ['172.217.9.196']
70.7 11 | 2018-04-06T19:48:15+0000 [stdout#info] Result for b'ssl.gstatic.com' is ['216.58.217.131']
70.8 11 | 2018-04-06T19:48:15+0000 [stdout#info] Result for b'ogs.google.com' is ['172.217.7.206']
70.8 11 | 2018-04-06T19:48:15+0000 [stdout#info] A query: b'www.gstatic.com'
70.8 11 | 2018-04-06T19:48:15+0000 [stdout#info] Result for b'www.gstatic.com' is ['172.217.15.67']
73.4 11 | 2018-04-06T19:48:18+0000 [stdout#info] A query: b'kubernetes'
73.4 11 | 2018-04-06T19:48:18+0000 [stdout#info] getaddrinfo error: [Errno -2] Name does not resolve
78.2 11 | 2018-04-06T19:48:22+0000 [stdout#info] A query: b'cuscochromeextension-pa.googleapis.com'
78.2 11 | 2018-04-06T19:48:22+0000 [stdout#info] Result for b'cuscochromeextension-pa.googleapis.com' is ['172.217.15.106', '172.217.15.74', '172.217.13.234', '172.217.13.74', '172.217.12.234', '172.217.9.202', '172.217.8.10', '172.217.7.138', '172.217.5.234']
78.3 11 | 2018-04-06T19:48:23+0000 [stdout#info] A query: b'www.googleapis.com'
78.3 11 | 2018-04-06T19:48:23+0000 [stdout#info] Result for b'www.googleapis.com' is ['172.217.15.74', '172.217.13.234', '172.217.13.74', '172.217.12.234', '172.217.9.202', '172.217.8.10', '172.217.7.138', '172.217.5.234', '172.217.15.106']
82.9 11 | 2018-04-06T19:48:27+0000 [stdout#info] A query: b'github.com'
82.9 11 | 2018-04-06T19:48:27+0000 [stdout#info] Result for b'github.com' is ['192.30.253.112', '192.30.253.113']
83.3 11 | 2018-04-06T19:48:28+0000 [stdout#info] A query: b'avatars2.githubusercontent.com'
83.3 11 | 2018-04-06T19:48:28+0000 [stdout#info] Result for b'avatars2.githubusercontent.com' is ['151.101.32.133']
131.7 11 | 2018-04-06T19:49:16+0000 [stdout#info] A query: b'kafka-kube-staging-1.us-east-1.iris.internal'
131.7 11 | 2018-04-06T19:49:16+0000 [stdout#info] Result for b'kafka-kube-staging-1.us-east-1.iris.internal' is ['172.30.148.226']
156.0 11 | 2018-04-06T19:49:40+0000 [stdout#info] A query: b'kafka-kube-staging-1.us-east-1.iris.internal'
156.0 11 | 2018-04-06T19:49:40+0000 [stdout#info] Result for b'kafka-kube-staging-1.us-east-1.iris.internal' is ['172.30.148.226']
156.1 11 | 2018-04-06T19:49:40+0000 [stdout#info] AAAA query, sending back A instead: b'kafka-kube-staging-1.us-east-1.iris.internal'
156.1 11 | 2018-04-06T19:49:40+0000 [stdout#info] A query: b'kafka-kube-staging-1.us-east-1.iris.internal'
156.1 11 | 2018-04-06T19:49:40+0000 [stdout#info] Result for b'kafka-kube-staging-1.us-east-1.iris.internal' is ['172.30.148.226']
156.2 11 | 2018-04-06T19:49:40+0000 [stdout#info] 15 query: b'kafka-kube-staging-1.us-east-1.iris.internal'
156.2 11 | 2018-04-06T19:49:40+0000 [DNSDatagramProtocol (UDP)] DNSDatagramProtocol starting on 46492
156.2 11 | 2018-04-06T19:49:40+0000 [DNSDatagramProtocol (UDP)] Starting protocol <twisted.names.dns.DNSDatagramProtocol object at 0x7fedb0770b38>
156.2 11 | 2018-04-06T19:49:40+0000 [-] (UDP Port 46492 Closed)
156.2 11 | 2018-04-06T19:49:40+0000 [-] Stopping protocol <twisted.names.dns.DNSDatagramProtocol object at 0x7fedb0770b38>
187.9 11 | 2018-04-06T19:50:12+0000 [stdout#info] A query: b'tasks.google.com'
187.9 11 | 2018-04-06T19:50:12+0000 [stdout#info] Result for b'tasks.google.com' is ['172.217.7.174']
205.2 11 | 2018-04-06T19:50:29+0000 [stdout#info] A query: b'adservice.google.com'
205.3 11 | 2018-04-06T19:50:29+0000 [stdout#info] A query: b'apis.google.com'
205.3 11 | 2018-04-06T19:50:30+0000 [stdout#info] A query: b'clients5.google.com'
205.3 11 | 2018-04-06T19:50:30+0000 [stdout#info] A query: b'fonts.gstatic.com'
205.3 11 | 2018-04-06T19:50:30+0000 [stdout#info] A query: b'lh3.googleusercontent.com'
205.3 11 | 2018-04-06T19:50:30+0000 [stdout#info] A query: b'notifications.google.com'
205.3 11 | 2018-04-06T19:50:30+0000 [stdout#info] Result for b'fonts.gstatic.com' is ['172.217.9.195']
205.3 11 | 2018-04-06T19:50:30+0000 [stdout#info] Result for b'notifications.google.com' is ['172.217.12.238']
205.3 11 | 2018-04-06T19:50:30+0000 [stdout#info] Result for b'apis.google.com' is ['172.217.12.238']
205.3 11 | 2018-04-06T19:50:30+0000 [stdout#info] Result for b'clients5.google.com' is ['172.217.15.78']
205.3 11 | 2018-04-06T19:50:30+0000 [stdout#info] Result for b'lh3.googleusercontent.com' is ['172.217.7.193']
205.4 11 | 2018-04-06T19:50:30+0000 [stdout#info] Result for b'adservice.google.com' is ['172.217.15.98']
205.4 11 | 2018-04-06T19:50:30+0000 [stdout#info] A query: b'www.google.com'
205.4 11 | 2018-04-06T19:50:30+0000 [stdout#info] A query: b'ogs.google.com'
205.4 11 | 2018-04-06T19:50:30+0000 [stdout#info] A query: b'ssl.gstatic.com'
205.4 11 | 2018-04-06T19:50:30+0000 [stdout#info] A query: b'www.gstatic.com'
205.4 11 | 2018-04-06T19:50:30+0000 [stdout#info] Result for b'www.google.com' is ['172.217.7.196']
205.4 11 | 2018-04-06T19:50:30+0000 [stdout#info] Result for b'ssl.gstatic.com' is ['172.217.9.195']
205.5 11 | 2018-04-06T19:50:30+0000 [stdout#info] Result for b'ogs.google.com' is ['172.217.7.174']
205.5 11 | 2018-04-06T19:50:30+0000 [stdout#info] Result for b'www.gstatic.com' is ['172.217.15.99']
211.1 11 | 2018-04-06T19:50:35+0000 [stdout#info] A query: b'stackoverflow.com'
211.1 11 | 2018-04-06T19:50:35+0000 [stdout#info] A query: b'www.googleapis.com'
211.1 11 | 2018-04-06T19:50:35+0000 [stdout#info] Result for b'stackoverflow.com' is ['151.101.1.69', '151.101.65.69', '151.101.129.69', '151.101.193.69']
211.1 11 | 2018-04-06T19:50:35+0000 [stdout#info] Result for b'www.googleapis.com' is ['172.217.3.42', '216.58.217.106', '172.217.15.106', '172.217.15.74', '172.217.13.234', '172.217.13.74', '172.217.12.234', '172.217.9.202', '172.217.8.10', '172.217.7.202', '172.217.7.170', '172.217.7.138', '172.217.5.234']
211.4 11 | 2018-04-06T19:50:36+0000 [stdout#info] A query: b'cdn.sstatic.net'
211.4 11 | 2018-04-06T19:50:36+0000 [stdout#info] A query: b'i.stack.imgur.com'
211.4 11 | 2018-04-06T19:50:36+0000 [stdout#info] Result for b'i.stack.imgur.com' is ['104.16.111.18', '104.16.110.18', '104.16.109.18', '104.16.108.18', '104.16.112.18']
211.5 11 | 2018-04-06T19:50:36+0000 [stdout#info] Result for b'cdn.sstatic.net' is ['151.101.65.69', '151.101.129.69', '151.101.193.69', '151.101.1.69']
211.6 11 | 2018-04-06T19:50:36+0000 [stdout#info] A query: b'js-sec.indexww.com'
211.7 11 | 2018-04-06T19:50:36+0000 [stdout#info] A query: b'www.gravatar.com'
211.7 11 | 2018-04-06T19:50:36+0000 [stdout#info] Result for b'www.gravatar.com' is ['192.0.73.2']
211.7 11 | 2018-04-06T19:50:36+0000 [stdout#info] Result for b'js-sec.indexww.com' is ['23.36.33.160']
211.8 11 | 2018-04-06T19:50:36+0000 [stdout#info] A query: b'clients1.google.com'
211.9 11 | 2018-04-06T19:50:36+0000 [stdout#info] A query: b'clc.stackoverflow.com'
211.9 11 | 2018-04-06T19:50:36+0000 [stdout#info] A query: b'sb.scorecardresearch.com'
211.9 11 | 2018-04-06T19:50:36+0000 [stdout#info] A query: b'pixel.quantserve.com'
211.9 11 | 2018-04-06T19:50:36+0000 [stdout#info] Result for b'clients1.google.com' is ['172.217.13.78']
211.9 11 | 2018-04-06T19:50:36+0000 [stdout#info] A query: b'www.google-analytics.com'
211.9 11 | 2018-04-06T19:50:36+0000 [stdout#info] Result for b'clc.stackoverflow.com' is ['151.101.129.69', '151.101.65.69', '151.101.1.69', '151.101.193.69']
211.9 11 | 2018-04-06T19:50:36+0000 [stdout#info] Result for b'www.google-analytics.com' is ['172.217.15.110']
212.0 11 | 2018-04-06T19:50:36+0000 [stdout#info] Result for b'sb.scorecardresearch.com' is ['96.16.79.82']
212.0 11 | 2018-04-06T19:50:36+0000 [stdout#info] Result for b'pixel.quantserve.com' is ['66.150.118.33', '66.150.118.29', '66.150.118.26', '66.150.118.22', '66.150.118.60', '66.150.118.56', '66.150.118.50', '66.150.118.45']
212.2 11 | 2018-04-06T19:50:36+0000 [stdout#info] A query: b'stats.g.doubleclick.net'
212.2 11 | 2018-04-06T19:50:36+0000 [stdout#info] Result for b'stats.g.doubleclick.net' is ['173.194.204.154', '173.194.204.155', '173.194.204.156', '173.194.204.157']
217.9 11 | 2018-04-06T19:50:42+0000 [stdout#info] A query: b'clients4.google.com'
217.9 11 | 2018-04-06T19:50:42+0000 [stdout#info] Result for b'clients4.google.com' is ['172.217.13.78']
238.0 11 | 2018-04-06T19:51:02+0000 [stdout#info] A query: b'play.google.com'
238.0 11 | 2018-04-06T19:51:02+0000 [stdout#info] A query: b'clients6.google.com'
238.1 11 | 2018-04-06T19:51:02+0000 [stdout#info] Result for b'clients6.google.com' is ['172.217.13.78']
238.1 11 | 2018-04-06T19:51:02+0000 [stdout#info] Result for b'play.google.com' is ['172.217.15.78']
258.9 11 | 2018-04-06T19:51:23+0000 [stdout#info] A query: b'calendar.google.com'
258.9 11 | 2018-04-06T19:51:23+0000 [stdout#info] Result for b'calendar.google.com' is ['172.217.15.110']
282.9 11 | 2018-04-06T19:51:47+0000 [stdout#info] A query: b'lh3.googleusercontent.com'
282.9 11 | 2018-04-06T19:51:47+0000 [stdout#info] Result for b'lh3.googleusercontent.com' is ['172.217.15.97']
308.9 11 | 2018-04-06T19:52:13+0000 [stdout#info] A query: b'tasks.google.com'
308.9 11 | 2018-04-06T19:52:13+0000 [stdout#info] Result for b'tasks.google.com' is ['172.217.15.110']
418.8 11 | 2018-04-06T19:54:03+0000 [stdout#info] A query: b'play.google.com'
418.8 11 | 2018-04-06T19:54:03+0000 [stdout#info] Result for b'play.google.com' is ['172.217.15.110']
436.1 11 | 2018-04-06T19:54:20+0000 [stdout#info] A query: b'github.com'
436.1 11 | 2018-04-06T19:54:20+0000 [stdout#info] Result for b'github.com' is ['192.30.253.113', '192.30.253.112']
436.7 11 | 2018-04-06T19:54:21+0000 [stdout#info] A query: b'collector.githubapp.com'
436.7 11 | 2018-04-06T19:54:21+0000 [stdout#info] A query: b'www.google-analytics.com'
436.7 11 | 2018-04-06T19:54:21+0000 [stdout#info] A query: b'api.github.com'
436.7 11 | 2018-04-06T19:54:21+0000 [stdout#info] Result for b'www.google-analytics.com' is ['216.58.217.174']
436.8 11 | 2018-04-06T19:54:21+0000 [stdout#info] Result for b'collector.githubapp.com' is ['52.22.67.147', '54.236.197.250', '34.203.158.5']
436.8 11 | 2018-04-06T19:54:21+0000 [stdout#info] Result for b'api.github.com' is ['192.30.253.117', '192.30.253.116']
438.8 11 | 2018-04-06T19:54:23+0000 [stdout#info] A query: b'clients4.google.com'
438.8 11 | 2018-04-06T19:54:23+0000 [stdout#info] Result for b'clients4.google.com' is ['172.217.7.238']
466.0 11 | 2018-04-06T19:54:50+0000 [stdout#info] A query: b's-usc1c-nss-225.firebaseio.com'
466.0 11 | 2018-04-06T19:54:50+0000 [stdout#info] Result for b's-usc1c-nss-225.firebaseio.com' is ['35.201.97.85']
478.7 11 | 2018-04-06T19:55:03+0000 [stdout#info] A query: b'www.notion.so'
478.7 11 | 2018-04-06T19:55:03+0000 [stdout#info] Result for b'www.notion.so' is ['104.25.151.102', '104.25.152.102']
500.2 11 | 2018-04-06T19:55:24+0000 [stdout#info] A query: b'kafka-kube-staging-1.us-east-1.iris.internal'
500.3 11 | 2018-04-06T19:55:25+0000 [stdout#info] Result for b'kafka-kube-staging-1.us-east-1.iris.internal' is ['172.30.148.226']
536.8 11 | 2018-04-06T19:56:01+0000 [stdout#info] A query: b'play.google.com'
537.0 11 | 2018-04-06T19:56:01+0000 [stdout#info] Result for b'play.google.com' is ['172.217.7.174']
568.1 11 | 2018-04-06T19:56:32+0000 [stdout#info] A query: b'kafka-kube-staging-1.us-east-1.iris.internal'
568.1 11 | 2018-04-06T19:56:32+0000 [stdout#info] Result for b'kafka-kube-staging-1.us-east-1.iris.internal' is ['172.30.148.226']
568.2 11 | 2018-04-06T19:56:32+0000 [stdout#info] AAAA query, sending back A instead: b'kafka-kube-staging-1.us-east-1.iris.internal'
568.2 11 | 2018-04-06T19:56:32+0000 [stdout#info] A query: b'kafka-kube-staging-1.us-east-1.iris.internal'
568.2 11 | 2018-04-06T19:56:32+0000 [stdout#info] Result for b'kafka-kube-staging-1.us-east-1.iris.internal' is ['172.30.148.226']
568.3 11 | 2018-04-06T19:56:32+0000 [stdout#info] 15 query: b'kafka-kube-staging-1.us-east-1.iris.internal'
568.3 11 | 2018-04-06T19:56:32+0000 [DNSDatagramProtocol (UDP)] DNSDatagramProtocol starting on 20284
568.3 11 | 2018-04-06T19:56:32+0000 [DNSDatagramProtocol (UDP)] Starting protocol <twisted.names.dns.DNSDatagramProtocol object at 0x7fedb075d828>
568.3 11 | 2018-04-06T19:56:32+0000 [-] (UDP Port 20284 Closed)
568.3 11 | 2018-04-06T19:56:32+0000 [-] Stopping protocol <twisted.names.dns.DNSDatagramProtocol object at 0x7fedb075d828>
658.9 11 | 2018-04-06T19:58:03+0000 [stdout#info] A query: b'$'
658.9 11 | 2018-04-06T19:58:03+0000 [stdout#info] AAAA query, sending back A instead: b'$'
658.9 11 | 2018-04-06T19:58:03+0000 [stdout#info] A query: b'$'
658.9 11 | 2018-04-06T19:58:03+0000 [stdout#info] getaddrinfo error: [Errno -2] Name does not resolve
658.9 11 | 2018-04-06T19:58:03+0000 [stdout#info] getaddrinfo error: [Errno -2] Name does not resolve
659.1 11 | 2018-04-06T19:58:03+0000 [stdout#info] A query: b'host'
659.1 11 | 2018-04-06T19:58:03+0000 [stdout#info] AAAA query, sending back A instead: b'host'
659.1 11 | 2018-04-06T19:58:03+0000 [stdout#info] A query: b'host'
659.2 11 | 2018-04-06T19:58:03+0000 [stdout#info] getaddrinfo error: [Errno -2] Name does not resolve
659.2 11 | 2018-04-06T19:58:03+0000 [stdout#info] getaddrinfo error: [Errno -2] Name does not resolve
732.9 11 | 2018-04-06T19:59:17+0000 [stdout#info] A query: b'play.google.com'
732.9 11 | 2018-04-06T19:59:17+0000 [stdout#info] Result for b'play.google.com' is ['172.217.8.14']
The curl issue seems to be IPv6-related. Can you try curl -4 to avoid IPv6 name resolution?
I'm getting the same issue with curl and wget; curl -4 doesn't work either. Insanely awesome tool, btw!
It would seem that the fully qualified name resolves fine: my-custom-service.default.svc.cluster.local works, but my-custom-service does not. Both work fine in dig, but the short name fails everywhere else (e.g. Chrome, JVM, curl, wget).
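One likely reason dig and curl can disagree: dig sends its query straight to a DNS server, whereas curl, wget, browsers, and the JVM typically go through the operating system's resolver. A minimal sketch for testing the system resolver path directly, using the placeholder Service names from this comment (the function names are hypothetical):

```python
import socket


def system_lookup(name):
    """Resolve name the way most applications do, via getaddrinfo().

    Returns a sorted list of unique addresses, or the error text on failure.
    """
    try:
        infos = socket.getaddrinfo(name, None, proto=socket.IPPROTO_TCP)
    except socket.gaierror as error:
        return "lookup failed: {}".format(error)
    return sorted({info[4][0] for info in infos})


def compare(short_name, full_name):
    """Look up both spellings of a Service name and report each result."""
    return {name: system_lookup(name) for name in (short_name, full_name)}
```

Running `compare("my-custom-service", "my-custom-service.default.svc.cluster.local")` inside a Telepresence session shows whether the failure is specific to the short name on the system resolver path, independent of curl itself.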
Sorry, the curl -4 suggestion was based on a misunderstanding on my part.
Can you send (via gist) the full telepresence.log and command line output for a simple command
telepresence --run curl -svk https://kubernetes/api/
that presumably fails when your computer is in this broken state?
One other thought... Maybe there is an mDNSResponder issue? Can you try the diagnostic portion of this StackOverflow answer?
Hi again,
Thanks for your patience; this issue is intermittent so I can't prod at it as often as I'd like. Here's the log I got from running the above command: https://gist.github.com/esorey/b50b90e9b2ddd50a909bc81d0eaa78a5
@alexisvincent Thanks for the trace. You have found a bug in our fix for #192, released in Telepresence 0.81. Can you try this
TELEPRESENCE_VERSION=0.78 telepresence --run curl -svk https://kubernetes/api/
and send me the trace if it fails? I expect that one will work for you by avoiding the new bug #578.
@esorey Thanks for the trace. You're running 0.77 (I think), which does not have the bug identified above. Oddly, your trace does not have the relevant portion of the kubectl logs output, which is process 10 in that trace. In any case, let me fix the regression and then ask you to try again. Thanks for your patience.
The command you provided now works, however my original problem persists. I can't hit my service.
Here are the logs after running TELEPRESENCE_VERSION=0.78 telepresence --run curl -svk http://retracted-service-name with the following stdout:
Starting proxy with method 'vpn-tcp', which has the following limitations: All processes are affected, only one telepresence can run per machine, and you can't use other VPNs. You may need to add cloud hosts with --also-proxy. For a full list of method limitations see https://telepresence.io/reference/methods.html
Volumes are rooted at $TELEPRESENCE_ROOT. See https://telepresence.io/howto/volumes.html for details.
No traffic is being forwarded from the remote Deployment to your local machine. You can use the --expose option to specify which ports you want to forward.
Password:
* Rebuilt URL to: http://retracted-service-name/
* Trying 10.63.255.122...
* TCP_NODELAY set
* Connected to retracted-service-name (127.0.0.1) port 80 (#0)
> GET / HTTP/1.1
> Host: retracted-service-name
> User-Agent: Mozilla/5.0 (compatible; MSIE 9.0; Windows NT 6.1; Trident/5.0)
> Accept: */*
> Referer:
>
* Empty reply from server
* Connection #0 to host retracted-service-name left intact
Perhaps a silly question, but does that work from the cluster?
kubectl run asdf -it --rm --image=fedora --restart=Never -- curl -svk http://retracted-service-name
Yes, sorry, I should have mentioned that. It works in the cluster. Also, dig retracted-service-name resolves correctly, and telepresence --run curl -svk http://retracted-service-name.default.svc.cluster.local also works.
Do I misunderstand? You're saying curl works with the full name but not with the short name? Do they resolve to different IPs?
Exactly. Both resolve to the same IP with dig, but only the long one works with curl, netcat, etc.
If you want to play around with it to reproduce, I'm happy to give you teamviewer access to my machine and cluster.
That's confusing.
I'd like to release Telepresence to get the fix for #578 out there. That avoids environment shenanigans and version skew. Let's debug after that.
Thanks very much for helping me with this.
Cool :) NP, this is a really awesome project 👍 I'm thinking of using it as the default user experience for folks interacting with our research cluster at Stellenbosch University. Anything we can do to lower the barrier to entry for folks.
Let me know if there's anything I can help with for the debugging process.
@alexisvincent @esorey Can you please try again with Tel 0.82? Thanks for your help.
working for me 🎉 :) Thanks
Thanks for getting this out! I'm trying it, and I'll let you know if the issue pops back up again.
Unfortunately I'm still getting this issue intermittently. Next time it happens I'll post some logs here.
@rhs pointed out that this issue might be due to negative DNS caching. If you run your curl without Telepresence and get a DNS failure, macOS caches that failure for a little while. During that time the curl will fail even under Telepresence because of the cache.
Can you try clearing your Mac's DNS cache? sudo killall -HUP mDNSResponder should do it. Then try your Telepresence command again.
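To illustrate the negative-caching idea, here is a toy model. It is not how mDNSResponder is implemented; the class name, the 60-second lifetime, and the injectable clock are all assumptions made for the sketch.

```python
import time


class NegativeCachingResolver:
    """Toy resolver that remembers failed lookups for ttl seconds."""

    def __init__(self, lookup, ttl=60.0, clock=time.monotonic):
        self._lookup = lookup    # callable: name -> address, or None on failure
        self._ttl = ttl          # assumed lifetime of a cached failure
        self._clock = clock
        self._failed_at = {}     # name -> time at which the failure was cached

    def resolve(self, name):
        failed_at = self._failed_at.get(name)
        if failed_at is not None and self._clock() - failed_at < self._ttl:
            return None          # cached failure: the backend is not consulted
        address = self._lookup(name)
        if address is None:
            self._failed_at[name] = self._clock()
        else:
            self._failed_at.pop(name, None)
        return address

    def flush(self):
        """Forget all cached failures (the role a cache flush plays)."""
        self._failed_at.clear()
```

In this model, a lookup that fails before the name becomes resolvable keeps failing until the cached entry expires or `flush()` is called, which is the behaviour the suggestion above is trying to rule out.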
Still no dice, unfortunately.
I'm seeing much the same as described here on Ubuntu. I have to restart my computer nearly every time in order to get telepresence to work again.
Hmm, that's surprising @el-davo. My solution was to dual-boot Ubuntu; since I made the switch, this issue has disappeared for me.
I had the same issue on my Mac running macOS 10.14 (Mojave). When using the vpn-tcp method and communicating with the locally running docker-for-desktop Kubernetes instance, hosts such as myservice.mynamespace.svc.cluster.local resolved just fine, but hosts such as myservice.mynamespace did not.
Flushing the Mac's own DNS cache helped, per https://help.dreamhost.com/hc/en-us/articles/214981288-Flushing-your-DNS-cache-in-Mac-OS-X-and-Linux , but everything was slowed down for a while afterwards. On recent macOS (Mojave at least), the flush command is:
sudo killall -HUP mDNSResponder;sudo killall mDNSResponderHelper;sudo dscacheutil -flushcache
I think that just dscacheutil -flushcache was enough at least once, but I didn't keep a record of it.
@amarchen What is the search line in your computer's /etc/resolv.conf? Or, if you prefer not to reveal that, how many entries are there? I have some ideas around the particular failure mode you described.
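For anyone who wants to answer this without pasting the file, here is a minimal sketch that counts the search entries and shows how a short name would be expanded. It uses a simplified model of the usual resolver rule (names with fewer dots than ndots are tried with each search domain first). The function names are hypothetical, and this says nothing about how a given OS actually routes its queries.

```python
def parse_resolv_conf(text):
    """Return (search_domains, nameservers) from resolv.conf-style text."""
    search, nameservers = [], []
    for raw in text.splitlines():
        line = raw.strip()
        if not line or line[0] in "#;":
            continue                      # blank line or comment
        fields = line.split()
        if len(fields) < 2:
            continue
        if fields[0] == "search":
            search = fields[1:]           # the last search/domain line wins
        elif fields[0] == "domain":
            search = [fields[1]]
        elif fields[0] == "nameserver":
            nameservers.append(fields[1])
    return search, nameservers


def candidate_names(name, search, ndots=1):
    """List the names a resolver would try, in order (simplified model)."""
    if name.endswith("."):
        return [name]                     # already absolute: no expansion
    expanded = [name + "." + domain for domain in search]
    if name.count(".") >= ndots:
        return [name] + expanded          # enough dots: try the name as-is first
    return expanded + [name]
```

`len(parse_resolv_conf(open("/etc/resolv.conf").read())[0])` gives the number of search entries without revealing them. With a hypothetical search list such as `["svc.cluster.local", "cluster.local"]`, `candidate_names("myservice.mynamespace", ...)` shows why a short name depends on the search list while the fully qualified name does not.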
My /etc/resolv.conf is very minimal. Here's the content (I masked the exact values with "x.x" and "mycoworkingplacedomain"):
$ cat /etc/resolv.conf
#
# macOS Notice
#
# This file is not consulted for DNS hostname resolution, address
# resolution, or the DNS query routing mechanism used by most
# processes on this system.
#
# To view the DNS configuration used by this system, use:
# scutil --dns
#
# SEE ALSO
# dns-sd(1), scutil(8)
#
# This file is automatically generated.
#
domain mycoworkingplacedomain
nameserver 10.51.x.x
Thanks. So, not what I was thinking, at least in your case. This needs more thought.
> Thanks. So, not what I was thinking, at least in your case. This needs more thought.
That's a pity, @ark3. Well, if you figure out what sort of investigation would help you, I'd be glad to try it. Logs, experiments, trials - whatever you need :)
I'm opening a telepresence session to our K8s cluster using method vpn-tcp without swapping out any deployments, just to access K8s resources. Roughly half of the time I do this, it works perfectly. The other times, I get errors complaining that DNS resolution of the K8s addresses failed. However, when I run dig <K8s-IP>, it reports status NOERROR, so I know that there is no real issue with the addresses themselves. The only workaround I've found thus far is to restart my machine entirely. I've also tried the same setup on Linux and have not seen the issue after many runs. Specifically, this is on macOS Sierra 10.12.6. This seems like a particularly hairy issue that may be solved by the plans to run DNS locally, but I still wanted to document it. Thank you!