programmerq closed this issue 2 years ago
I have a similar problem (but I don't have permission to change the code):
dockerd-current: http: panic serving @: runtime error: invalid memory address or nil pointer dereference
goroutine 13095674 [running]:
net/http.(*conn).serve.func1(0x4001ec57c0)
	/usr/lib/golang/src/net/http/server.go:1767 +0xfc
panic(0x14e6c20, 0x25d2560)
	/usr/lib/golang/src/runtime/panic.go:679 +0x194
github.com/docker/docker/vendor/github.com/gorilla/mux.(*Route).Match(0x4000406770, 0x4002ae7400, 0x40020e1180, 0x745b00)
	/builddir/build/BUILD/docker-87f2fab3d32f145760b94b87b93daa83e6841ee7/_build/src/github.com/docker/docker/vendor/github.com/gorilla/mux/route.go:45 +0x64
github.com/docker/docker/vendor/github.com/gorilla/mux.(*Router).Match(0x40008f2460, 0x4002ae7400, 0x40020e1180, 0x1)
	/builddir/build/BUILD/docker-87f2fab3d32f145760b94b87b93daa83e6841ee7/_build/src/github.com/docker/docker/vendor/github.com/gorilla/mux/mux.go:58 +0x74
github.com/docker/docker/vendor/github.com/gorilla/mux.(*Router).ServeHTTP(0x40008f2460, 0x1a224a0, 0x4000312d20, 0x4002ae7400)
	/builddir/build/BUILD/docker-87f2fab3d32f145760b94b87b93daa83e6841ee7/_build/src/github.com/docker/docker/vendor/github.com/gorilla/mux/mux.go:92 +0x220
github.com/docker/docker/api/server.(*routerSwapper).ServeHTTP(0x4001f4a3f0, 0x1a224a0, 0x4000312d20, 0x4002ae7400)
	/builddir/build/BUILD/docker-87f2fab3d32f145760b94b87b93daa83e6841ee7/_build/src/github.com/docker/docker/api/server/router_swapper.go:29 +0x94
net/http.serverHandler.ServeHTTP(0x4000408000, 0x1a224a0, 0x4000312d20, 0x4002ae7400)
	/usr/lib/golang/src/net/http/server.go:2802 +0xc0
net/http.(*conn).serve(0x4001ec57c0, 0x1a2d6a0, 0x4000ef9c00)
	/usr/lib/golang/src/net/http/server.go:1890 +0x714
created by net/http.(*Server).Serve
	/usr/lib/golang/src/net/http/server.go:2927 +0x2f4
kubelet: I0112 00:09:02.476745 45405 client.go:80] Connecting to docker on unix:///var/run/docker.sock
kubelet: I0112 00:09:02.476777 45405 client.go:109] Start docker client with request timeout=10m0s
kubelet: E0112 00:09:02.477591 45405 kube_docker_client.go:91] failed to retrieve docker version: error during connect: Get http://%2Fvar%2Frun%2Fdocker.sock/version: EOF
kubelet: W0112 00:09:02.477628 45405 kube_docker_client.go:92] Using empty version for docker client, this may sometimes cause compatibility issue.
kubelet: Error: failed to run Kubelet: failed to create kubelet: failed to get docker version: error during connect: Get http://%2Fvar%2Frun%2Fdocker.sock/version: EOF
We have addressed the panic but weren't able to get the kubeconfig from the customer to see what was causing it in the first place. Feel free to reopen if more info becomes available.
Hi @r0mant, are there any plans to backport this fix to v7 too?
@ArunNadda We can if needed. Is anyone interested in a v7 backport?
Hi @r0mant, thanks for the quick response. Yes, we have a customer who is running v7 and is getting this nil pointer error. They are still testing v8/v9 (which might take them months to test and deploy in prod), so I was checking if we can backport it to v7.
Here is the format of the kubeconfig they have:
apiVersion: v1
clusters:
- cluster:
    server: ${cluster_server}
  name: ${cluster_name}
contexts:
- context:
    cluster: ${cluster_name}
    user: teleport
  name: ${cluster_name}
current-context: ${cluster_name}
kind: Config
preferences: {}
users:
- name: teleport
  user:
    token: ${SA_TOKEN}
I've made a backport here if this is wanted for v7: https://github.com/gravitational/teleport/pull/12143
@ArunNadda Thanks, I checked my kubeconfig and it looks like this:
apiVersion: v1
clusters:
- cluster:
    certificate-authority-data: <redacted>
    server: https://127.0.0.1:52828
  name: mini1
contexts:
- context:
    cluster: mini1
    user: mini1-teleport-sa
  name: mini1
current-context: mini1
kind: Config
preferences: {}
users:
- name: mini1-teleport-sa
  user:
    token: <redacted>
Notice that I also have certificate-authority-data set for my cluster. In the kubeconfig above it is missing, and that is likely the reason this panic is happening in the first place. Even with the backported fix, Kube access won't work with this kubeconfig; it will just fail more gracefully instead of panicking (they'll get an error trying to run any kubectl command).
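To illustrate what "fail more gracefully" means here, below is a minimal sketch of the pattern using client-go. It is not the actual code from the backport PR; the checkClusterTLS helper and the kubeconfig.yaml path are made up for the example:

package main

import (
    "fmt"
    "log"

    "k8s.io/client-go/rest"
    "k8s.io/client-go/tools/clientcmd"
)

// checkClusterTLS returns a descriptive error when the kubeconfig has no CA
// configured, instead of letting a later nil dereference turn into a panic.
func checkClusterTLS(cfg *rest.Config) error {
    if cfg == nil {
        return fmt.Errorf("no kubernetes cluster configuration loaded")
    }
    if len(cfg.TLSClientConfig.CAData) == 0 &&
        cfg.TLSClientConfig.CAFile == "" &&
        !cfg.TLSClientConfig.Insecure {
        return fmt.Errorf("cluster %q has no certificate-authority-data in its kubeconfig; refusing to connect without a CA to verify the API server", cfg.Host)
    }
    return nil
}

func main() {
    // kubeconfig.yaml is a placeholder for the customer's file.
    cfg, err := clientcmd.BuildConfigFromFlags("", "kubeconfig.yaml")
    if err != nil {
        log.Fatal(err)
    }
    if err := checkClusterTLS(cfg); err != nil {
        log.Fatal(err) // a clear error instead of a nil pointer panic
    }
    fmt.Println("kubeconfig has TLS configuration for", cfg.Host)
}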
I would ask the customer how they're connecting to their Kube cluster and how they generated that kubeconfig. If I had to guess, they're trying to connect to the Kubernetes API server over HTTP instead of HTTPS. What is the value of ${cluster_server}? We do require connecting with TLS.
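If it helps with that conversation, here is a small sketch (again just an illustration; kubeconfig.yaml is a placeholder path) that loads the customer's kubeconfig with client-go and prints what ${cluster_server} resolved to for each cluster, flagging anything that is not an https:// URL:

package main

import (
    "fmt"
    "log"
    "strings"

    "k8s.io/client-go/tools/clientcmd"
)

func main() {
    // kubeconfig.yaml is a placeholder for the customer's file.
    kubeconfig, err := clientcmd.LoadFromFile("kubeconfig.yaml")
    if err != nil {
        log.Fatal(err)
    }
    for name, cluster := range kubeconfig.Clusters {
        fmt.Printf("cluster %q: server=%s\n", name, cluster.Server)
        // Teleport requires TLS to the Kubernetes API server, so an
        // http:// value for ${cluster_server} will not work.
        if !strings.HasPrefix(cluster.Server, "https://") {
            fmt.Printf("  -> %q is not an https:// URL\n", cluster.Server)
        }
    }
}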
Description
What happened:
When a fresh kubernetes_service pod comes up, it seems to be okay. The first time an end user tries to access the corresponding Kubernetes cluster, they get an EOF error from kubectl: Error from server: EOF. The proxy_service logs on the Teleport cluster also mention an EOF, and the kubernetes_service instance logs a panic stacktrace. Subsequent connection attempts make the panic repeat.
What you expected to happen:
No panic
Reproduction Steps
Unable to reproduce reliably in other environments at this time.
User reports that this kubernetes_service instance is not using persistence and starts with a fresh /var/lib/teleport on each invocation. Other kubernetes_service instances in this Teleport cluster are unaffected.
Server Details
Teleport version (teleport version): Teleport Enterprise 7.3.2-0-gaa361fbc1
Server OS (/etc/os-release):
Debug Logs