programmerq closed this issue 2 years ago
I have a similar problem (but I don't have permission to change the code):
dockerd-current: http: panic serving @: runtime error: invalid memory address or nil pointer dereference
goroutine 13095674 [running]:
net/http.(*conn).serve.func1(0x4001ec57c0)
	/usr/lib/golang/src/net/http/server.go:1767 +0xfc
panic(0x14e6c20, 0x25d2560)
	/usr/lib/golang/src/runtime/panic.go:679 +0x194
github.com/docker/docker/vendor/github.com/gorilla/mux.(*Route).Match(0x4000406770, 0x4002ae7400, 0x40020e1180, 0x745b00)
	/builddir/build/BUILD/docker-87f2fab3d32f145760b94b87b93daa83e6841ee7/_build/src/github.com/docker/docker/vendor/github.com/gorilla/mux/route.go:45 +0x64
github.com/docker/docker/vendor/github.com/gorilla/mux.(*Router).Match(0x40008f2460, 0x4002ae7400, 0x40020e1180, 0x1)
	/builddir/build/BUILD/docker-87f2fab3d32f145760b94b87b93daa83e6841ee7/_build/src/github.com/docker/docker/vendor/github.com/gorilla/mux/mux.go:58 +0x74
github.com/docker/docker/vendor/github.com/gorilla/mux.(*Router).ServeHTTP(0x40008f2460, 0x1a224a0, 0x4000312d20, 0x4002ae7400)
	/builddir/build/BUILD/docker-87f2fab3d32f145760b94b87b93daa83e6841ee7/_build/src/github.com/docker/docker/vendor/github.com/gorilla/mux/mux.go:92 +0x220
github.com/docker/docker/api/server.(*routerSwapper).ServeHTTP(0x4001f4a3f0, 0x1a224a0, 0x4000312d20, 0x4002ae7400)
	/builddir/build/BUILD/docker-87f2fab3d32f145760b94b87b93daa83e6841ee7/_build/src/github.com/docker/docker/api/server/router_swapper.go:29 +0x94
net/http.serverHandler.ServeHTTP(0x4000408000, 0x1a224a0, 0x4000312d20, 0x4002ae7400)
	/usr/lib/golang/src/net/http/server.go:2802 +0xc0
net/http.(*conn).serve(0x4001ec57c0, 0x1a2d6a0, 0x4000ef9c00)
	/usr/lib/golang/src/net/http/server.go:1890 +0x714
created by net/http.(*Server).Serve
	/usr/lib/golang/src/net/http/server.go:2927 +0x2f4
kubelet: I0112 00:09:02.476745 45405 client.go:80] Connecting to docker on unix:///var/run/docker.sock
kubelet: I0112 00:09:02.476777 45405 client.go:109] Start docker client with request timeout=10m0s
kubelet: E0112 00:09:02.477591 45405 kube_docker_client.go:91] failed to retrieve docker version: error during connect: Get http://%2Fvar%2Frun%2Fdocker.sock/version: EOF
kubelet: W0112 00:09:02.477628 45405 kube_docker_client.go:92] Using empty version for docker client, this may sometimes cause compatibility issue.
kubelet: Error: failed to run Kubelet: failed to create kubelet: failed to get docker version: error during connect: Get http://%2Fvar%2Frun%2Fdocker.sock/version: EOF
We have addressed the panic but weren't able to get the kubeconfig from the customer to see what was causing it in the first place. Feel free to reopen if more info becomes available.
Hi @r0mant, are there any plans to backport this fix to v7 too?
@ArunNadda We can if needed. Is anyone interested in a v7 backport?
Hi @r0mant, thanks for the quick response. Yes, we have a customer who is running v7 and is getting this nil pointer error. They are still testing v8/v9 (which might take them months to test and deploy in prod), so I was checking if we can backport it to v7.
Here is the format of the kubeconfig they have:
apiVersion: v1
clusters:
- cluster:
    server: ${cluster_server}
  name: ${cluster_name}
contexts:
- context:
    cluster: ${cluster_name}
    user: teleport
  name: ${cluster_name}
current-context: ${cluster_name}
kind: Config
preferences: {}
users:
- name: teleport
  user:
    token: ${SA_TOKEN}
I've made a backport here if this is wanted for v7: https://github.com/gravitational/teleport/pull/12143
@ArunNadda Thanks, I checked my kubeconfig and it looks like this:
apiVersion: v1
clusters:
- cluster:
    certificate-authority-data: <redacted>
    server: https://127.0.0.1:52828
  name: mini1
contexts:
- context:
    cluster: mini1
    user: mini1-teleport-sa
  name: mini1
current-context: mini1
kind: Config
preferences: {}
users:
- name: mini1-teleport-sa
  user:
    token: <redacted>
Notice that I also have certificate-authority-data set for my cluster. In the kubeconfig above it is missing, and that is likely the reason this panic is happening in the first place. Even with the backported fix, Kube access won't work with this kubeconfig; it will just fail more gracefully instead of panicking (they'll get an error trying to run any kubectl command).
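To illustrate what "fail more gracefully" means here, below is a minimal sketch of the pattern using client-go. It is not the actual code from the backport PR; the checkClusterTLS helper and the kubeconfig.yaml path are made up for the example:

package main

import (
    "fmt"
    "log"

    "k8s.io/client-go/rest"
    "k8s.io/client-go/tools/clientcmd"
)

// checkClusterTLS returns a descriptive error when the kubeconfig has no CA
// configured, instead of letting a later nil dereference turn into a panic.
func checkClusterTLS(cfg *rest.Config) error {
    if cfg == nil {
        return fmt.Errorf("no kubernetes cluster configuration loaded")
    }
    if len(cfg.TLSClientConfig.CAData) == 0 &&
        cfg.TLSClientConfig.CAFile == "" &&
        !cfg.TLSClientConfig.Insecure {
        return fmt.Errorf("cluster %q has no certificate-authority-data in its kubeconfig; refusing to connect without a CA to verify the API server", cfg.Host)
    }
    return nil
}

func main() {
    // kubeconfig.yaml is a placeholder for the customer's file.
    cfg, err := clientcmd.BuildConfigFromFlags("", "kubeconfig.yaml")
    if err != nil {
        log.Fatal(err)
    }
    if err := checkClusterTLS(cfg); err != nil {
        log.Fatal(err) // a clear error instead of a nil pointer panic
    }
    fmt.Println("kubeconfig has TLS configuration for", cfg.Host)
}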
I would ask the customer how they're connecting to their Kube cluster and how they generated that kubeconfig. If I had to guess, they're trying to connect to the Kubernetes API server over HTTP instead of HTTPS. What is the value of ${cluster_server}? We do require connecting with TLS.
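If it helps with that conversation, here is a small sketch (again just an illustration; kubeconfig.yaml is a placeholder path) that loads the customer's kubeconfig with client-go and prints what ${cluster_server} resolved to for each cluster, flagging anything that is not an https:// URL:

package main

import (
    "fmt"
    "log"
    "strings"

    "k8s.io/client-go/tools/clientcmd"
)

func main() {
    // kubeconfig.yaml is a placeholder for the customer's file.
    kubeconfig, err := clientcmd.LoadFromFile("kubeconfig.yaml")
    if err != nil {
        log.Fatal(err)
    }
    for name, cluster := range kubeconfig.Clusters {
        fmt.Printf("cluster %q: server=%s\n", name, cluster.Server)
        // Teleport requires TLS to the Kubernetes API server, so an
        // http:// value for ${cluster_server} will not work.
        if !strings.HasPrefix(cluster.Server, "https://") {
            fmt.Printf("  -> %q is not an https:// URL\n", cluster.Server)
        }
    }
}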
Description
What happened:
When a fresh kubernetes_service pod comes up, it seems to be okay. The first time an end user tries to access the corresponding Kubernetes cluster, they get an EOF error from kubectl: Error from server: EOF. The proxy_service logs on the Teleport cluster also mention an EOF, and the kubernetes_service instance logs a panic stacktrace. Subsequent connection attempts make the panic repeat.
What you expected to happen:
No panic
Reproduction Steps
Unable to reproduce reliably in other environments at this time.
User reports that this kubernetes_service instance is not using persistence and starts with a fresh /var/lib/teleport on each invocation. Other kubernetes_service instances in this Teleport cluster are unaffected.
Server Details
Teleport version (teleport version): Teleport Enterprise 7.3.2-0-gaa361fbc1
Server OS (/etc/os-release):
Debug Logs