solo-io / kubesquash

A debugger for Kubernetes applications.

Error debugging pod #5

Open ghost opened 6 years ago

ghost commented 6 years ago

If I run kubesquash and select a pod with an example golang container, dlv fails to attach to it and outputs the following:

ERROR: logging before flag.Parse: I0803 07:56:21.797484   28708 remote_runtime.go:43] Connecting to runtime service unix:///var/run/cri.sock
ERROR: logging before flag.Parse: E0803 07:56:21.797871   28708 remote_runtime.go:169] ListPodSandbox with filter &PodSandboxFilter{Id:,State:&PodSandboxStateValue{State:SANDBOX_READY,},LabelSelector:map[string]string{io.kubernetes.pod.name: example-microservice-rc-ccfq2,io.kubernetes.pod.namespace: default,},} from runtime service failed: rpc error: code = Unavailable desc = grpc: the connection is unavailable
WARN[0000] ListPodSandbox error                          err="rpc error: code = Unavailable desc = grpc: the connection is unavailable"
FATA[0000] debug failed!                                 err="rpc error: code = Unavailable desc = grpc: the connection is unavailable"
pod is not running and not pending

Am I missing something in my setup?

Environment:

yuval-k commented 6 years ago

Interesting! It seems like squash can't reach the CRI socket. @cog-qlik, can you elaborate on how you deployed Kubernetes? Do you know which CRI you are using? Is it all defaults?

yuval-k commented 6 years ago

To hopefully help with this issue, I've released kubesquash 0.1.4, which supports both kube 1.9 and 1.10. Also, if you are using a CRI that's not docker, it can be configured with the '-crisock' flag.

Please let me know if that helps

ghost commented 6 years ago

Hi @yuval-k. I just tried again with kubesquash 0.1.4. Now I get a slightly different error:

$ kubesquash
< ... >

ERROR: logging before flag.Parse: I0809 07:20:47.238857   24834 remote_runtime.go:43] Connecting to runtime service /var/run/cri.sock
ERROR: logging before flag.Parse: W0809 07:20:47.238944   24834 util_unix.go:75] Using "/var/run/cri.sock" as endpoint is deprecated, please consider using full url format "unix:///var/run/cri.sock".
ERROR: logging before flag.Parse: E0809 07:20:47.239100   24834 remote_runtime.go:434] Status from runtime service failed: rpc error: code = Unavailable desc = grpc: the connection is unavailable
FATA[0000] debug failed!                                 err="rpc error: code = Unavailable desc = grpc: the connection is unavailable"
pod is not running and not pending

It hangs forever without outputting anything
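
The deprecation warning in the log above only concerns the endpoint format: a bare path versus a full URL. The first report in this thread already used the `unix:///` form and failed with the same "connection is unavailable" error, so the format is probably not the cause. For reference, a minimal sketch of normalizing a bare socket path into the URL form; `normalizeEndpoint` is a hypothetical helper written for illustration, not a function from kubesquash or Kubernetes.

```go
package main

import (
	"fmt"
	"strings"
)

// normalizeEndpoint is a hypothetical helper: it returns endpoints that
// already carry a scheme unchanged and prefixes bare paths with "unix://".
func normalizeEndpoint(endpoint string) string {
	if strings.Contains(endpoint, "://") {
		return endpoint
	}
	return "unix://" + endpoint
}

func main() {
	// The bare path from the log becomes the full URL format the warning asks for.
	fmt.Println(normalizeEndpoint("/var/run/cri.sock"))
	// An endpoint that already has a scheme is left alone.
	fmt.Println(normalizeEndpoint("unix:///var/run/dockershim.sock"))
}
```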

Can you elaborate on how you deployed kubernetes?

My setup is the following:

  1. host OS: ubuntu 16.04
  2. minikube v0.28.0 with the none vm-driver (ie. not inside a VM, but runs kubernetes on my host OS)
  3. docker 17.12.0-ce, build c97c6d6
  4. kubectl
    Client Version: version.Info{Major:"1", Minor:"9", GitVersion:"v1.9.5", GitCommit:"f01a2bf98249a4db383560443a59bed0c13575df", GitTreeState:"clean", BuildDate:"2018-03-19T15:59:24Z", GoVersion:"go1.9.3", Compiler:"gc", Platform:"linux/amd64"}
    Server Version: version.Info{Major:"1", Minor:"10", GitVersion:"v1.10.0", GitCommit:"fc32d2f3698e36b93322a3465f63a14e9f0eaead", GitTreeState:"clean", BuildDate:"2018-03-26T16:44:10Z", GoVersion:"go1.9.3", Compiler:"gc", Platform:"linux/amd64"}

Do you need any more information to diagnose the problem?

yuval-k commented 6 years ago

Can you try running ls /var/run/*sock (on the same host as minikube) and tell me what's there? I'm mainly looking to see if /var/run/dockershim.sock is there.

I just tried it on a machine with the none driver and minikube version v0.28.2, and it seems to work. If /var/run/dockershim.sock is not there, perhaps try with the latest minikube?
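
The socket check above can also be done programmatically. The following is a minimal sketch, not kubesquash code: `probeSocket` and `candidateSockets` are hypothetical names, and the list holds only the two paths mentioned in this thread. It checks that each path exists, is a Unix socket, and accepts a connection. It does not speak the CRI gRPC protocol, so a successful probe only shows that something is listening.

```go
package main

import (
	"fmt"
	"net"
	"os"
	"time"
)

// candidateSockets holds the paths mentioned in this thread (an assumption,
// not an exhaustive list of CRI endpoints).
var candidateSockets = []string{
	"/var/run/dockershim.sock",
	"/var/run/cri.sock",
}

// probeSocket returns nil if path exists, is a Unix socket, and accepts
// a connection within two seconds.
func probeSocket(path string) error {
	info, err := os.Stat(path)
	if err != nil {
		return err
	}
	if info.Mode()&os.ModeSocket == 0 {
		return fmt.Errorf("%s exists but is not a socket", path)
	}
	conn, err := net.DialTimeout("unix", path, 2*time.Second)
	if err != nil {
		return err
	}
	return conn.Close()
}

func main() {
	for _, p := range candidateSockets {
		if err := probeSocket(p); err != nil {
			fmt.Printf("%s: unavailable: %v\n", p, err)
			continue
		}
		fmt.Printf("%s: ok\n", p)
	}
}
```

If the socket that answers is not the one kubesquash tries by default, its path can be passed with the '-crisock' flag mentioned earlier in the thread.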

yuval-k commented 6 years ago

Also, I just updated kubesquash to show the logs of the pod in case of an error. If the above doesn't work, can you please try running with the latest version and provide the logs?

ghost commented 6 years ago

Actually I've got minikube v0.28.0, not the v0.28.2 you tested so I'm a bit behind. Btw, here's what I get:

$ ls /var/run/*sock
/var/run/dockershim.sock  /var/run/docker.sock

Running the latest version of kubesquash I get the following:

$ kubesquash
< ... >
ERROR: logging before flag.Parse: I0810 07:34:17.919622   14126 remote_runtime.go:43] Connecting to runtime service /var/run/cri.sock
ERROR: logging before flag.Parse: W0810 07:34:17.919711   14126 util_unix.go:75] Using "/var/run/cri.sock" as endpoint is deprecated, please consider using full url format "unix:///var/run/cri.sock".
ERROR: logging before flag.Parse: E0810 07:34:17.919888   14126 remote_runtime.go:434] Status from runtime service failed: rpc error: code = Unavailable desc = grpc: the connection is unavailable
FATA[0000] debug failed!                                 err="rpc error: code = Unavailable desc = grpc: the connection is unavailable"
Pod errored with: <nil>
 Logs:
 ERROR: logging before flag.Parse: I0810 07:34:17.919622   14126 remote_runtime.go:43] Connecting to runtime service /var/run/cri.sock
ERROR: logging before flag.Parse: W0810 07:34:17.919711   14126 util_unix.go:75] Using "/var/run/cri.sock" as endpoint is deprecated, please consider using full url format "unix:///var/run/cri.sock".
ERROR: logging before flag.Parse: E0810 07:34:17.919888   14126 remote_runtime.go:434] Status from runtime service failed: rpc error: code = Unavailable desc = grpc: the connection is unavailable
FATA[0000] debug failed!                                 err="rpc error: code = Unavailable desc = grpc: the connection is unavailable"
pod is not running and not pending

I tried with different golang services and I still get the same error.

Just a question: does kubesquash require the golang process to be PID 1 in order to be able to attach delve to it? I tried both a container with the golang process as PID 1 and one where it's not.

yuval-k commented 6 years ago

re: pid1, kubesquash doesn't strictly require it, but that's the current heuristic it uses. Other ideas are welcome! This doesn't seem to be your problem, though. In order to find out the pid in the host namespaces, kubesquash tries to communicate with kubelet's CRI, and this seems to fail for some reason. I'm not sure why, as dockershim seems to be in order.

Any chance that you can try it with v0.28.2, so we can make sure this bug was not fixed upstream?

ghost commented 6 years ago

pid1 kubesquash doesn't strictly require it, but that's the current heuristic it uses. other ideas are welcome!

This means that, at the moment, kubesquash isn't able to attach delve to a golang process which isn't run as PID 1, but it's not a strict requirement on the "code side" and could be changed. Did I get it correctly?

Any chance that you can try it with v0.28.2, so we can make sure this bug was not fixed upstream?

Looking at minikube's changelog diff, it seems that nothing this big has been fixed. Still, I'll destroy and re-create my local cluster with v0.28.2 and let you know the outcome.

yuval-k commented 6 years ago

That's correct, the relevant code is here:

https://github.com/solo-io/kubesquash/blob/d1242ccb6fb90c9946cd34bb2888e83df85d6a20/pkg/kube/container.go#L56

As you can see currently it just picks the first pid, but if you have a specific heuristic in mind, let me know.
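
To make the discussion concrete, here is a minimal sketch of the "first pid" behaviour described above next to one possible alternative. It is not the code at the linked line: the `process` type, `pickFirst`, and `pickByName` are hypothetical, and matching on the command name is just one heuristic that could be tried.

```go
package main

import (
	"errors"
	"fmt"
)

// process is a hypothetical record of a process found in the container.
type process struct {
	PID  int
	Name string
}

// pickFirst mirrors the behaviour described in the thread: take the first
// process in the list, whatever it is.
func pickFirst(procs []process) (int, error) {
	if len(procs) == 0 {
		return 0, errors.New("no processes found in container")
	}
	return procs[0].PID, nil
}

// pickByName is an illustrative alternative: prefer the process whose
// command name matches, and fall back to pickFirst when nothing matches.
func pickByName(procs []process, name string) (int, error) {
	for _, p := range procs {
		if p.Name == name {
			return p.PID, nil
		}
	}
	return pickFirst(procs)
}

func main() {
	// A container where a shell wrapper runs first and the service second.
	procs := []process{{PID: 1, Name: "sh"}, {PID: 7, Name: "example-microservice"}}

	first, _ := pickFirst(procs)
	byName, _ := pickByName(procs, "example-microservice")
	fmt.Println(first, byName) // prints "1 7"
}
```

With a wrapper script first in the list, the first-pid choice would attach the debugger to the shell, while the name-based choice finds the service.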

ghost commented 6 years ago

As you can see currently it just picks the first pid, but if you have a specific heuristic in mind, let me know.

I'll open another issue that can be tagged as "improvement" for this. Being able to attach the debugger to other processes could be of help even though most of the images run the process of interest as PID 1.

brunovlucena commented 5 years ago

I have a similar problem:

✘ ⚡ ⚙ root@k8s1  /srv  kubesquash -crisock unix:///var/run/cri.sock
? Select a namespace default
? Select a pod stock-gen-58d47df5d7-dftq6
? Going to attach dlv to pod stock-gen-58d47df5d7-dftq6. continue? Yes
Pod errored with:
Logs:
pod is not running and not pending

version: k8s v1.11