solo-io / squash

The debugger for microservices
https://squash.solo.io
Apache License 2.0
1.74k stars 103 forks source link

The plank pod couldn't be started up with CRI-O and debugging couldn't be enabled in VS Code #258

Open brightzheng100 opened 3 years ago

brightzheng100 commented 3 years ago

Briefing Trying the demo app process by following the official docs. But it kept failing so I checked on the pods and found that the plank pod was not starting up properly beause of errors:

transport: Error while dialing dial unix /var/run/cri.sock: connect: connection refused

Details

$ kubectl get pod -n squash-debugger
NAME         READY   STATUS   RESTARTS   AGE
planktrdth   0/1     Error    0          6m54

$ kubectl logs -n squash-debugger plankdnxmz
INFO[0000] plank 0.5.18
{"level":"warn","ts":1607000138.3069575,"logger":"squash","caller":"plank/config.go:32","msg":"these are the debug values ingested by plank%!(EXTRA string=debugNamespace, string=default, string=daName, string=ktn8a9xcos, string=plankName, string=plankdnxmz)"}
E1203 12:55:38.326187    5841 remote_runtime.go:482] Status from runtime service failed: rpc error: code = Unavailable desc = all SubConns are in TransientFailure, latest connection error: connection error: desc = "transport: Error while dialing dial unix /var/run/cri.sock: connect: connection refused"
rpc error: code = Unavailable desc = all SubConns are in TransientFailure, latest connection error: connection error: desc = "transport: Error while dialing dial unix /var/run/cri.sock: connect: connection refused"
{"level":"fatal","ts":1607000138.326412,"caller":"plank/main.go:27","msg":"debug failed!","error":"rpc error: code = Unavailable desc = all SubConns are in TransientFailure, latest connection error: connection error: desc = \"transport: Error while dialing dial unix /var/run/cri.sock: connect: connection refused\"","stacktrace":"main.main\n\t/workspace/gopath/src/github.com/solo-io/squash/cmd/plank/main.go:27\nruntime.main\n\t/usr/local/go/src/runtime/proc.go:201"}

Env

$ kubectl version --short Client Version: v1.19.4 Server Version: v1.19.4

brightzheng100 commented 3 years ago

Well, just found that squash doesn't support cri-o I guess as after I changed the CRI to default, which should be Docker or containerd, it made more progress.

But more issues came:

  1. I tried accessing the app before enabling the debugging, all fine -- how to access the app should be documented as well, otherwise the user has to dig into the code and prepare the payload:
curl -X POST -L $(minikube service --url example-microservice-svc)/calculate -d '{"Op1":1,"Op2":2, "isAdd":false}'
  1. I started debugging and a pod name plankXXXX got created in namespace squash-debugger, which was great.
$ kubectl get pod -n squash-debugger
NAME         READY   STATUS    RESTARTS   AGE
plankq64vk   1/1     Running   0          8m34s
  1. So I started debugging, all looked good, so I tried to access it like usual but it just hung there forever -- the breakpoint couldn't be hit but just hung there. The logs:
$ kubectl logs -n squash-debugger plankq64vk -f
INFO[0000] plank 0.5.18
{"level":"warn","ts":1607003419.1735222,"logger":"squash","caller":"plank/config.go:32","msg":"these are the debug values ingested by plank%!(EXTRA string=debugNamespace, string=default, string=daName, string=u7vevrv1l4, string=plankName, string=plankq64vk)"}
v2
DEBU[0000] Cri GetPid called                             attachment="metadata:<name:\"u7vevrv1l4\" namespace:\"default\" resource_version:\"1907\" labels:<key:\"container_name\" value:\"example-microservice\" > labels:<key:\"pod_name\" value:\"example-microservice-rc-h9brt\" > labels:<key:\"pod_namespace\" value:\"default\" > > status:<> plank_name:\"plankq64vk\" debugger:\"dlv\" image:\"itstarting/squash-demo:0.1\" pod:\"example-microservice-rc-h9brt\" container:\"example-microservice\" debug_namespace:\"default\" "
DEBU[0000] Cri GetPid ListPodSandbox                     inpod="(*v1alpha2.PodSandboxFilter)(0xc0005521e0)(&PodSandboxFilter{Id:,State:&PodSandboxStateValue{State:SANDBOX_READY,},LabelSelector:map[string]string{io.kubernetes.pod.name: example-microservice-rc-h9brt,io.kubernetes.pod.namespace: default,},})\n"
DEBU[0000] Cri GetPid ListContainers                     incont="(*v1alpha2.ContainerFilter)(0xc0000d28a0)(&ContainerFilter{Id:,State:nil,PodSandboxId:81efdf8cf675c674bf199a236f427a76aa0eba3f1e853671cad4785232e35b3b,LabelSelector:map[string]string{io.kubernetes.container.name: example-microservice,},})\n"
DEBU[0000] Cri GetPid ListContainers - got response      respcont="([]*v1alpha2.Container) (len=2 cap=2) {\n (*v1alpha2.Container)(0xc0003b0c60)(&Container{Id:08a91ff65f5f19821ff18aaecadfedb1ad9815fb2c0894124f971e1fb32ccf5b,PodSandboxId:81efdf8cf675c674bf199a236f427a76aa0eba3f1e853671cad4785232e35b3b,Metadata:&ContainerMetadata{Name:example-microservice,Attempt:1,},Image:&ImageSpec{Image:sha256:13fc819da41d3863cd23f1c184f30de5ccb3429c969d842f54dbbea5809c81e1,},ImageRef:sha256:13fc819da41d3863cd23f1c184f30de5ccb3429c969d842f54dbbea5809c81e1,State:CONTAINER_RUNNING,CreatedAt:1607003384000000000,Labels:map[string]string{io.kubernetes.container.name: example-microservice,io.kubernetes.pod.name: example-microservice-rc-h9brt,io.kubernetes.pod.namespace: default,io.kubernetes.pod.uid: 2718d723-0b6d-4ad3-b30f-307c39352bed,},Annotations:map[string]string{io.kubernetes.container.hash: 972aebe0,io.kubernetes.container.ports: [{\"containerPort\":8080,\"protocol\":\"TCP\"}],io.kubernetes.container.restartCount: 1,io.kubernetes.container.terminationMessagePath: /dev/termination-log,io.kubernetes.container.terminationMessagePolicy: File,io.kubernetes.pod.terminationGracePeriod: 30,},}),\n (*v1alpha2.Container)(0xc0003b0d20)(&Container{Id:bcc7b40c1cf1b9e54dcf184638fcebdebabfe7bf5e7b9fe9c8c6cc19784f3fad,PodSandboxId:81efdf8cf675c674bf199a236f427a76aa0eba3f1e853671cad4785232e35b3b,Metadata:&ContainerMetadata{Name:example-microservice,Attempt:0,},Image:&ImageSpec{Image:sha256:13fc819da41d3863cd23f1c184f30de5ccb3429c969d842f54dbbea5809c81e1,},ImageRef:sha256:13fc819da41d3863cd23f1c184f30de5ccb3429c969d842f54dbbea5809c81e1,State:CONTAINER_EXITED,CreatedAt:1607002984000000000,Labels:map[string]string{io.kubernetes.container.name: example-microservice,io.kubernetes.pod.name: example-microservice-rc-h9brt,io.kubernetes.pod.namespace: default,io.kubernetes.pod.uid: 2718d723-0b6d-4ad3-b30f-307c39352bed,},Annotations:map[string]string{io.kubernetes.container.hash: 972aebe0,io.kubernetes.container.ports: [{\"containerPort\":8080,\"protocol\":\"TCP\"}],io.kubernetes.container.restartCount: 0,io.kubernetes.container.terminationMessagePath: /dev/termination-log,io.kubernetes.container.terminationMessagePolicy: File,io.kubernetes.pod.terminationGracePeriod: 30,},})\n}\n"
DEBU[0000] Cri GetPid ListContainers - filtered response  containers="([]*v1alpha2.Container) (len=1 cap=1) {\n (*v1alpha2.Container)(0xc0003b0c60)(&Container{Id:08a91ff65f5f19821ff18aaecadfedb1ad9815fb2c0894124f971e1fb32ccf5b,PodSandboxId:81efdf8cf675c674bf199a236f427a76aa0eba3f1e853671cad4785232e35b3b,Metadata:&ContainerMetadata{Name:example-microservice,Attempt:1,},Image:&ImageSpec{Image:sha256:13fc819da41d3863cd23f1c184f30de5ccb3429c969d842f54dbbea5809c81e1,},ImageRef:sha256:13fc819da41d3863cd23f1c184f30de5ccb3429c969d842f54dbbea5809c81e1,State:CONTAINER_RUNNING,CreatedAt:1607003384000000000,Labels:map[string]string{io.kubernetes.container.name: example-microservice,io.kubernetes.pod.name: example-microservice-rc-h9brt,io.kubernetes.pod.namespace: default,io.kubernetes.pod.uid: 2718d723-0b6d-4ad3-b30f-307c39352bed,},Annotations:map[string]string{io.kubernetes.container.hash: 972aebe0,io.kubernetes.container.ports: [{\"containerPort\":8080,\"protocol\":\"TCP\"}],io.kubernetes.container.restartCount: 1,io.kubernetes.container.terminationMessagePath: /dev/termination-log,io.kubernetes.container.terminationMessagePolicy: File,io.kubernetes.pod.terminationGracePeriod: 30,},})\n}\n"
INFO[0000] found some pids                               potentialpids="[17102]"
about to serve
DEBU[0000] StartDebugServer called                       pid=17102
DEBU[0000] dlv command                                   args="[dlv attach 17102 --listen=127.0.0.1:0 --accept-multiclient=true --api-version=2 --headless --log]" cmd="&{/go/bin/dlv [dlv attach 17102 --listen=127.0.0.1:0 --accept-multiclient=true --api-version=2 --headless --log] []  <nil> 0xc0000a6008 0xc0000a6010 [] <nil> <nil> <nil> <nil> <nil> false [] [] [] [] <nil> <nil>}"
DEBU[0000] starting headless dlv for user started, trying to get port
API server listening at: 127.0.0.1:39305
INFO[0000] attaching to pid 17102                        layer=debugger
DEBU[0002] got netlink msg                               fields.msg="{{92 2 0 1 17425} [254 255 255 255 72 0 0 0 20 0 1 3 1 0 0 0 0 0 0 0 2 6 0 0 0 4 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0]}"
WARN[0002] Netlink error                                 errval=4294967294
WARN[0002] GetListeningSocketsFor: Can't get listening sockets with netlink  err="netlink error: 2" pid=17480
DEBU[0002] GetSocketInodesFor: got listening sockets     inoddedPorts="[{86057 39305}]" pid=17480
DEBU[0002] GetSocketInodesFor: got sockets for pid       pid=17480 sockets="[86057]"
DEBU[0002] GetSocketInodesFor: testing socket match      inodesock=86057 socket="{86057 39305}"
INFO[0002] port found                                    pid=17480 port=39305
INFO[0003] created breakpoint: &api.Breakpoint{ID:1, Name:"", Addr:0x64d75c, File:"/Users/brightzheng/workspaces/kubernetes/squash/demo-app/main.go", Line:23, FunctionName:"main.calchandler", Cond:"", Tracepoint:false, TraceReturn:false, Goroutine:false, Stacktrace:0, Variables:[]string(nil), LoadArgs:(*api.LoadConfig)(0xc00115c330), LoadLocals:(*api.LoadConfig)(0xc00115c360), HitCount:map[string]uint64{}, TotalHitCount:0x0}  layer=debugger
INFO[0003] created breakpoint: &api.Breakpoint{ID:2, Name:"", Addr:0x64d88b, File:"/Users/brightzheng/workspaces/kubernetes/squash/demo-app/main.go", Line:34, FunctionName:"main.calchandler", Cond:"", Tracepoint:false, TraceReturn:false, Goroutine:false, Stacktrace:0, Variables:[]string(nil), LoadArgs:(*api.LoadConfig)(0xc00115c480), LoadLocals:(*api.LoadConfig)(0xc00115c4b0), HitCount:map[string]uint64{}, TotalHitCount:0x0}  layer=debugger
DEBU[0003] continuing                                    layer=debugger
  1. So I thought it may have hit my breakpoint internally and yep, it was true that the squash sounded like have hit the endpoint as after I cleared out all endpoints, the access wouldn't hang anymore -- but again, it just couldn't enable VS Code's debugging process.

Maybe I've missed something or done something wrongly. Comments?