tricorder-observability / Starship

Starship: next-generation Observability platform built with eBPF+WASM
GNU Affero General Public License v3.0

api-server start crashed #160

Closed oowl closed 1 year ago

oowl commented 1 year ago

Describe the bug The API Server's pid-collector submodule crashes with a nil pointer dereference while the metadata service is collecting PID information. The API Server and agents are built from the main line head.

To Reproduce Follow the helm-charts instructions to deploy Starship on minikube; the API Server should then start crashing.

Screenshots

image


nascentcore-eng commented 1 year ago

@owl-ltt Do you have any update on this?

oojimmy commented 1 year ago

I have checked the agent crash log:

{"file":"src/agent/deployer/deployer.go:125","function":"StartModuleDeployLoop","level":"fatal","msg":"Failed to read stream from DeplyModule(), error: rpc error: code = Unknown desc = while handling agent grpc request, failed to update node agent state, error: while handling Agent grpc request, failed to save new online agent, error: ON CONFLICT clause does not match any PRIMARY KEY or UNIQUE constraint","time":"2023-03-09T16:44:45Z"}

oojimmy commented 1 year ago

I have also tried adding debug code to the pid_collector module to print the debug error message:

image

oojimmy commented 1 year ago

So this panic was caused by the gRPC stream closing, an error that our code does not handle.