planetscale / vitess-operator

Kubernetes Operator for Vitess
Apache License 2.0

Catch grpc: addrConn.createTransport exceptions to prevent multiline logs #637

Open bluecrabs007 opened 1 week ago

bluecrabs007 commented 1 week ago

We are seeing Go stack traces from vitess-operator when it cannot connect to something, either an etcd endpoint or a vttablet. The gRPC stack traces look like this:

W1115 08:55:24.028933       1 component.go:41] [core] [Channel #4 SubChannel #5] grpc: addrConn.createTransport failed to connect to {
  "Addr": "vt-etcd-ec571fc7-client.platform-dbtech-sandbox.svc:2379",
  "ServerName": "vt-etcd-ec571fc7-client.platform-dbtech-sandbox.svc",
  "Attributes": null,
  "BalancerAttributes": null,
  "Type": 0,
  "Metadata": null
}. Err: connection error: desc = "transport: Error while dialing dial tcp 100.74.28.238:2379: connect: connection refused"

and

Unlocking shard sharded/aa-ab for action electShardPrimary with error can't elect primary: didn't find any valid candidate
[core] [Channel #414416878 SubChannel #414416880] grpc: addrConn.createTransport failed to connect to {
 "Addr": ":0",
 "ServerName": "localhost:0",
  "Attributes": null,
  "BalancerAttributes": null,
  "Type": 0,
  "Metadata": null
}. Err: connection error: desc = "transport: Error while dialing dial tcp :0: connect: connection refused"

Steps to reproduce: Delete etcd pods or vttablet pods

frouioui commented 1 week ago

Hello @bluecrabs007, what exactly is the issue? Is it that the operator spreads the same log message on multiple lines?

Moreover, where are these stack traces coming from? Which pod or container?

jwangace commented 1 week ago

Yeah, the logs come from the single-container vitess-operator pod. It seems the message contains a newline character at the end of each line, so when Kibana parses it, a single log entry is spread across multiple lines.