neondatabase / neon

Neon: Serverless Postgres. We separated storage and compute to offer autoscaling, code-like database branching, and scale to zero.
https://neon.tech
Apache License 2.0

page_service: "Tenant X not found" includes a worthless stacktrace #6042

Status: Open. koivunej opened this issue 9 months ago

koivunej commented 9 months ago

Logging a lot of these is bad: each one carries a stacktrace, and this volume might be the reason staging got rate-limited by logging and vmauth.

Instead of fixing this with yet another quick glue fix, we should just do the rewrite in #5733.

Slack investigation: https://neondb.slack.com/archives/C03438W3FLZ/p1701787912414729?thread_ts=1701712832.744379&cid=C03438W3FLZ
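A minimal sketch of the idea, assuming a hypothetical `QueryError` type and `log_line` helper (neither is neon's actual code, and the real fix is deferred to #5733): treat "Tenant not found" as an expected condition that is logged as a single line, and attach a backtrace only to unexpected failures.

```rust
use std::backtrace::Backtrace;
use std::fmt;

/// Hypothetical error type for illustration only; not neon's page_service error.
#[derive(Debug)]
enum QueryError {
    /// Expected condition (tenant not attached here): no backtrace is captured.
    TenantNotFound(String),
    /// Unexpected failure: keep a backtrace for debugging.
    Other { msg: String, backtrace: Backtrace },
}

impl fmt::Display for QueryError {
    fn fmt(&self, f: &mut fmt::Formatter<'_>) -> fmt::Result {
        match self {
            QueryError::TenantNotFound(id) => write!(f, "Tenant {id} not found"),
            QueryError::Other { msg, .. } => write!(f, "{msg}"),
        }
    }
}

/// Builds the log message: one line for expected errors, backtrace appended otherwise.
fn log_line(query: &str, err: &QueryError) -> String {
    match err {
        QueryError::TenantNotFound(_) => format!("query handler for '{query}' failed: {err}"),
        QueryError::Other { backtrace, .. } => {
            format!("query handler for '{query}' failed: {err}\n{backtrace}")
        }
    }
}

fn main() {
    let query = "pagestream 07ebad77e1803ec2d358524fb8e39c2a 2646df9c64de3ea622fde3574cd622fb";
    let not_found = QueryError::TenantNotFound("07ebad77e1803ec2d358524fb8e39c2a".to_string());
    println!("{}", log_line(query, &not_found));

    // Backtrace::capture() only records frames when RUST_BACKTRACE/RUST_LIB_BACKTRACE is set.
    let other = QueryError::Other { msg: "unexpected I/O error".to_string(), backtrace: Backtrace::capture() };
    println!("{}", log_line(query, &other));
}
```

The design choice is that the error variant itself decides whether a backtrace is worth its log volume, so a compute retrying against a missing tenant produces one short line per attempt.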

koivunej commented 9 months ago

There might also be an issue of compute retrying too fast, since this is an ongoing "bad setup in staging". So far I haven't been able to look at the logs closely enough to understand how often a single tenant is retrying.

Caught one:

2023-12-05T15:17:22.803039Z ERROR page_service_conn_main{peer_addr=10.10.75.146:51310}: query handler for 'pagestream 07ebad77e1803ec2d358524fb8e39c2a 2646df9c64de3ea622fde3574cd622fb' failed: Tenant 07ebad77e1803ec2d358524fb8e39c2a not found
2023-12-05T15:17:22.803455Z ERROR page_service_conn_main{peer_addr=10.10.75.146:51268}: query handler for 'pagestream 07ebad77e1803ec2d358524fb8e39c2a 2646df9c64de3ea622fde3574cd622fb' failed: Tenant 07ebad77e1803ec2d358524fb8e39c2a not found

This is being retried too fast. Created #6043.
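One common mitigation for retries that are too fast is capped exponential backoff. The sketch below is a hypothetical `retry_delay` helper, not the compute's actual retry policy and not necessarily what #6043 ends up doing; `base` and `max` are illustrative parameters, and jitter is omitted for brevity.

```rust
use std::time::Duration;

/// Hypothetical helper: delay before reconnect attempt `attempt` (0-based),
/// doubling from `base` each time and never exceeding `max`.
fn retry_delay(attempt: u32, base: Duration, max: Duration) -> Duration {
    // 2^attempt, saturating to u32::MAX once the shift would overflow.
    let factor = 1u32.checked_shl(attempt).unwrap_or(u32::MAX);
    // If base * factor overflows Duration, fall back to the cap.
    base.checked_mul(factor).map_or(max, |d| d.min(max))
}

fn main() {
    let base = Duration::from_millis(100);
    let max = Duration::from_secs(5);
    for attempt in 0..8 {
        println!("attempt {attempt}: wait {:?}", retry_delay(attempt, base, max));
    }
}
```

With these values the waits grow 100 ms, 200 ms, 400 ms and so on until they hit the 5 s cap, which bounds how many "Tenant not found" lines a single misconfigured compute can produce per minute.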

jcsp commented 8 months ago

Triage notes: