neondatabase / neon

Neon: Serverless Postgres. We separated storage and compute to offer autoscaling, code-like database branching, and scale to zero.
https://neon.tech
Apache License 2.0

Pageserver: billing events sent to Vector and S3 should use the same idempotency key #8605

Closed Daniel-ef closed 1 week ago

Daniel-ef commented 1 month ago

Problem

The idempotency key is used to make sure that events are not recorded more than once, although they may be submitted multiple times in some cases to make sure they are not lost.

We should calculate the idempotency key beforehand and send the same events to S3 and Vector.

Relates (internal issue): https://github.com/neondatabase/cloud/issues/9824

Detail

Consumption metrics for billing are written over a socket to an external service (Vector), and also written to S3 for posterity.

In consumption_metrics.rs, we call two output methods, one for Vector and one for S3, with the same vector of metric values.

Each of these ultimately uses RawMetric::as_event on each metric to add an "idempotency key" to the entry: this enables the billing system to receive delta metrics (e.g. data written since last sample) without risking double-counting on retries.
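To illustrate why the key matters for delta metrics, here is a minimal sketch of receiver-side deduplication. How the billing system actually deduplicates is not described in this issue, so the `Ledger` type and its `record` method are hypothetical names used only for illustration.

```rust
use std::collections::HashSet;

/// Hypothetical receiver-side ledger; not part of the pageserver or the billing system.
struct Ledger {
    /// Idempotency keys that have already been applied.
    seen: HashSet<String>,
    /// Running total built from delta metrics (e.g. bytes written since last sample).
    total_bytes: u64,
}

impl Ledger {
    fn new() -> Self {
        Ledger {
            seen: HashSet::new(),
            total_bytes: 0,
        }
    }

    /// Apply a delta once per idempotency key.
    /// Returns true if the delta was applied, false if it was dropped as a duplicate.
    fn record(&mut self, idempotency_key: &str, delta_bytes: u64) -> bool {
        // HashSet::insert returns false when the value was already present.
        if !self.seen.insert(idempotency_key.to_string()) {
            return false;
        }
        self.total_bytes += delta_bytes;
        true
    }
}
```

With this shape, a retried submission that carries the same key leaves the total unchanged, while the same delta submitted under a different key would be counted twice.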

To ensure the S3 output and the Vector output have the same idempotency key, we need to pull the calculation of the keys up into collect_metrics, and pass those with the RawMetrics into each upload function, so that the uploads aren't independently calculating different keys.
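A rough sketch of that restructuring, under stated assumptions: the `RawMetric` fields, the `KeyedMetric` type, the key format, and the function names `assign_keys`, `render_for_vector`, and `render_for_s3` are all hypothetical stand-ins, not the pageserver's actual definitions. The two render functions only format strings and perform no I/O.

```rust
/// Hypothetical stand-in for the pageserver's RawMetric; fields are illustrative.
#[derive(Clone, Debug, PartialEq)]
struct RawMetric {
    name: String,
    tenant_id: String,
    value: u64,
}

/// A metric paired with an idempotency key that was computed exactly once.
#[derive(Clone, Debug, PartialEq)]
struct KeyedMetric {
    idempotency_key: String,
    metric: RawMetric,
}

/// Assign one key per metric at collection time, before any output is produced.
/// Assumption: the key format (timestamp, node id, sequence number) is made up for this sketch.
fn assign_keys(metrics: Vec<RawMetric>, collected_at: u64, node_id: &str) -> Vec<KeyedMetric> {
    metrics
        .into_iter()
        .enumerate()
        .map(|(seq, metric)| KeyedMetric {
            idempotency_key: format!("{}-{}-{:04}", collected_at, node_id, seq),
            metric,
        })
        .collect()
}

/// Stand-in for the Vector output: reuses the precomputed key.
fn render_for_vector(events: &[KeyedMetric]) -> Vec<String> {
    events
        .iter()
        .map(|e| {
            format!(
                "{} {} {} {}",
                e.idempotency_key, e.metric.tenant_id, e.metric.name, e.metric.value
            )
        })
        .collect()
}

/// Stand-in for the S3 output: reuses the same precomputed key.
fn render_for_s3(events: &[KeyedMetric]) -> Vec<String> {
    events
        .iter()
        .map(|e| {
            format!(
                "{},{},{},{}",
                e.idempotency_key, e.metric.tenant_id, e.metric.name, e.metric.value
            )
        })
        .collect()
}
```

Because both outputs read the key from the same `KeyedMetric` slice, neither can derive a different key on its own.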

jcsp commented 4 weeks ago

Note: If we can eliminate any cases that rely on counter metrics (bytes written), then we can fix this by just setting a constant idempotency key when sending synthetic size.

jcsp commented 4 weeks ago

Checking internally whether we can get rid of the counter metric, which is the motivation for having idempotency keys to begin with: https://neondb.slack.com/archives/C061CPK7UQL/p1723737069420979

jcsp commented 2 weeks ago

Looks like we need the counter metric for the foreseeable future.