google / exposure-notifications-server

Exposure Notification Reference Server | Covid-19 Exposure Notifications
https://www.google.com/covid19/exposurenotifications/
Apache License 2.0

don't register stackdriver metrics on every instance start #1548

Closed mikehelmick closed 3 years ago

mikehelmick commented 3 years ago

TL;DR

Registering metrics on every instance start causes us to get throttled by the Stackdriver API and can cause errors on startup.

sethvargo commented 3 years ago

I think there are a few ways we could do this:

  1. Move the metrics registration process into its own binary that we execute as part of the build/deploy steps. This ensures it happens once per deploy (which is what we really want). I think I prefer this approach.

  2. Move the metrics creation back into Terraform. It started this way, but became unsustainable to keep them in sync. Now that metrics creation is rare, we could move it back. It would be ugly, since we'd have to terraform import all the existing metrics.

  3. Establish a lower-level connection to Redis and write the current build-id somewhere. If no value exists, do the registration. This has some complex edge cases. It makes connecting to Redis a single point of failure (SPOF) at startup, but we already have that. Another challenge is that we can't use a nice client here, because we don't establish the client until after metrics have been registered (since the client emits metrics).
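To make option 3 concrete, here is a minimal sketch of the build-id check done over a raw connection, with no Redis client library involved. It is an illustration under assumptions, not code from this repository: the key prefix `metrics-registered:`, the function names, and the choice of `SET ... NX` (which folds "check whether a value exists" and "write the build-id" into one atomic command) are all hypothetical. `main` talks to an in-process fake server over `net.Pipe` so the example runs without Redis.

```go
package main

import (
	"bufio"
	"fmt"
	"log"
	"net"
	"strings"
)

// respSetNX encodes `SET key value NX` in the Redis wire protocol (RESP).
func respSetNX(key, value string) string {
	args := []string{"SET", key, value, "NX"}
	var b strings.Builder
	fmt.Fprintf(&b, "*%d\r\n", len(args))
	for _, a := range args {
		fmt.Fprintf(&b, "$%d\r\n%s\r\n", len(a), a)
	}
	return b.String()
}

// parseSetNXReply reports whether the key was newly set.
// "+OK" means this instance claimed the build-id and should register;
// "$-1" (RESP2 nil) means another instance already wrote it.
func parseSetNXReply(line string) (bool, error) {
	line = strings.TrimRight(line, "\r\n")
	switch {
	case line == "+OK":
		return true, nil
	case line == "$-1":
		return false, nil
	case strings.HasPrefix(line, "-"):
		return false, fmt.Errorf("redis error: %s", line[1:])
	default:
		return false, fmt.Errorf("unexpected reply: %q", line)
	}
}

// shouldRegister claims the build-id over an already-open connection.
// The key name is an assumption made for this sketch.
func shouldRegister(conn net.Conn, buildID string) (bool, error) {
	key := "metrics-registered:" + buildID
	if _, err := conn.Write([]byte(respSetNX(key, "1"))); err != nil {
		return false, err
	}
	line, err := bufio.NewReader(conn).ReadString('\n')
	if err != nil {
		return false, err
	}
	return parseSetNXReply(line)
}

func main() {
	// Fake server standing in for Redis: it accepts one command and
	// answers as if the key did not exist yet.
	client, server := net.Pipe()
	defer client.Close()
	go func() {
		defer server.Close()
		buf := make([]byte, 512)
		server.Read(buf)
		server.Write([]byte("+OK\r\n"))
	}()

	first, err := shouldRegister(client, "build-123")
	if err != nil {
		log.Fatal(err)
	}
	fmt.Println("register metrics:", first)
}
```

Against a real server the connection would come from something like `net.DialTimeout("tcp", addr, timeout)`. The sketch leaves out authentication, TLS, and key expiry, and it does not handle the case where an instance claims the key and then fails before registration finishes, which is one of the edge cases mentioned above.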

Other ideas?

mikehelmick commented 3 years ago

I was thinking something along the lines of option 1.

Separate binary, but deployed as a service. So when it's deployed, and the container starts, the metrics will register.
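A rough sketch of what such a binary could look like: register once when the container starts, then keep serving so the platform treats the service as healthy. Everything named here is an assumption for illustration. `registerMetrics` is a placeholder rather than the project's real registration code, `PROJECT_ID` is a hypothetical environment variable, and `PORT` is read the way container platforms such as Cloud Run usually supply it.

```go
package main

import (
	"errors"
	"fmt"
	"log"
	"net/http"
	"os"
)

// registerMetrics is a placeholder for the real metric descriptor
// registration. It only validates its input and logs.
func registerMetrics(projectID string) error {
	if projectID == "" {
		return errors.New("project ID is required")
	}
	log.Printf("registering metrics for project %q", projectID)
	return nil
}

// run performs registration first and starts serving only if it succeeded,
// so a failed registration shows up as a failed deploy.
func run(register func() error, serve func() error) error {
	if err := register(); err != nil {
		return fmt.Errorf("metrics registration failed: %w", err)
	}
	return serve()
}

func main() {
	projectID := os.Getenv("PROJECT_ID") // hypothetical variable name
	port := os.Getenv("PORT")
	if port == "" {
		port = "8080"
	}

	mux := http.NewServeMux()
	mux.HandleFunc("/health", func(w http.ResponseWriter, r *http.Request) {
		fmt.Fprintln(w, "ok")
	})

	err := run(
		func() error { return registerMetrics(projectID) },
		func() error { return http.ListenAndServe(":"+port, mux) },
	)
	if err != nil {
		log.Fatal(err)
	}
}
```

Because registration runs on every container start, this only gives "once per deploy" behavior if the service is kept from scaling out or restarting frequently.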

sethvargo commented 3 years ago

Ah - and then set min/max instances to 1?

sethvargo commented 3 years ago

https://github.com/google/exposure-notifications-server/pull/1549