thanos-io / thanos

Highly available Prometheus setup with long term storage capabilities. A CNCF Incubating project.
https://thanos.io
Apache License 2.0
12.73k stars 2.04k forks source link

Thanos Sidecar - Flush Endpoint #7295

Open nicolastakashi opened 2 months ago

nicolastakashi commented 2 months ago

Is your proposal related to a problem?

On the Prometheus Operator project, we have implemented the ability to autoscale Prometheus shards when running Prometheus using the agent mode. This might be achieved using the native HPA or any tool like Keda.

For the Prometheus Server, we are still working to make this feature available and a lot of discussion is ongoing yet. For users using Prometheus Server + Thanos Sidecar, I'd like to provide a graceful shutdown by flushing and uploading blocks before shutdown of the Prometheus Server.

I've added the ability to upload blocks to the blob storage using Thanos Tools, we can leverage this on Prometheus Operator by doing some hacks such as a preStop hook to flush tsdb and upload blocks. As we can see we already have feature requests from users implementing this on their own, as you can see here.

This works as expected but it's a bit hacky and I'd like to propose a new feature to the Thanos Sidecar.

Describe the solution you'd like

A new endpoint on the Thanos Sidecar e.g. /flush, /shutdown, or /snapshot (naming is hard 😅). This brand-new endpoint should execute the following logic when invoked.

  1. Invoke Prometheus TSDB Snapshot Endpoint
  2. Upload Snapshot Blocks to the blob storage
  3. Delete the Snapshot dir after upload (I'm not sure about it)

This new endpoint is only providing an easier experience to graceful shutdown Prometheus Server when some of the following operations are required.

  1. Scaling Down Events
  2. Users Decreasing Disk Size
  3. Prometheus Migration from one place to another. (e.g namespace migration)

By having this new endpoint I envision Prometheus Operator calling it before scaling down the instance, and after this endpoint returns success we can just delete the Statefulset and its Persistent volume.

Describe alternatives you've considered

N/A

Additional context

nicolastakashi commented 2 months ago

cc @ArthurSens @simonpasquier

yeya24 commented 2 months ago

I've added the ability to upload blocks to the blob storage using Thanos Tools, we can leverage this on Prometheus Operator by doing some hacks such as a preStop hook to flush tsdb and upload blocks. As we can see we already have feature requests from users implementing this on their own, as you can see https://github.com/prometheus-operator/prometheus-operator/issues/6540.

If Thanos sidecar implements this API, how would you call it from Prometheus operator? Would you still use the preStop hook to call the API or at downscaling event time?

nicolastakashi commented 2 months ago

I've added the ability to upload blocks to the blob storage using Thanos Tools, we can leverage this on Prometheus Operator by doing some hacks such as a preStop hook to flush tsdb and upload blocks. As we can see we already have feature requests from users implementing this on their own, as you can see https://github.com/prometheus-operator/prometheus-operator/issues/6540.

If Thanos sidecar implements this API, how would you call it from Prometheus operator? Would you still use the preStop hook to call the API or at downscaling event time?

The idea is invoke this on the scaling down event.

yeya24 commented 2 months ago

I am not against adding this functionality into Thanos sidecar.

By having this new endpoint I envision Prometheus Operator calling it before scaling down the instance, and after this endpoint returns success we can just delete the Statefulset and its Persistent volume.

Prometheus operator has its own sidecar like the config-reloader. Why this feature cannot be implemented in another binary deployed as a sidecar as well? The code can be maintained in the Prometheus operator project.

nicolastakashi commented 2 months ago

I am not against adding this functionality into Thanos sidecar.

By having this new endpoint I envision Prometheus Operator calling it before scaling down the instance, and after this endpoint returns success we can just delete the Statefulset and its Persistent volume.

Prometheus operator has its own sidecar like the config-reloader. Why this feature cannot be implemented in another binary deployed as a sidecar as well? The code can be maintained in the Prometheus operator project.

My only concern about having this on projects like config-reloader is because this components will need to learn how to upload blocks to storage account.

Besides that I think might be a useful solution for people is not using Prometheus Operator as well, but I don't have any data about it to confirm it's just feeling 😅

Nashluffy commented 2 months ago

As someone having to orchestrate "flushing" outside of prometheus-operator currently, I would definitely leverage a /flush endpoint in thanos. I'd also much prefer it as an in-built command than a separate container like config-reloader.

It'd also be much easier to benefit from incremental progress on this, as I'd be able to remove a lot of my custom glue/wiring in my current setup, instead of having to wait for the changes to propagate all the way to prometheus-operator.

yeya24 commented 2 months ago

As someone having to orchestrate "flushing" outside of prometheus-operator currently, I would definitely leverage a /flush endpoint in thanos. I'd also much prefer it as an in-built command than a separate container like config-reloader.

Once this functionality has been added to the Prometheus operator sidecar, there should be no difference in terms of UX compared to having the API natively?

But I think I am okay to add the new API. PR is welcomed.

MichaHoffmann commented 2 months ago

There was some idea to run prometheus via libraries from within sidecar; that way we could build this properly. cc @SuperQ

Nashluffy commented 1 month ago

There was some idea to run prometheus via libraries from within sidecar; that way we could build this properly. cc @SuperQ

I've raised a PR that flushes via TSDB snapshotting, but happy to give this a try if you can point me in a general direction. IIUC we could change the implementation of the flush to what you proposed in the future without impacting end users, too.