prometheus / prometheus

The Prometheus monitoring system and time series database.
https://prometheus.io/
Apache License 2.0

Retention time configurable per series (metric, rule, ...). #1381

Open taviLaies opened 8 years ago

taviLaies commented 8 years ago

Hello,

I'm evaluating prometheus as our telemetry platform and I'm looking to see if there's a way to set up graphite-like retention. Let's assume I have a retention period of 15d in prometheus and I define aggregation rules that collapse the samples to 1h aggregates. Is there a way to keep this new metric around for more than 15 days? If this is not possible, could you provide some insight on how you approach historical data in your systems?

Thank you

matthiasr commented 8 years ago

This is not something Prometheus supports directly at the moment and for the foreseeable future. The focus right now is on operational monitoring, i.e. the "here and now".

You can get something like this by using a tiered system. The first-level Prometheus would scrape all the targets and compute the rules. A second-level Prometheus would federate from it, only fetching the result of these rules.

It can do so at a lower resolution, but keep in mind that if you set the scrape_interval to more than 5 minutes your time series will no longer be treated as contiguous. It can also keep them for longer. Theoretically this is only limited by disk space, however again, very long retention is not a focus so YMMV.

Additionally, the second-level Prometheus could use the (experimental) remote storage facilities to push these time series to OpenTSDB or InfluxDB as they are federated in. To query these you will need to use their own query mechanisms, there is no read-back support at the moment.
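For illustration, in the tiered setup described above the second-level Prometheus pulls from the first level's `/federate` endpoint, passing one `match[]` selector per group of series it wants. In practice this is configured as a scrape job with `metrics_path: /federate`; the sketch below only shows the request that results. The `federateURL` helper, the host name, and the `job:` rule-name prefix are assumptions made for the example.

```go
package main

import (
	"fmt"
	"net/url"
)

// federateURL builds the /federate request for the given series selectors.
// Each selector becomes one match[] query parameter.
func federateURL(base string, selectors ...string) string {
	q := url.Values{}
	for _, s := range selectors {
		q.Add("match[]", s)
	}
	return base + "/federate?" + q.Encode()
}

func main() {
	// Hypothetical first-level address; fetch only recording-rule results,
	// assuming the rules are named with a "job:" prefix.
	fmt.Println(federateURL("http://prometheus-l1:9090", `{__name__=~"job:.*"}`))
}
```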

beorn7 commented 8 years ago

The "5min-problem" is handled by #398. The planned grouping of rules will allow individual evaluation intervals for groups. So something like a "1 hour aggregate" can be configured in a meaningful way.

The piece missing is retention time per series, which I will rename this bug into and make it a feature request. We discussed it several times. It's not a high priority right now, but certainly something we would consider.

klausenbusk commented 8 years ago

A per job retention period is what I need for my use-case.

I pull 4 metrics from my solar panel every 30 seconds, and want to store them forever (so I can for example go 6 months back and see the production at that moment) but I don't need that for all the other metrics (like Prometheus metrics).

brian-brazil commented 8 years ago

Prometheus is not intended for indefinite storage, you want #10.

klausenbusk commented 8 years ago

> Prometheus is not intended for indefinite storage, you want #10.

I see #10 makes sense if you have a lot of time series, but OpenTSDB seems kind of overkill just to store 4 time series forever. Isn't it just a question of allowing people to set the retention period to forever? Or do you think people will "abuse" that?

brian-brazil commented 8 years ago

We make design decisions that presume that Prometheus data is ephemeral, and can be lost/blown away with no impact.

onorua commented 7 years ago

Coming here from a Google Groups discussion about the same topic: I think we could use a per-series retention period for recording rules and the metrics they are based upon. We have 3k hosts reporting the country they served requests from. We aggregate these values in a recording rule and basically never need the raw metrics, but they still use RAM, storage, etc.

gouthamve commented 6 years ago

I plan to tackle this today. So essentially it would mean this, regularly calling the delete API and in the background cleaning up the tombstones. Where should this live is the question.

My inclination is that we could leverage the delete API itself and then add a tombstone cleanup API, and add functionality to promtool to call the APIs regularly with the right matchers.

Else, I would need to manipulate the blocks on disk with a separate tool which I must say, I'm not inclined to do.
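The approach above (regularly calling the delete API with the right matchers, then cleaning up tombstones) can be sketched roughly as follows. This is a minimal sketch, not promtool functionality: the `rule` type, the `deleteSeriesURL` helper, the server address, and the example matchers are all hypothetical. The two endpoints are the TSDB admin API of Prometheus 2.x, which is only served when the server runs with `--web.enable-admin-api`.

```go
package main

import (
	"fmt"
	"net/url"
	"strconv"
	"time"
)

// rule pairs a series selector with how long its samples should be kept.
// Hypothetical type for illustration only.
type rule struct {
	Matcher   string
	Retention time.Duration
}

// deleteSeriesURL builds the admin API request that deletes all samples of the
// matched series up to endUnix (seconds since epoch). Leaving out "start"
// makes the deletion begin at the earliest possible time.
func deleteSeriesURL(base, matcher string, endUnix int64) string {
	q := url.Values{}
	q.Add("match[]", matcher)
	q.Set("end", strconv.FormatInt(endUnix, 10))
	return base + "/api/v1/admin/tsdb/delete_series?" + q.Encode()
}

func main() {
	base := "http://localhost:9090" // assumed server address
	rules := []rule{
		{Matcher: `{job="prometheus"}`, Retention: 7 * 24 * time.Hour},
		{Matcher: `{__name__=~"debug_.*"}`, Retention: 24 * time.Hour},
	}
	now := time.Now()
	for _, r := range rules {
		// A real tool would send these with http.Post and check the status code.
		fmt.Println("POST", deleteSeriesURL(base, r.Matcher, now.Add(-r.Retention).Unix()))
	}
	// Optional: force removal of the tombstoned data from disk.
	fmt.Println("POST", base+"/api/v1/admin/tsdb/clean_tombstones")
}
```

The sketch only prints the requests, so it can be run without a server. The explicit `clean_tombstones` call is optional; without it the deleted data is removed from disk by later compaction.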

/cc @brian-brazil @fabxc @juliusv @grobie

fabxc commented 6 years ago

One alternative is to make it part of the tsdb tool and "mount" the tsdb tool under "promtool tsdb", which has other nice benefits. That would make the functionality usable outside of the Prometheus context. Prometheus users would need to run 2 extra commands for disable/enable compaction. Or just wrap those around it when calling via promtool.

gouthamve commented 6 years ago

My concern there is the edge-cases, what if the request to restart compacting fails? While the tsdb tool makes perfect sense on static data, I think it would be cleaner if we could make it an API on top of tsdb.DB that the applications built on top can leverage.

For the tsdb tool case, if we know we are acting on static data then we can instantiate a DB and work with that. We can have two options for promtool, delete live and delete static, though I highly doubt anybody will be working with static dirs.

Having it as an API also allows us to make it a feature of Prometheus if people care and Brian agrees ;)

brian-brazil commented 6 years ago

I wouldn't object to delete and force cleanup functionality being added to promtool.

I have a general concern that users looking for this tend to be over-optimising and misunderstanding how Prometheus is intended to be used, such as the original post of this issue. I'd also have performance concerns with all this cleanup going on.

krasi-georgiev commented 6 years ago

don't think anything can be done on the tsdb side for this so removed the local storage label.

Doesn't seem there is a big demand for such a use case and since the issue is so old maybe should close it and revisit if it comes up again or if @taviLaies is still interested in this.

csmarchbanks commented 4 years ago

A few of us had discussions around this at KubeCon and find dynamic retention valuable for both Prometheus and Thanos. Generally the approach we were discussing is to include the tool within the Prometheus code as part of compaction, and allow users to define retention with matchers. Design doc will be coming soon, but I am happy to hear any major concerns around compaction time processing sooner than later so I can include them.

brian-brazil commented 4 years ago

Compaction is currently an entirely internal process that's not exposed to users, and in particular does not affect query semantics.

I'd prefer to ensure that we'd expose the delete API via promtool, and let users work themselves from there.

matthiasr commented 4 years ago

I think in this case it's justified to expose this to users – "I want to keep some metrics longer than others" is such a common use case that I don't think we should relegate it to "write your own shell scripts". The impact on query semantics doesn't have to be explicitly bound to compaction – it can simply be "samples will disappear within X hours after they have reached their retention period".

brian-brazil commented 4 years ago

There's a few unrelated things being tied together there. One thing we do know is that users tend to be over-aggressive in their settings, which then causes them significant performance impact. This is why we don't currently have a feature in this area, the last person to investigate it found it to not work out in practice.

> is such a common use case that I don't think we should relegate it to "write your own shell scripts".

It'd be a single curl/promtool invocation, so it's not something that even really classifies as a shell script.

matthiasr commented 4 years ago

It would still need to be executed regularly to fulfill the need. So it needs to be scheduled, monitored, updated.

When would the corresponding space be freed?

brian-brazil commented 4 years ago

> It would still need to be executed regularly to fulfill the need. So it needs to be scheduled, monitored, updated.

Cron covers that largely, plus existing disk space alerting.

> When would the corresponding space be freed?

Typically it'd be automatically within 2 hours. Unless they trigger it manually (which is where performance problems tend to come in, this gets triggered far too often).

fche commented 4 years ago

How close can one get to an ideal scenario where a user is not made to worry about what to retain for how long, but instead the system adapts to a storage quota? It could track actual query usage of metrics and their time windows, so it can predict metrics / times that are likely to be unneeded, and prefer them for disposal.

brian-brazil commented 4 years ago

That's not possible, all it takes is one overly broad query and everything gets retained. If you want to try to build something like that it would be best done outside Prometheus.

If you just want to have an overall byte limit, we already have a feature like that.

csmarchbanks commented 4 years ago

I think that ease of use is desirable and worth it for this feature. Otherwise users have to allow the admin APIs, protect them, and stand up whatever cron-type job they need to use, even if the calls are made easier with promtool. I will add an alternatives section to try to do a more detailed analysis in the doc.

I plan to have compaction continue to be an internal detail of when samples are deleted. Compaction is a convenient time to do the work, and it is already mentioned in the storage documentation that it may take 2 hours for data to be removed.

> the last person to investigate it found it to not work out in practice.

Any links on this that you know of? I would love to learn from them.

brian-brazil commented 4 years ago

> I think that ease of use is desirable and worth it for this feature.

I question that. Many users seem to be holding on to how things work with other less efficient systems, and where things like Thanos are in play doing it there would make more sense.

> it is already mentioned in the storage documentation that it may take 2 hours for data to be removed.

Removed from disk is not the same as no longer accessible to queries, which happens by the time the delete call is complete. Any changes from that would be quite messy semantically, and is a reason not to go deleting recent data regularly.

> Any links on this that you know of?

That's @gouthamve's posts above from back in 2017; I think he only wrote up the results on IRC. The summary was that forcing compaction every 5 minutes is a very bad idea, so he gave up.

csmarchbanks commented 4 years ago

Sorry, I was unclear. The storage documentation already says that blocks will not get cleaned up for up to two hours after they have exceeded the retention setting. E.g. with a retention of 6 hours, I can still query data from 8 - 10 hours ago.

brian-brazil commented 4 years ago

That's only at the bounds of full retention, and IMHO we should keep the 1.x behaviour of having a consistent time for that. It's not the last few hours of data with typical retention times.

csmarchbanks commented 4 years ago

I have started a design doc for this work here: https://docs.google.com/document/d/1Dvn7GjUtjFlBnxCD8UWI2Q0EiCcBEx_j9eodfUkE8vg/edit?usp=sharing

All comments are appreciated!

kovalev94 commented 4 years ago

Is there any progress on this issue? I had a similar problem. I want to monitor the total error count on network switches, but some of them have no SNMP OID for total errors. So I have to collect the different error types (CRC, Alignment, Jabber, etc.) and calculate their sum. But I want to keep only the total errors, not the others.

csmarchbanks commented 4 years ago

No progress to report, there are still many unresolved comments in the design doc I put forward, and I have not had the time or energy required to get consensus. There is some work related to this in Thanos that has been proposed for Google Summer of Code (https://github.com/thanos-io/thanos/issues/903).

If you only need to delete certain well known series, calling the delete series api on a regular schedule is an option.

anthonyeleven commented 4 years ago

> Sorry, I was unclear. The storage documentation already says that blocks will not get cleaned up for up to two hours after they have exceeded the retention setting.

FWIW I can live with that. Ceph already behaves sort of this way when deleting RBD volumes.

In my situation, there are metrics that aren't likely to be useful past, say, 30 days, like network stats. Others could have value going back for months, e.g. certain Ceph capacity, performance, etc. metrics. Ideally I'd love to be able to downsample older metrics - maybe I only need to keep one sample per day.

Use-case: feeding Grafana dashboards and ad-hoc queries. The federation idea is clever and effective, but would complicate the heck out of queries. I would need to duplicate dashboards across the 2 (or more) datasources which doesn't make for the best user experience, and is prone to divergence.

Rezeye commented 3 years ago

Has this been looked into any further by the development team? Or have any users found any work arounds? This would help me a lot with my dashboards.

dfredell commented 3 years ago

My workaround is to deploy VictoriaMetrics next to Prometheus. Then configure VictoriaMetrics to scrape Prometheus but filter which metrics to scrape, with a different retention and lower granularity.

Command flags:

      - -retentionPeriod=120 # 120 months
      - -dedup.minScrapeInterval=15m

promscrape.config

scrape_configs:
  - job_name: prometheus
    honor_labels: true
    metrics_path: '/federate'
    params:
      'match[]':
        - '{__name__="metric1"}'
        - '{__name__="metric2"}'
    static_configs:
      - targets:
        - 'prometheus:9090'

DEvil0000 commented 3 years ago

one workaround is a setup with multiple prometheus services having different configuration (plus/or thanos depending on the scenario)

LajosCseppento commented 3 years ago

> Has this been looked into any further by the development team? Or have any users found any work arounds? This would help me a lot with my dashboards.

We decided to put a cleanup policy in place (using the Prometheus REST API): the default retention is 12 weeks, but part of the data is cleaned up after 6 weeks. We considered this cheaper to maintain than several instances.

csmarchbanks commented 3 years ago

> Has this been looked into any further by the development team?

There is a topic for a prometheus dev summit to discuss this issue. I am hopeful that we will discuss it either this month or in January, but I cannot say for sure. After the discussion, we will be able to provide a more complete answer as to how we would like this in (or not in) Prometheus.

> Or have any users found any work arounds?

The current ways to do this are either with federation to a second Prometheus instance, or having an external process call the admin delete API.

m-yosefpor commented 3 years ago

This would be a really great feature for Prometheus. The use case is not only long-term metrics (which some people argued in the comments is not what Prometheus is intended for). There are lots of expensive metrics which we want to keep for only a single day, while keeping the rest of the metrics for 15 days. So now we have to operate 2 Prometheus instances. Apart from the operational cost, some of our queries need matching operators to filter the metrics based on series in the other instance, which makes things even harder. I know there is the Thanos option for global queries across multiple Prometheus instances, but it is overkill to use it only because we cannot retain some metrics for a shorter time.

bwplotka commented 3 years ago

It would be nice to revisit this 🤗

There are big wins if we have something like this: prioritizing, aggregations, keeping data only long enough to satisfy alerts and then discarding it, etc.

cc @csmarchbanks wonder if it's time to resurrect your proposal (:

csmarchbanks commented 3 years ago

If it doesn't get discussed at the upcoming dev summit, perhaps let's get a few interested parties together to get it moving without a dev summit? It's been on the agenda with a fair number of votes for quite a while now so I hope it gets discussed.

csmarchbanks commented 3 years ago

Good news!

There was consensus in today's dev summit that we would like to implement dynamic retention inside of the Prometheus server. The next step is to decide how we would like to implement this feature. Right now it looks like there are two proposals in the document I linked: one for a new format that allows reducing or extending retention based on a set of matchers, and a second building on rule evaluation to delete data that is older than a given age. Anyone who is interested, please provide feedback on either of those approaches (or a new one) so that implementation work can begin.

FujishigeTemma commented 3 years ago

Hi, I would like to tackle this issue as my GSoC'21 project. For now, I've read through the discussions and docs noted below.

Let me know if there are any other existing discussions I should read.

As a first step, I'm going to do some code reading and figure out the dependencies. It would also be nice to learn through practice, so let me know if there are any related good first issues.

I have several questions, but I'm not sure what steps I should take, so I want to decide how to proceed first. Since we are going to decide on the detailed specification based on the proposal sooner or later, I would like us to share the same technical assumptions.

csmarchbanks commented 3 years ago

@FujishigeTemma Those are great discussions to start, if you have questions about GSoC feel free to reach out to me via email or in the CNCF slack. Otherwise, part of the GSoC project will be to make sure a design is accepted and then start implementing it.

yeya24 commented 3 years ago

As this project was not selected in GSoC this year, do we have any other updates or progress on this?

roidelapluie commented 3 years ago

> As this project was not selected in GSoC this year, do we have any other updates or progress on this?

There is no progress.

yeya24 commented 3 years ago

Based on the design doc https://docs.google.com/document/d/1Dvn7GjUtjFlBnxCD8UWI2Q0EiCcBEx_j9eodfUkE8vg/edit#, for config like:

retention_configs:
- retention: 1w
  matchers:
  - {job="node"}
- retention: 60d
  matchers:
  - slo_errors_total{job="my-service"}
  - slo_requests_total{job="my-service"}
- retention: 2w
  matchers:
  - {job="my-service"}
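
To make the matcher semantics of a config like this concrete, here is a minimal sketch of how a series could be mapped to a retention. It is not taken from the design doc: the `retentionRule` type, the equality-only matching, and the "first matching rule wins" ordering are all assumptions for illustration, and real Prometheus matchers also support `!=`, `=~` and `!~`.

```go
package main

import (
	"fmt"
	"time"
)

const day = 24 * time.Hour

// retentionRule mirrors one entry of the retention_configs example.
// Each matcher is a set of label=value pairs that must all be equal.
type retentionRule struct {
	Retention time.Duration
	Matchers  []map[string]string
}

// exampleRules encodes the config shown above.
func exampleRules() []retentionRule {
	return []retentionRule{
		{Retention: 7 * day, Matchers: []map[string]string{{"job": "node"}}},
		{Retention: 60 * day, Matchers: []map[string]string{
			{"__name__": "slo_errors_total", "job": "my-service"},
			{"__name__": "slo_requests_total", "job": "my-service"},
		}},
		{Retention: 14 * day, Matchers: []map[string]string{{"job": "my-service"}}},
	}
}

// matches reports whether every label in m has the same value in series.
// A label missing from series compares as the empty string.
func matches(m, series map[string]string) bool {
	for name, value := range m {
		if series[name] != value {
			return false
		}
	}
	return true
}

// retentionFor returns the retention of the first rule with a matching
// matcher, or the global retention if no rule applies (assumed precedence).
func retentionFor(series map[string]string, rules []retentionRule, global time.Duration) time.Duration {
	for _, r := range rules {
		for _, m := range r.Matchers {
			if matches(m, series) {
				return r.Retention
			}
		}
	}
	return global
}

func main() {
	global := 15 * day
	slo := map[string]string{"__name__": "slo_errors_total", "job": "my-service"}
	other := map[string]string{"__name__": "http_requests_total", "job": "my-service"}
	fmt.Println(retentionFor(slo, exampleRules(), global))
	fmt.Println(retentionFor(other, exampleRules(), global))
}
```

Under the first-match assumption the order of the entries matters: the 60d rule for the SLO series has to come before the broader 2w rule for `job="my-service"`, as it does in the example.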

An approach would be:

  1. For any retention time < global retention time: in the reloadBlocks method, block.Delete() can be called to add tombstones to each block based on matchers and time. The actual data will be deleted during compaction or clean_tombstones API.
  2. For any retention time > global retention time: before deleting the block, rewrite the block and keep any matched series chunks for longer retention. Matched series will become a new block and the original block will be deleted.

To achieve 2, we can extend the compactor interface with modifiers like https://github.com/prometheus/prometheus/pull/9413:

type Compactor interface {
    // Write persists a Block into a directory.
    // No Block is written when resulting Block has 0 samples, and returns empty ulid.ULID{}.
    Write(dest string, b BlockReader, mint, maxt int64, parent *BlockMeta, modifiers ...Modifier) (ulid.ULID, error)
}

// Modifier modifies the index symbols and chunk series before persisting a new block during compaction.
type Modifier interface {
    Modify(sym index.StringIter, set storage.ChunkSeriesSet, changeLog ChangeLogger) (index.StringIter, storage.ChunkSeriesSet, error)
}

We can define a retention time & matchers aware modifier to only keep the chunkSeries we want or simply use ChunkQuerier to get the ChunkSeriesSet using the given matchers. This approach should work, but performance is a big issue as we have to rewrite blocks every 1 minute.

Implementation for modifier that goes through each series: https://github.com/yeya24/prometheus/blob/experiment-modify/tsdb/modifiers.go#L202-L277
Implementation for modifier that uses chunkQuerier: https://github.com/yeya24/prometheus/blob/experiment-modify/tsdb/modifiers.go#L291-L329
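
As a rough illustration of what a retention-aware modifier does conceptually, the sketch below drops chunks that are past the retention chosen for their series and omits series that end up empty. It is not the linked implementation: `chunk` and `series` are simplified stand-ins for the tsdb chunk metadata and `storage.ChunkSeries` types, and `filterByRetention` and the example policy are hypothetical.

```go
package main

import "fmt"

// chunk and series are simplified stand-ins, not the real Prometheus types.
type chunk struct {
	MinTime, MaxTime int64 // milliseconds since epoch
}

type series struct {
	Labels map[string]string
	Chunks []chunk
}

// filterByRetention drops every chunk whose newest sample is older than
// now minus the retention chosen for its series, and omits series that
// have no chunks left. retentionFor is supplied by the caller.
func filterByRetention(in []series, now int64, retentionFor func(map[string]string) int64) []series {
	var out []series
	for _, s := range in {
		cutoff := now - retentionFor(s.Labels)
		var kept []chunk
		for _, c := range s.Chunks {
			if c.MaxTime >= cutoff {
				kept = append(kept, c)
			}
		}
		if len(kept) > 0 {
			out = append(out, series{Labels: s.Labels, Chunks: kept})
		}
	}
	return out
}

func main() {
	const hour = int64(3600 * 1000)
	in := []series{
		{Labels: map[string]string{"job": "node"}, Chunks: []chunk{{0, 2 * hour}, {2 * hour, 4 * hour}}},
		{Labels: map[string]string{"job": "my-service"}, Chunks: []chunk{{0, 2 * hour}}},
	}
	// Hypothetical policy: 1h for job="node", 10h for everything else.
	policy := func(l map[string]string) int64 {
		if l["job"] == "node" {
			return hour
		}
		return 10 * hour
	}
	for _, s := range filterByRetention(in, 5*hour, policy) {
		fmt.Println(s.Labels["job"], len(s.Chunks))
	}
}
```

The sketch filters at chunk granularity only; a chunk that straddles the cutoff is kept whole, whereas a real implementation would have to decide whether to re-encode it.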

shaoxt commented 2 years ago

@yeya24 Is the implementation going to be merged? What is the status of the design doc?