kube-reporting / metering-operator

The Metering Operator is responsible for collecting metrics and other information about what's happening in a Kubernetes cluster, and providing a way to create reports on the collected data.
Apache License 2.0

Import Prometheus metrics (old data) #640

Open JooyoungJeong opened 5 years ago

JooyoungJeong commented 5 years ago

Hi. How can I import old Prometheus metrics?

I want to import historical Prometheus data into a datasource.

For example: if Prometheus has been collecting data since January 1st, and I only start scraping on March 18th, I can only collect data from the moment the scrape starts. However, I would like to put the data from January 1st into the datasource. What should I do?

Thank you.

chancez commented 5 years ago

Prometheus doesn't collect historical data; it only collects data for the current time. It does, however, retain old metric data.

We have a few processes for importing data, or for collecting data for different periods, but we don't document them for users because they change fairly regularly and we're still unsure how we want to handle historical data imports.

I understand this is a very useful and important feature, but there are a lot of ways to do this wrong, end up with duplicated data in your database, and produce incorrect reports.

There are two main approaches.

We currently have an HTTP API that supports being pushed metrics, which we built for use in tests so we can test against known datasets. This would let you add whatever you want directly to the ReportDataSources, but the problem is that the input isn't something you can produce easily: the API accepts the output of another API of ours, which returns the underlying metrics from a ReportDataSource as JSON.

The other way is an HTTP API that lets you manually trigger collection for ReportDataSources, specifying the time range to import data for. This approach currently needs work: you cannot target a specific ReportDataSource, and you still need to manually edit the ReportDataSource status to indicate what range of data it holds.
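To make the second approach concrete, here is a rough sketch of what a caller might send to that manual-collection endpoint. Everything here is hypothetical: the endpoint path, host, and field names below are placeholders, not the operator's documented API, since that API is intentionally undocumented and still changing.

```python
# Hypothetical sketch only: the endpoint path and field names are
# placeholders, not the operator's actual (undocumented) API.
import json
from datetime import datetime, timezone

def build_collect_payload(start, end):
    """Build a JSON body asking the operator to collect a given time range."""
    return json.dumps({
        "startTime": start.isoformat(),
        "endTime": end.isoformat(),
    })

payload = build_collect_payload(
    datetime(2019, 1, 1, tzinfo=timezone.utc),
    datetime(2019, 3, 18, tzinfo=timezone.utc),
)
print(payload)
# This body would then be POSTed to a collection-trigger endpoint, e.g.:
#   urllib.request.urlopen("http://reporting-operator/api/v1/collect",
#                          data=payload.encode())
```

Note the caveat above still applies either way: after the import you would have to reconcile the ReportDataSource status by hand so it reflects the newly covered range.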

If you're interested in one of these approaches, we can look into formalizing the process and documenting how to use it.

jaxxstorm commented 5 years ago

I'm interested in this 👋

JooyoungJeong commented 5 years ago

@chancez Thank you for your feedback!! I would like to proceed with putting data from MariaDB or Prometheus into the DataSource. Putting MariaDB's data into a DataSource will be filed as a separate issue.

diogormatas commented 4 years ago

Hi,

I have a related use case that needs something like "batch imports". As far as I know and have researched, there is no feature for doing that. Am I right?

My use case is:

I have batches of data being sent to a relational database every 10 minutes, and I want to send each 10-minute batch into the Prometheus database.

This feature has been requested since 17 Feb 2019 in #535.

peteanusergiu commented 4 years ago

Any chance we can get access, with some examples, to the push metrics APIs?

sichvoge commented 4 years ago

Hi. Just trying to understand the desired outcome.

Do you guys want to be able to generate reports from a certain timeframe rather than "now"?

peteanusergiu commented 4 years ago

Yes. We want to visualise our "now" data but also have, in the same visualisation, the "past" data. For that, I would go through our historic data and generate the metrics with a past date.

huang195 commented 4 years ago

I'm interested in exactly the same feature, i.e. putting older data into Prometheus to visualize it in Grafana. However, it's not exactly importing, but rather relying on a scrape target that gradually serves old metrics data (with custom timestamps). This is described here: https://groups.google.com/forum/#!topic/prometheus-users/BUY1zx0K8Ms. The blocker seems to be that Prometheus doesn't allow custom timestamps older than about an hour. How do I remove this limitation?
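For reference, the approach described above relies on the fact that the Prometheus text exposition format accepts an optional third field per sample: a Unix timestamp in milliseconds. A minimal sketch of rendering old samples in that form (the metric name, labels, and values are made up for illustration):

```python
# Render (unix_seconds, value) samples as Prometheus text-exposition lines.
# The trailing integer is the optional per-sample timestamp in milliseconds.
# Caveat from the discussion above: vanilla Prometheus rejects samples that
# are too far out of order, so a scrape target can only "replay" history
# within a narrow window behind the current time.
def exposition_lines(metric, samples, labels=None):
    label_str = ""
    if labels:
        pairs = ",".join(f'{k}="{v}"' for k, v in sorted(labels.items()))
        label_str = "{" + pairs + "}"
    return [
        f"{metric}{label_str} {value} {int(ts * 1000)}"
        for ts, value in samples
    ]

for line in exposition_lines(
    "room_temp_celsius",
    [(1546300800, 21.5), (1546301400, 21.7)],
    {"room": "lab"},
):
    print(line)
# room_temp_celsius{room="lab"} 21.5 1546300800000
# room_temp_celsius{room="lab"} 21.7 1546301400000
```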

cryply commented 4 years ago

The ability to insert missed data from the past would be very helpful.

chargio commented 4 years ago

I am trying to understand the use case better, as I am confused by the use of Prometheus here.

What is the source of the old data? Prometheus will not have it. Are you thinking of a connector that consumes old data stored in some other format?

Metering already provides long-term storage, so you can keep more data than Prometheus holds. You can run reports on long-term data (e.g. monthly data is needed to generate monthly reports).

Is the reason for getting the data into Prometheus to be able to show it in Grafana? We are thinking of connecting the operator to Grafana so you can use it directly.

utdrmac commented 4 years ago

I'm going to jump in here and explain our use-case that needs this feature. Assume for the moment that for whatever reason, I cannot run a Prometheus server in a client's environment. Additionally, the client environment is blocked in accessing the public internet. I still want to collect metrics data for these servers (and visualize it using Grafana, for example). My only possible solution, it would seem, is to write a custom exporter that saves the metrics to some file format that I can then transfer (say after 24-36hrs of collecting) to a Prometheus server which can import that data to be used with my visualizer. Is Prometheus capable of such data ingestion? If not, what would be an appropriate workaround to getting the metrics data into Prom? @chargio @chancez
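One workaround for this offline-collection pattern (not part of the metering-operator, and assuming a reasonably recent Prometheus, 2.24 or later): have the custom exporter write samples to an OpenMetrics text file, transfer the file, and backfill it into a Prometheus TSDB with `promtool tsdb create-blocks-from openmetrics`. A sketch of the file-writing side, with made-up metric names and values:

```python
# Serialize (unix_seconds, value) samples as an OpenMetrics text document.
# OpenMetrics timestamps are in seconds, and the document must end with
# the "# EOF" marker or promtool will reject it.
def to_openmetrics(metric, samples):
    lines = [f"{metric} {value} {ts}" for ts, value in samples]
    lines.append("# EOF")
    return "\n".join(lines) + "\n"

doc = to_openmetrics(
    "node_cpu_busy_ratio",
    [(1609459200, 0.42), (1609459260, 0.47)],
)
print(doc, end="")
# Then, on a host with access to the Prometheus data directory:
#   promtool tsdb create-blocks-from openmetrics metrics.om /prometheus/data
# (run against a stopped Prometheus, or a copy of its data directory)
```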

malanoga commented 3 years ago

Is there a way to push data from a CSV (or any other source) with old timestamps (from 2000-2008) into Prometheus so it can be read in that interval?

labroid commented 3 years ago

Here are my use cases:

1. I have metrics that support SLAs (Service Level Agreements) to a customer. When I switch to Prometheus for tracking, I would like to be able to upload historic data back to the beginning of the SLA period so that the data is in one graph/database.
2. I have sensor data from the past year that feeds downstream analytics; when migrating to Prometheus, I'd like to put the historic data into the Prometheus database so the downstream analytics have a single endpoint.

utdrmac commented 3 years ago

@malanoga @labroid We recently switched to https://github.com/VictoriaMetrics/VictoriaMetrics which is a "clone" of Prometheus and it allows for back-filling of data along with other import options like CSV.
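For the CSV case specifically, VictoriaMetrics exposes an `/api/v1/import/csv` endpoint (default port 8428) whose `format` query parameter maps CSV columns to a time column, labels, and metrics. A hedged sketch of preparing such an upload; the column-mapping string follows the pattern from the VictoriaMetrics docs, but double-check it there before relying on it, and the host and metric names below are placeholders:

```python
# Build a CSV body for VictoriaMetrics' /api/v1/import/csv endpoint.
# The `format` parameter maps each 1-based column: here column 1 is a
# Unix-seconds timestamp, column 2 a "host" label, column 3 the metric value.
rows = [
    (1609459200, "web-1", 0.42),
    (1609459260, "web-1", 0.47),
]
body = "\n".join(f"{ts},{host},{val}" for ts, host, val in rows)
fmt = "1:time:unix_s,2:label:host,3:metric:cpu_usage"
print(body)
# Equivalent upload (host is a placeholder):
#   curl -X POST "http://victoria:8428/api/v1/import/csv?format=<fmt>" -d "<body>"
```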

labroid commented 3 years ago

@utdrmac - VictoriaMetrics looks pretty awesome, and supports several methods for backfilling older data. Thanks for the pointer!

RWF-N commented 2 years ago

Any recent movement on this request?

Our use case:

We have Grafana widgets that show timelines for metrics from Prometheus, and we also do ad-hoc queries using the Prometheus web interface.

nahsi commented 2 years ago

> Any recent movement on this request?
>
> Our use case:
>
>   • We have mobile remote devices that run Prometheus.
>   • We have a central management system that runs Prometheus and uses federation to scrape metrics from the remote devices.
>   • The remote devices do not always have connectivity. Since federation scrapes, we lose the metrics for the period where the connection to the remote device was down.
>   • We would like a method where the first "scrape" after comms are restored retrieves all data since the last successful "scrape".
>
> We have Grafana widgets that show timelines for metrics from Prometheus, and we also do ad-hoc queries using the Prometheus web interface.

You should use Mimir and push metrics from the remote Prometheus instances to it with remote_write.

aspyct commented 2 years ago

I would also very much like the ability to ingest older data, but I understand why that may not be part of the features here.

However, because it's documented in the exposition formats that you can specify a timestamp, I built a whole infrastructure counting on this. Now that I finally need it, saying that I'm disappointed would be an understatement. I literally wasted days and weeks on this.

May I suggest you add a note in the exposition formats documentation to warn people about this?

caio-vinicius commented 1 year ago

@nahsi, can you please give more details on how to do that in @RWF-N's specific scenario? I have the same problem here, but a bit different:

I manage some physical high-availability servers, and the collected data is stored on those same servers. If a server goes down, its data stops being scraped and saved. When the server comes up again, I would like to be able to grab the data from the point where scraping last succeeded. Is this possible with Grafana Mimir?

Anyway, I'm thinking of standing up a Prometheus in a safer external place and using it as a datasource, together with the Prometheus that can lose data.

nahsi commented 1 year ago

> @nahsi, can you please give more details in how to do that in that specific scenario of @RWF-N. I'm with the same problem here, but a bit different:
>
> I manage some physical high availability servers and I'm collecting data that is stored inside that same high availability server. Data is stored on that same server and if it goes down, the data will stop being scraped and consequently saved. When the server comes up again, I would like to be able to grab the data since the last time no more data was grabbed. Is this possible with Grafana Mimir?
>
> Anyway, I'm thinking of uploading a Prometheus in a safer external place and using it as a data source, together with Prometheus that can lose data.

Mimir with an S3 storage backend, running inside an orchestrator like Nomad.

Self-hosted S3 solutions like MinIO have a high-availability mode.

This way you will have minimal downtime (zero downtime if running Mimir in HA mode).

Prometheus has a buffer of two hours. If Mimir is down for less than two hours, Prometheus will retry sending the buffered metrics once Mimir is back up. https://prometheus.io/docs/practices/remote_write/
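A minimal sketch of the `remote_write` setup described above, assuming a central Mimir reachable at a placeholder hostname (Mimir's push endpoint is `/api/v1/push`; the queue settings shown are illustrative, not required):

```yaml
# prometheus.yml fragment: ship samples to a central Mimir.
# Prometheus's WAL acts as the buffer mentioned above, so sends are
# retried while the remote end is down.
remote_write:
  - url: http://mimir.example.internal:9009/api/v1/push
    queue_config:
      max_shards: 10   # parallelism of the send queue (tuning knob)
```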