prometheus / prometheus

The Prometheus monitoring system and time series database.
https://prometheus.io/
Apache License 2.0

Add mechanism to perform bulk imports #535

Closed juliusv closed 3 years ago

juliusv commented 9 years ago

Currently the only way to bulk-import data is a hacky one involving client-side timestamps and scrapes with multiple samples per time series. We should offer an API for bulk import. This relies on https://github.com/prometheus/prometheus/issues/481.

EDIT: It probably won't be a web-based API in Prometheus, but a command-line tool.

RichiH commented 8 years ago

Just as a random data point, I have several (I think valid) use cases for bulk imports.

grypyrg commented 8 years ago

+1

foic commented 8 years ago

This would be very useful for me as well. I understand that this could blur the line between a true time-series event store and Prometheus's communicated focus on representing recent monitoring state.

It's beneficial in the case where Prometheus is the ingestion point for a fan-out system involving InfluxDB or federated rollup Prometheus nodes: this would allow me to simply keep pumping all data through the Prometheus entry points without needing two input paths when the data feed is delayed.

brian-brazil commented 8 years ago

@foic I can't think of a sane Prometheus setup where that'd be useful. If you want to pump data to influxdb, it'd be best to do it without involving Prometheus as Prometheus is adding no value in such a setup. Similarly rollup depends on continuous pulling, and delayed data can't work with that.

I'd generally advise running duplicates and living with the odd gap. It's not worth the effort to try to get it perfect.

foic commented 8 years ago

Thanks @brian-brazil, this is pretty much the response I expected :-) It sounds like there is too much to change to make all the pieces (Alertmanager, rollup, etc.) work with historical data.

Should this feature request be closed then if working with historical data is too difficult?


brian-brazil commented 8 years ago

@foic What you're requesting is different to what this feature request is mainly about.

Baughn commented 8 years ago

There's some value to bulk import, even in a world where 'storage' isn't the intended purpose of Prometheus. For example...

Recently I've been working on a Prometheus configuration for a certain forum. Although some of the metrics are from PHP, most of the really useful ones are being exported by an nginx logtailer I wrote.

In order to quickly iterate on possible metrics before putting it in production (that hasn't happened yet), I added code to the logtailer that lets it read logs with a time offset, pausing before each record until it's "supposed" to happen. That's okay-ish, but it'd be much nicer if I could bulk-import an entire day's worth of logs at once without actually waiting a day. Then I could look at the result, clear the DB, and try again.

There's the timestamp hack, but none of the client libraries support timestamps, and it's ugly anyway. I haven't tried to use it.

jinxcat commented 8 years ago

@Baughn what do you mean by "timestamp hack"?

I have a use for a bulk-import endpoint as well: back-filling data that was interrupted/unavailable in the normal time flow.

Overall, I feel it might be somewhat outside the intended model of Prometheus, but there will always be people who need to diverge from the ideal setup or situation.

brian-brazil commented 8 years ago

back-filling data that was interrupted/unavailable in the normal time flow.

That's also not what this issue is about. This issue covers brand new data, with nothing newer than it in the database. It's also not backfilling data, which is when there's nothing older than it in the database.

We've never even discussed this variant.

Baughn commented 8 years ago

@Baughn what do you mean by "timestamp hack"?

The /metrics format allows specifying a timestamp in addition to the values. None of the clients support this, and Prometheus doesn't support adding values that are any older than the newest one.

There's a list of caveats as long as your arm, starting with the impossibility of reliably doing this with multiple tasks exporting metrics, but in theory it should be possible to use timestamps to simulate fast-forwarding through historical data, which would cover my specific scenario.

I've never tried it, though.
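For reference, the text exposition format takes an optional timestamp as a final field on each sample line, in milliseconds since the Unix epoch. A made-up sample illustrating the hack being discussed:

```
# HELP http_requests_total Total HTTP requests handled.
# TYPE http_requests_total counter
http_requests_total{path="/index"} 1027 1455531720000
```

Prometheus normally attaches its own scrape timestamp, so exposing explicit timestamps like this is exactly the hack in question, with the caveats listed above.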

brian-brazil commented 8 years ago

Per post-PromCon discussions, the consensus was to have an API that can take in one time series at a time.

delgod commented 7 years ago

@brian-brazil thanks!

Currently we need to ask the user to run sysstat every minute, then to dump the sar results to a file and send it, and after that to analyze the results manually or via the kSar tool. If Prometheus implemented importing, it would be very, very useful!

brian-brazil commented 7 years ago

That's not something we will support. When we said bulk, we meant bulk.

I'd recommend you look at the node exporter for your needs; it'll produce better stats than sar.
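For anyone landing here with the same need: a minimal scrape configuration for the node exporter, assuming it runs on its default port 9100, would look roughly like this.

```yaml
scrape_configs:
  - job_name: 'node'
    static_configs:
      - targets: ['localhost:9100']
```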

svetasmirnova commented 7 years ago

+1

ssouris commented 7 years ago

+1

begakens commented 7 years ago

This would be excellent for my use case. I assumed it was already possible by adding a custom timestamp as outlined on the 'Exposition Formats' page, but I've since realized it doesn't work as expected. I've had to move away from Prometheus for my current project because of this, but I would be very interested in returning to it in the future if this feature were implemented.

radiophysicist commented 7 years ago

+1 for loading data based on server logs

brian-brazil commented 7 years ago

For logs look at mtail or the grok exporter. This is not suitable for logs.

radiophysicist commented 7 years ago

I tried grok and gave up on it because it's impossible to use the actual timestamps from the log data.

jstsch commented 7 years ago

+1, this would make Prometheus usable for more than just real-time server metrics/alerting. For instance, metrics from sensor networks might come in delayed (via push) due to network availability. Also, there is already historical data that is valuable to import.

wrouesnel commented 7 years ago

I hacked up a tool which can do something like this as a proof of concept - you can pre-initialize a Prometheus data store by streaming timestamped text-exposition format metrics into it: https://github.com/wrouesnel/prometheus-prefiller

It basically just embeds the Prometheus storage engine as a library to do it.

EDIT: Taken to an end state, you'd imagine some sort of /api/v1/export endpoint which simply iterates from the dawn of time at a background priority until it syncs up to the ingressing metrics, and a "bootstrap" mode in Prometheus which takes a URL and calls that endpoint to prefill itself before "launching".
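A minimal sketch of that library-embedding approach, assuming the standalone prometheus/tsdb package roughly as it existed at the time (signatures have changed across releases, and the metric name and paths here are illustrative only):

```go
package main

import (
	"log"
	"time"

	"github.com/prometheus/tsdb"
	"github.com/prometheus/tsdb/labels"
)

func main() {
	// Open (or create) a TSDB data directory that a Prometheus server
	// can later be pointed at via --storage.tsdb.path.
	db, err := tsdb.Open("data", nil, nil, tsdb.DefaultOptions)
	if err != nil {
		log.Fatal(err)
	}
	defer db.Close()

	// Append timestamped samples (milliseconds since the Unix epoch),
	// in increasing time order per series, then commit.
	app := db.Appender()
	ts := time.Now().Add(-24*time.Hour).Unix() * 1000
	series := labels.FromStrings("__name__", "http_requests_total", "path", "/index")
	if _, err := app.Add(series, ts, 1027); err != nil {
		log.Fatal(err)
	}
	if err := app.Commit(); err != nil {
		log.Fatal(err)
	}
}
```

Pointing a Prometheus server at the resulting directory then serves the pre-filled data, which is essentially what the prefiller automates.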

szocske42 commented 7 years ago

I am looking for this feature to load synthetic test data (a.k.a. random garbage) for evaluation, prototyping, and (hopefully soon) development. I'll try the "prefiller" tool; it looks like it does what I need.

brian-brazil commented 7 years ago

Per post-PromCon discussions, the consensus was to have an API that can take in one time series at a time.

With the new storage in Prometheus 2.0, this would not be the approach to take. I presume we'll do something more block-based.

thypon commented 7 years ago

I created a little tool, https://github.com/Cleafy/promqueen, loosely based on @wrouesnel's https://github.com/wrouesnel/prometheus-prefiller, to record and backfill offline-stored metrics into a newly created database.

promrec writes timestamped metrics to files; promplay then generates a new Prometheus database from those files.

wrouesnel commented 6 years ago

@thypon nice to see my one-night hack become something a bit more polished (it also contains some pretty egregious misunderstandings of the metrics engine, I now realize :)

geraybos commented 6 years ago

+1

Matty9191 commented 6 years ago

+1

valpackett commented 6 years ago

promplay solves import, but is there any solution for export? Something like pg_dump in Postgres?

Sure, Prometheus storage was not intended to be a long-term format, but not having dump-to-text and restore-from-text as standard included tools is pretty bad.

remote_read was mentioned in the migration docs — I assumed it would eagerly read the whole database from the old instance and save all the data in the new one… looks like it doesn't :(

(I want to migrate a small but long-term database from 2.0 beta 4 to 2.1 release…)

theTibi commented 6 years ago

+1

amit-handa commented 6 years ago

Is there a plan for this? Thanks.

calebtote commented 6 years ago

It's also not backfilling data, which is when there's nothing older than it in the database.

@brian-brazil I'm new to Prometheus and a bit confused by this statement. Are you implying there's already a way to backfill, or that there isn't and this request isn't going to resolve that?

The scenario I have is wanting to use Prometheus to monitor data from now forward, but I would also like a way to backfill the data prior to now so I don't lose my historical records. Without a way to specify metric timestamps, I'm unsure how to go about this. Not having historical metrics seems like a deal breaker for anyone wanting to transition to Prometheus from an existing monitoring system.

brian-brazil commented 6 years ago

There is currently no way to do that, and this request is about adding that feature.

omerlin commented 6 years ago

Any news on this feature? A good example of usage: data analysis. I did a bad histogram split of my data with the statsd_exporter, so I would like to re-import my past raw data with the corrected format. The only way I have now is to start an InfluxDB instance, push the data, and define a dedicated connection to it: quite a heavy process. It would be so simple to re-generate the correct metrics data using a Prometheus API :-)

parserpro commented 6 years ago

One more use case: I have to set up Grafana on my local computer and need to imitate real data for this, but the real data comes in once per day... so it would take a long time to wait for it. I want to load sample data into Prometheus, set up the Grafana dashboard, export it, and save it into a Grafana config file, and then use it in the production environment.

hanbaga commented 6 years ago

+1

freddy4711 commented 6 years ago

+1

codesome commented 6 years ago

I would like to work on this. It will be done after https://github.com/prometheus/tsdb/issues/90 and https://github.com/prometheus/tsdb/issues/24 are addressed, and I am working on them now.

logemann commented 6 years ago

@parserpro that's nearly the same use case I have. We are thinking about bundling Prometheus/Grafana into our docker-compose product stack. For sales and demo purposes, it's quite necessary to have demo data before any real data enters the system, which will never happen on a sales rep's demo notebook.

narciero commented 6 years ago

What's a realistic ETA for this feature to make it into a release?

codesome commented 6 years ago

@narciero The PR https://github.com/prometheus/tsdb/pull/370 needs to be merged before bulk import can land. But as it is not a small change to TSDB and has the potential to break things, it will take some time to verify, test, and iterate on possible improvements.

You can safely assume it will be at least 1-2 months (including the time to decide on the design of bulk import).

narciero commented 6 years ago

understood, thanks for the update!

krasi-georgiev commented 6 years ago

@codesome why do you need prometheus/tsdb#370 for bulk import? Quickly reading through the comments, this relates to bulk imports into a new Prometheus server without any data in it, so it would import data in order (ordered timestamps and nothing in the past), which should be possible even with the current tsdb package.

codesome commented 6 years ago

If we are allowing bulk import, I think we need to support every case, not only empty storage. We need https://github.com/prometheus/tsdb/pull/370 to allow importing any time range.

But yes, valid point. We can do bulk import even with the current tsdb package, but we would need to implement that part in prometheus/prometheus. I would like to do it after the above-mentioned PR so that import is seamless.

guvenim commented 5 years ago

Hi All,

I am new to Prometheus. I have read @brian-brazil's article on Safari, and I thought this post might be a good place to ask my question.

I have some sensor data with timestamps and other features (location, etc.), and I would like to insert this data into Prometheus using a Python API, then connect it with Grafana for visualization. It might be overkill, but since I already have Prometheus running as a Docker container, I thought I could use it as a DB to store the data. Can I do that? Or do you advise setting up another DB to store the data and connecting that with Grafana?

I saw @thypon answer but, unfortunately, I don't know Go.

Sincerely

Guven

krasi-georgiev commented 5 years ago

We use GitHub for bug reports and feature requests, so I suggest you move this to our user mailing list.

If you haven't looked already, you might find your answer in the official docs and examples, or by searching the users or devs groups. The #prometheus IRC channel is also a great place to mix with the community and ask questions (don't be afraid to answer a few while waiting).

jomach commented 5 years ago

Any news on this? We want to migrate data from OpenTSDB into Prometheus, and it would be nice to have a way to import old data.

krasi-georgiev commented 5 years ago

We are actively working on prometheus/tsdb#370. Once it is implemented in Prometheus, you could take blocks from another Prometheus server and just drop them in the data folder; it will all be handled at query time, and the blocks will be merged at the next automated compaction.

No strict ETA, but it looks like we might be able to add this in 2.8, which should be out in about a month.
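For context on what "drop them in the data folder" means: each TSDB block is a self-contained, ULID-named directory inside the data dir (the block name below is just an example):

```
data/
├── 01BKGV7JBM69T2G1BGBGM6KB12/   # one block: copy directories like this
│   ├── chunks/
│   ├── index
│   ├── meta.json
│   └── tombstones
└── wal/
```

Copying whole block directories from one server's data dir into another's (while the target server is stopped) would be the manual version of the migration described above.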

calebtote commented 5 years ago

@krasi-georgiev Just to be clear on your comment: is the expectation that you have to import from another Prometheus instance (i.e. we still couldn't script our own bulk imports of epoch:data pairs from other TSDBs)?

codesome commented 5 years ago

Yes, after https://github.com/prometheus/tsdb/pull/370 is merged, I will be jumping directly into implementing bulk import.

@calebtote No, I think bulk import would support importing from non-Prometheus sources too. The design has not been decided yet.

krasi-georgiev commented 5 years ago

Ah yeah, I missed the part about OpenTSDB. I don't think we have done anything in this direction yet.