Closed juliusv closed 3 years ago
Just as a random data point, I have several (I think valid) use cases for bulk imports.
+1
This would be very useful for me as well. I understand that this could be used to blur the line between a true time-series event store and prometheus' communicated focus as a representation of recent monitoring state.
It's beneficial in the case where Prometheus is the ingestion point for a fan-out system involving influxdb or federated rollup Prometheus nodes: this would allow me to simply keep pumping all data through the Prometheus entry points, without needing two input paths when the data feed is delayed.
@foic I can't think of a sane Prometheus setup where that'd be useful. If you want to pump data to influxdb, it'd be best to do it without involving Prometheus as Prometheus is adding no value in such a setup. Similarly rollup depends on continuous pulling, and delayed data can't work with that.
I'd generally advise to run duplicates, and live with the odd gap. It's not worth the effort to try and get it perfect.
Thanks @brian-brazil - this is pretty much an expected response :-) Sounds like there is too much to change to make all the pieces work with historical data. Alert manager, rollup etc etc.
Should this feature request be closed then if working with historical data is too difficult?
@foic What you're requesting is different to what this feature request is mainly about.
There's some value to bulk import, even in a world where 'storage' isn't the intended purpose of Prometheus. For example...
Recently I've been working on a Prometheus configuration for a certain forum. Although some of the metrics are from PHP, most of the really useful ones are being exported by an nginx logtailer I wrote.
In order to quickly iterate on possible metrics, prior to putting it in production--that hasn't happened yet--I added code to the logtailer that lets it read logs with a time-offset, pausing between each record until it's "supposed" to happen. That's okay-ish, but it'd be much nicer if I could bulk import an entire day's worth of logs at once without actually waiting a day. Then I could look at the result, clear the DB, and try again.
There's the timestamp hack, but none of the client libraries support timestamps, and it's ugly anyway. I haven't tried to use it.
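A minimal sketch of the time-offset replay described above, assuming each log record has already been parsed into a `(datetime, payload)` pair. The names `replay_delays`, `replay`, and the `speedup` parameter are hypothetical and not taken from the actual logtailer.

```python
import time
from datetime import datetime, timedelta


def replay_delays(timestamps, speedup=1.0):
    """Return how long to wait before emitting each record.

    The first record is emitted immediately; each later record waits for the
    gap to its predecessor, divided by `speedup`. Negative gaps (out-of-order
    records) are clamped to zero.
    """
    delays = []
    previous = None
    for ts in timestamps:
        if previous is None:
            delays.append(0.0)
        else:
            gap = (ts - previous).total_seconds()
            delays.append(max(gap, 0.0) / speedup)
        previous = ts
    return delays


def replay(records, emit, speedup=1.0, sleep=time.sleep):
    """Emit each payload after pausing until it is "supposed" to happen."""
    delays = replay_delays([ts for ts, _ in records], speedup)
    for delay, (_, payload) in zip(delays, records):
        sleep(delay)
        emit(payload)
```

Even with a large `speedup`, this still pushes samples through at scrape time, so it only approximates what a real bulk import would give.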
@Baughn what do you mean by "timestamp hack"?
I have use for a bulk import endpoint as well, and that's for back-filling data that was interrupted/unavailable on the normal time flow.
Overall, I feel it might be somewhat on the border of what is the intended model of prometheus, but there will always be people with the need to diverge from the ideal setup or situation.
that's for back-filling data that was interrupted/unavailable on the normal time flow.
That's also not what this issue is about. This issue covers brand new data, with nothing newer than it in the database. It's also not backfilling data, which is when there's nothing older than it in the database.
We've never even discussed this variant.
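To make the three variants concrete, here is a small sketch that classifies an incoming time range against what the database already holds. The function name and the labels it returns are hypothetical, and timestamps are assumed to be plain integers (for example milliseconds since the epoch).

```python
def classify_import(db_min, db_max, new_min, new_max):
    """Classify an incoming time range relative to the stored range.

    db_min and db_max are None for an empty database.
    """
    if db_min is None or db_max is None:
        # Empty database: nothing newer (or older) exists, so treat it as
        # the plain bulk-import case.
        return "bulk import"
    if new_min > db_max:
        # Brand new data: nothing newer than it in the database.
        return "bulk import"
    if new_max < db_min:
        # Nothing older than it in the database.
        return "backfill"
    # Data that lands inside or across the stored range, e.g. filling a gap.
    return "overlapping"
```

Filling in samples that were missed during an outage falls into the last branch, which is the variant described above as never having been discussed.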
@Baughn what do you mean by "timestamp hack"?
The /metrics format allows specifying a timestamp in addition to the values. None of the clients support this, and Prometheus doesn't support adding values that are any older than the newest one.
There's a list of caveats as long as your arm, starting with the impossibility of reliably doing this with multiple tasks exporting metrics, but in theory it should be possible to use timestamps to simulate fast-forwarding through historical data, which would cover my specific scenario.
I've never tried it, though.
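For reference, a sketch of what a timestamped sample looks like in the text exposition format: the metric name, optional labels, the value, then an optional timestamp in milliseconds since the Unix epoch. The helper name `format_sample` is hypothetical; it only builds the line and says nothing about whether Prometheus will accept the sample.

```python
def format_sample(name, labels, value, timestamp_ms):
    """Format one sample as a text-exposition line with an explicit timestamp."""

    def escape(label_value):
        # Label values escape backslash, double quote, and newline.
        return (
            label_value.replace("\\", "\\\\")
            .replace('"', '\\"')
            .replace("\n", "\\n")
        )

    label_text = ",".join(
        f'{key}="{escape(val)}"' for key, val in sorted(labels.items())
    )
    series = f"{name}{{{label_text}}}" if label_text else name
    return f"{series} {value} {int(timestamp_ms)}"


if __name__ == "__main__":
    print(format_sample("http_requests_total", {"method": "get"}, 1027, 1395066363000))
```

As noted above, samples older than the newest one already stored for a series are rejected, so the lines would have to be served in increasing timestamp order.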
Per post-Promcon discussions, the consensus was to have an API that can take in one time series at a time.
@brian-brazil thanks!
Currently we have to ask the user to run sysstat every minute, then ask them to dump the sar results to a file and send it to us. After that we analyze the results manually or with the kSar tool. If Prometheus implemented importing, it would be very, very useful!
That's not something we will support. When we said bulk we mean bulk.
I'd recommend you look at the node exporter for your needs, it'll produce better stats than sar.
+1
+1
This would be excellent for my use case. I assumed it was already possible by adding a custom time stamp as outlined in the 'Exposition Formats' page but I've since realized it doesn't work as expected. I've had to move away from Prometheus for my current project because of this but would be very interested in returning to use it in the future if this feature was implemented.
+1 for loading data based on server logs
For logs look at mtail or the grok exporter. This is not suitable for logs.
I tried grok and gave up on it because it's impossible to use the actual timestamps from the log data.
+1, this would make Prometheus usable for more than just real-time server metrics/alerting. For instance, metrics from sensor networks might come in delayed, due to network availability. Via push. Also, there already is historical data that is valuable to import.
I hacked up a tool which can do something like this as a proof of concept - you can pre-initialize a Prometheus data store by streaming timestamped text-exposition format metrics into it: https://github.com/wrouesnel/prometheus-prefiller
It basically just launches a prometheus storage engine as a library to do it.
EDIT: Taken to an end state, you'd imagine some sort of /api/v1/export endpoint which simply iterates from the dawn of time at a background priority until it syncs up to the ingressing metrics, and a "bootstrap" mode in Prometheus which takes a URL and calls that endpoint to prefill itself before "launching".
I am looking for this feature to load synthetic test data (a.k.a. random garbage) for evaluation, prototyping, and (hopefully soon) development. I'll try the "prefiller" tool, it looks like it does what I need.
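A sketch of generating that kind of synthetic data as timestamped text-exposition lines, assuming a counter-like series sampled at a fixed interval. The function name, metric name, and parameters are hypothetical, and whether a given import tool accepts exactly this input is an assumption to check against that tool.

```python
import random


def synthetic_series(name, start_ms, step_ms, count, seed=0):
    """Generate `count` timestamped samples for one counter-like series.

    Timestamps are milliseconds since the epoch, spaced `step_ms` apart.
    A fixed seed makes the output reproducible between runs.
    """
    rng = random.Random(seed)
    value = 0.0
    lines = []
    for i in range(count):
        # Only ever add a non-negative increment, so the value never drops.
        value += rng.uniform(0.0, 10.0)
        lines.append(f"{name} {value:.3f} {start_ms + i * step_ms}")
    return lines


if __name__ == "__main__":
    # One day of samples at a 15 second interval.
    for line in synthetic_series("demo_requests_total", 1500000000000, 15000, 5760):
        print(line)
```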
Per post-Promcon discussions, the consensus was to have an API that can take in one time series at a time.
With the new storage in Prometheus 2.0, this would not be the approach to take. I presume we'll do something more block based.
I created a little tool https://github.com/Cleafy/promqueen slightly based on @wrouesnel https://github.com/wrouesnel/prometheus-prefiller in order to record and backfill offline stored metrics on a newly created database.
promrec creates timestamped metrics on file. promplay generates a new Prometheus database based on these metrics files.
@thypon - nice to see my 1 night hack become something a bit more polished (it also contains some pretty egregious misunderstandings of the metrics engine, I now realize :)
+1
+1
promplay solves import, but is there any solution for export? Something like pg_dump in Postgres?
Sure, Prometheus storage was not intended to be a long term format, but not having dump-to-text and restore-from-text as standard included tools is pretty bad.
remote_read was mentioned in the migration docs, and I assumed it would eagerly read the whole database from the old instance and save all the data in the new one… looks like it doesn't :(
(I want to migrate a small but long-term database from 2.0 beta 4 to 2.1 release…)
+1
Is there a plan for this? Thanks.
It's also not backfilling data, which is when there's nothing older than it in the database.
@brian-brazil -- new to Prometheus, bit confused by this statement. Are you implying there's a way to backfill already, or that there's not and this request isn't going to resolve that?
The scenario I have is wanting to use Prometheus to monitor data from now->forward, but also would like a way to backfill the data prior to now so I don't lose my historical records. Without a way to specify metric timestamps I'm unsure of how to go about this. Not having historical metrics though seems like a deal breaker for anyone wanting to transition from an existing monitoring system to Prometheus.
There is currently not, and this request is about that feature.
Any news on this feature? A good example of usage is data analysis: I did a bad histogram split of my data with the statsd_exporter, so I would like to re-import my past raw data with a different format. The only way I have is to start an influxdb, push the data there, and define a dedicated connection to that data, which is quite a heavy process. It would be so simple to re-generate the correct metrics data using a Prometheus API :-)
One more use case: I have to set up Grafana on my local computer and I need to imitate real data for this, but real data comes in only once per day, so waiting for it would take a lot of time. I want to load sample data into Prometheus, then set up the Grafana dashboard, export it, and save it into a Grafana config file, and then use it in the production environment.
+1
+1
I would like to work on this. This will be done after https://github.com/prometheus/tsdb/issues/90 and https://github.com/prometheus/tsdb/issues/24 are addressed, and I am on them now.
@parserpro that's nearly the same use case I have. We are thinking about bundling Prometheus/Grafana into our docker-compose product stack. For sales and demo reasons, it's quite necessary to have demo data before any real data enters the system, and real data will never arrive on a sales rep's demo notebook.
What's a realistic ETA for this feature to make it into a release?
@narciero This PR https://github.com/prometheus/tsdb/pull/370 is required to be merged for bulk import. But as this is not a small change in TSDB which has potential of breaking things, it would take some time to verify and test and iterate on possible improvements.
You can safely assume that it will be at least 1-2 months (including the time for deciding on design of bulk import).
understood, thanks for the update!
@codesome why do you need prometheus/tsdb#370 for the bulk import? Quickly reading through the comments, this relates to bulk imports to a new Prometheus server without any data in it, so it will import data in order (ordered timestamps and nothing in the past), which should be possible even with the current tsdb package.
If we are allowing bulk import, I think we need to support every case and not only empty storage. We need https://github.com/prometheus/tsdb/pull/370 to allow import of any time range.
But yes, valid point. We can do bulk import even with current tsdb packages, but we need to implement that part in prometheus/prometheus. I would like to do it after the above mentioned PR so that import is seamless.
Hi All,
I am new to Prometheus. I have read @brian-brazil's article on safari and I thought this post might be a good place to ask my question.
I have some sensor data with timestamps and other features (location etc.) and I would like to insert this data into Prometheus using the Python API, then connect it with Grafana to visualize. It might be overkill, but since I already have Prometheus as a Docker container, I thought I could use it as a DB to store the data. Can I do it? Or do you advise setting up another DB to store the data and then connecting that with Grafana?
I saw @thypon's answer but, unfortunately, I don't know Go.
Sincerely
Guven
We use github for bug reports and feature requests so I suggest you move this to our user mailing list.
If you haven't looked already, you might find your answer in the official docs and examples or by searching in the users or devs groups. The #prometheus IRC channel is also a great place to mix with the community and ask questions (don't be afraid to answer a few while waiting).
Any news on this? We want to migrate data from opentsdb into Prometheus, and it would be nice to have a way to import old data.
We are actively working on prometheus/tsdb#370. Once it is implemented in Prometheus, you could take blocks from another Prometheus server and just drop them in the data folder; they will be handled at query time, and the blocks will be merged at the next automated compaction trigger.
No strict ETA, but it looks like we might be able to add this to 2.8 which should be in about a month.
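As a rough sketch of what "blocks in the data folder" means, assuming the Prometheus 2.x on-disk layout where each block is a directory containing a `meta.json` with `minTime` and `maxTime` in milliseconds: the helpers below (hypothetical names) list the blocks in a data directory and report which ones overlap in time, which is the situation prometheus/tsdb#370 is meant to handle.

```python
import json
from pathlib import Path


def list_blocks(data_dir):
    """Return (block directory name, minTime, maxTime) for each block found."""
    blocks = []
    for meta_path in sorted(Path(data_dir).glob("*/meta.json")):
        meta = json.loads(meta_path.read_text())
        blocks.append((meta_path.parent.name, meta["minTime"], meta["maxTime"]))
    return blocks


def overlapping(blocks):
    """Return pairs of block names whose time ranges overlap.

    Each range is treated as half-open, [minTime, maxTime).
    """
    ordered = sorted(blocks, key=lambda block: block[1])
    pairs = []
    for i, first in enumerate(ordered):
        for second in ordered[i + 1:]:
            if second[1] < first[2]:
                pairs.append((first[0], second[0]))
    return pairs
```

This only inspects metadata; it does not copy or validate block contents.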
@krasi-georgiev Just to be clear on your comment, is the expectation that you have to import from another Prometheus instance (but we still couldn't script our own bulk imports with epoch:data from other tsdbs)?
Yes, after https://github.com/prometheus/tsdb/pull/370 is merged, I will be jumping directly into implementing bulk import.
@calebtote No, I think bulk import would support importing from non-Prometheus source too. The design has not been decided yet.
aaah yeah I missed that part about the opentsdb. Don't think we have done anything in this direction yet.
Currently the only way to bulk-import data is a hacky one involving client-side timestamps and scrapes with multiple samples per time series. We should offer an API for bulk import. This relies on https://github.com/prometheus/prometheus/issues/481.
EDIT: It probably won't be a web-based API in Prometheus, but a command-line tool.