influxdata / influxdb

Scalable datastore for metrics, events, and real-time analytics
https://influxdata.com
Apache License 2.0

Add ability to do historical backfill #24752

Open pauldix opened 6 months ago

pauldix commented 6 months ago

As mentioned in #24745, the server will now only allow writes to 100 open segments at any given time. This was done to simplify the design of how ingested data is segmented, as written up in #24706. Depending on the segment duration, the writable window could reach anywhere from only 100 minutes to 800 hours back in time. We'll need to support historical backfill for data older than that.
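
To make the arithmetic concrete, here is a minimal sketch of how the writable window follows from the segment limit. The 1 minute and 8 hour durations are assumptions inferred from the 100 minute and 800 hour figures above, and `writable_window` is a hypothetical helper, not something in the codebase.

```python
from datetime import timedelta

MAX_OPEN_SEGMENTS = 100  # limit stated in this issue


def writable_window(segment_duration: timedelta,
                    open_segments: int = MAX_OPEN_SEGMENTS) -> timedelta:
    """Total time span covered by the open segments (hypothetical helper)."""
    return segment_duration * open_segments


# Assumed extremes for the segment duration: 1 minute and 8 hours.
shortest = writable_window(timedelta(minutes=1))
longest = writable_window(timedelta(hours=8))

print(shortest)  # window for 1 minute segments
print(longest)   # window for 8 hour segments
```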

We have a few options for how to do this. One would be a new API endpoint that decides how to write data into the existing segment range. Alternatively, the regular write API could direct any overflow data (i.e. data with timestamps outside the 100 currently open segments) into a special segment.
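
A rough sketch of what the second option could look like, purely for discussion. Everything here is hypothetical: the names `route_write` and `OVERFLOW`, the use of integer second timestamps, and the assumption that the open segments are the 100 most recent aligned windows ending with the one that contains the current time. Under that assumption, future timestamps would overflow too.

```python
MAX_OPEN_SEGMENTS = 100  # limit stated in this issue
OVERFLOW = "overflow"    # hypothetical marker for the special segment


def segment_start(ts: int, duration: int) -> int:
    """Align a timestamp down to the start of its segment."""
    return ts - (ts % duration)


def route_write(ts: int, now: int, duration: int,
                max_open: int = MAX_OPEN_SEGMENTS):
    """Return the start of the open segment for ts, or OVERFLOW.

    Assumption: the open range is the max_open most recent aligned
    segments, the newest being the one that contains `now`.
    """
    newest = segment_start(now, duration)
    oldest = newest - (max_open - 1) * duration
    start = segment_start(ts, duration)
    if oldest <= start <= newest:
        return start
    return OVERFLOW


HOUR = 3600
now = 100 * HOUR + 10                               # inside the segment starting at 360000
in_range = route_write(HOUR, now, HOUR)             # oldest open segment
too_old = route_write(HOUR - 1, now, HOUR)          # one segment too far back
too_new = route_write(now + HOUR, now, HOUR)        # beyond the newest segment
```

The write path stays a single range check per row in this sketch. The cost moves to query time, where the overflow segment has to be reconciled with the regular ones.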

In either case, this additional data would end up having to be combined at query time with any segments it overlaps with, which could be a big performance hit. This likely won't be solved without the compactor, which is outside the scope of this open source project.
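
To illustrate the query-time cost, here is a minimal sketch of combining rows from a regular segment with rows from an overlapping overflow segment. It assumes each source yields `(timestamp, value)` tuples already sorted by timestamp. `merge_for_query` is a hypothetical helper, and it does not attempt deduplication, which a real implementation would have to define.

```python
import heapq


def merge_for_query(segment_rows, overflow_rows, start, end):
    """Merge two timestamp-sorted row streams and keep rows in [start, end).

    Assumption: both inputs are sorted by their first element (the timestamp).
    """
    merged = heapq.merge(segment_rows, overflow_rows, key=lambda row: row[0])
    return [row for row in merged if start <= row[0] < end]


segment_rows = [(10, "a"), (30, "c")]
overflow_rows = [(20, "b"), (40, "d")]
result = merge_for_query(segment_rows, overflow_rows, 0, 35)
```

Every query that touches a time range with backfilled data pays for this extra merge, which is the overhead a compactor would remove by rewriting the overlapping data into one sorted unit.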

This issue is open for tracking/discussion. We may ultimately decide that streaming historical backfill isn't in the scope of this project.

wj-stack commented 1 month ago

In production, however, backfilling historical data is essential, and I hope a solution can be found.