influxdata / influxdb

Scalable datastore for metrics, events, and real-time analytics
https://influxdata.com
Apache License 2.0
28.71k stars 3.54k forks source link

(1.7.8) Massive RAM exhaustion when using SELECT INTO in restore process #15560

Open voiprodrigo opened 4 years ago

voiprodrigo commented 4 years ago

Hi,

Using version 1.7.8 on CentOS 7, systems with 256GB RAM.

On instance A I have performed a portable shards backup on instance A, for multiple databases. Shard duration is one day, a single shard was backed up. The backup for one database is around 1GB. On instance B, I performed restore using -newdb setting. Then started using SELECT INTO to restore missing points. For small databases this works fine. The 1GB shard backup restored into a 3.2GB database. The destination database has quite some data accumulated, and all shards in total are occupying ~800GB on the filesystem. Daily shard's are between 3 and 5 GB.

When I do the SELECT INTO, InfluxDB climbs from ~64GB up to the server's 256GB, then OOM happens.

This happens even when trying to SELECT only 1 hour worth of points, as such: SELECT INTO telegraf..:MEASUREMENT FROM /./ WHERE time > '2019-10-22T00:00:00Z' AND time < '2019-10-22T01:00:00Z' GROUP BY *

Appreciate some help in understanding how this happens this way (bug, design, something I should be doing differently?)

Regards.

timhallinflux commented 4 years ago

You'll need to further narrow the SELECT INTO down further. Essentially, the database attempts to pre-allocate memory to move things around. The amount of data being selected must be large...because you are grabbing it for every measurement.

voiprodrigo commented 4 years ago

It is a bit of data indeed. Just not... almost 200GB big. This disproportion is what led me to open the issue. I’ll try to do this measurement by measurement and see what happens.

timhallinflux commented 4 years ago

@voiprodrigo any update?