influxdata / influxdb

Scalable datastore for metrics, events, and real-time analytics
https://influxdata.com
Apache License 2.0

data import is extremely slow #16555

Open jacobreid opened 4 years ago

jacobreid commented 4 years ago

I have a large (~2.5 TB) dataset already in TSM files from an older version of InfluxDB, which I am exporting to InfluxDB line protocol so I can load it into a new database via influx -import. No matter what I do, I can't get the import to go above 5,000 lines per second; it seems to be a hard limit somewhere. I have spent a lot of time troubleshooting the performance of the target InfluxDB, but the bottleneck appears to be the data import itself.

For example, when running the export just to count the lines of output, I get 100k+ lines/second:

influx_inspect export -datadir=/mnt/influxdb/data/ -waldir /mnt/influxdb/wal/ -out /dev/stdout | pv --line-mode --rate >/dev/null [115k/s]
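The time-range-limited export referenced in the import test below looks roughly like this (sketch only; the -start/-end timestamps are placeholders, not the real range, and it assumes the standard influx_inspect export flags):

# Sketch: limit the export to a time window so the import can be tested on a smaller file
# (the timestamps are placeholders)
influx_inspect export \
  -datadir /mnt/influxdb/data/ \
  -waldir /mnt/influxdb/wal/ \
  -start 2019-01-01T00:00:00Z \
  -end 2019-01-08T00:00:00Z \
  -out ./data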

When importing the data, however, it is capped at around 5k lines/second:

influx_inspect export -datadir=/mnt/influxdb/data/ -waldir /mnt/influxdb/wal/ -out ./data <snip time range to limit the size of the data> && cat ./data | pv --line-mode --rate | influx -import -path /dev/stdin [ 5k/s]

At 5,000 lines per second it is going to take weeks to import the data. Is there a faster way?
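One workaround I'm considering is to skip influx -import and write chunks to the 1.x /write endpoint in parallel. This is a sketch only, untested at this scale: it assumes the target database ("mydb" here is a placeholder) already exists, that the export is plain uncompressed line protocol, and that chunk size and parallelism would need tuning (the 1.x docs suggest batches on the order of 5-10k points, and the http max-body-size setting also caps request size):

# Sketch: parallel writes straight to the HTTP API instead of influx -import
grep -v '^#' ./data | grep -v '^CREATE ' > ./points   # drop DDL/DML comment lines and CREATE statements
split -d -a 6 -l 10000 ./points ./chunk_              # 10k lines of line protocol per chunk
ls chunk_* | xargs -P 4 -I{} \
  curl -s -XPOST 'http://localhost:8086/write?db=mydb&precision=ns' \
       --data-binary @{}

Whether this actually beats influx -import presumably depends on where the ~5k/s ceiling comes from, so I'd treat it as an experiment rather than a fix.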

jacobreid commented 4 years ago
[meta]
  dir = "/var/lib/influxdb/meta"
[data]
  dir = "/var/lib/influxdb/data"
  wal-dir = "/var/lib/influxdb/wal"
  wal-fsync-delay = "1s"
  index-version = "tsi1"
  cache-max-memory-size = "1g"
  cache-snapshot-memory-size = "1g"
  max-index-log-file-size = "1g"
  series-id-set-cache-size = 100
[http]
  enabled = true
  auth-enabled = false
[[udp]]
  enabled = true
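To double-check that the running server has actually picked these settings up, the resolved configuration can be printed and filtered (the config path below is an assumption; adjust to wherever this file lives):

# Print the merged configuration influxd resolves from this file plus defaults
influxd config -config /etc/influxdb/influxdb.conf | grep -E 'wal-fsync-delay|cache-max-memory-size|index-version'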