[0.9.3] WAL gets progressively slower as DB size increases

dswarbrick commented 9 years ago

With InfluxDB version 0.9.3 (actually a nightly, 0.9.4-nightly-cf58c38, from the day after 0.9.3 release), the WAL seems to get progressively slower as the DB grows. Running the WAL on a tmpfs, it's responsive immediately after the DB is nuked, but things start to go south when the DB hits about 11 GB.

WAL flushing becomes irregular, compaction of partitions sometimes takes over an hour, where previously it was taking seconds, the collectd plugin complains of write timeouts, memory usage skyrockets (swapping, and sometimes leading to OOM kills), and the WAL directory size increases beyond the approximate 5 x wal-partition-size-threshold.

Sometimes InfluxDB recovers from this tailspin, two out of three times it will simply continue to fill the WAL tmpfs, then eat all available memory, and eventually be OOM-killed.

Some log excerpts:

[wal] 2015/08/28 17:57:41 compaction of partition 2 took 25.284514662s
[wal] 2015/08/28 17:57:41 Flush due to memory. Flushing 34 series with 66113 bytes from partition 2. Compacting 50378 series
[wal] 2015/08/28 17:57:42 compaction of partition 1 took 34.370409564s
[wal] 2015/08/28 17:57:42 Flush due to memory. Flushing 289 series with 574209 bytes from partition 1. Compacting 49902 series
[wal] 2015/08/28 17:58:36 compaction of partition 3 took 1m2.905366371s
[wal] 2015/08/28 17:58:37 Flush due to memory. Flushing 3 series with 10030 bytes from partition 3. Compacting 50065 series
[collectd] 2015/08/28 17:59:15 failed to write batch: timeout
[wal] 2015/08/28 17:59:20 compaction of partition 5 took 1m42.326890362s
[wal] 2015/08/28 17:59:21 Flush due to memory. Flushing 5 series with 22525 bytes from partition 5. Compacting 50216 series
[collectd] 2015/08/28 18:00:04 failed to write batch: timeout
[collectd] 2015/08/28 18:00:13 failed to write batch: timeout
[collectd] 2015/08/28 18:00:20 failed to write batch: timeout
[wal] 2015/08/28 18:00:39 compaction of partition 2 took 2m57.288730538s
[wal] 2015/08/28 18:00:40 compaction of partition 4 took 3m1.0396585s
[collectd] 2015/08/28 18:00:40 failed to write batch: timeout
[wal] 2015/08/28 18:00:44 Flush due to memory. Flushing 2 series with 9282 bytes from partition 4. Compacting 50296 series
[wal] 2015/08/28 18:00:45 Flush due to memory. Flushing 6 series with 30651 bytes from partition 2. Compacting 50406 series
[collectd] 2015/08/28 18:01:03 failed to write batch: timeout
[collectd] 2015/08/28 18:01:33 failed to write batch: timeout
[collectd] 2015/08/28 18:01:45 failed to write batch: timeout
[collectd] 2015/08/28 18:02:35 failed to write batch: timeout
...
[collectd] 2015/08/28 18:05:47 failed to write batch: timeout
[collectd] 2015/08/28 18:05:53 failed to write batch: timeout
[wal] 2015/08/28 18:05:57 compaction of partition 3 took 7m20.064047511s
[collectd] 2015/08/28 18:05:58 failed to write batch: timeout
[collectd] 2015/08/28 18:06:03 failed to write batch: timeout
[retention] 2015/08/28 18:06:07 retention policy shard deletion check commencing
[retention] 2015/08/28 18:06:07 retention policy enforcement check commencing
[collectd] 2015/08/28 18:06:08 failed to write batch: timeout
...
[collectd] 2015/08/28 18:17:57 failed to write batch: timeout
[collectd] 2015/08/28 18:18:27 failed to write batch: timeout
[wal] 2015/08/28 18:19:03 compaction of partition 1 took 1m33.539808412s
[wal] 2015/08/28 18:19:04 Flush due to memory. Flushing 5 series with 38896 bytes from partition 1. Compacting 50206 series
[wal] 2015/08/28 18:20:39 compaction of partition 2 took 3m7.511796965s
[wal] 2015/08/28 18:20:40 Flush due to memory. Flushing 6 series with 66113 bytes from partition 2. Compacting 50428 series
[retention] 2015/08/28 18:26:07 retention policy shard deletion check commencing
[retention] 2015/08/28 18:26:07 retention policy enforcement check commencing
[retention] 2015/08/28 18:36:07 retention policy enforcement check commencing
[retention] 2015/08/28 18:36:07 retention policy shard deletion check commencing
[retention] 2015/08/28 18:46:07 retention policy enforcement check commencing
[retention] 2015/08/28 18:46:07 retention policy shard deletion check commencing
[retention] 2015/08/28 18:56:07 retention policy enforcement check commencing
[retention] 2015/08/28 18:56:07 retention policy shard deletion check commencing
[retention] 2015/08/28 19:06:07 retention policy enforcement check commencing
[retention] 2015/08/28 19:06:07 retention policy shard deletion check commencing
[retention] 2015/08/28 19:16:07 retention policy shard deletion check commencing
[retention] 2015/08/28 19:16:07 retention policy enforcement check commencing
[retention] 2015/08/28 19:26:07 retention policy enforcement check commencing
[retention] 2015/08/28 19:26:07 retention policy shard deletion check commencing
[wal] 2015/08/28 19:31:06 compaction of partition 5 took 1h13m31.963991657s
[wal] 2015/08/28 19:31:06 Flush due to memory. Flushing 5 series with 835907 bytes from partition 5. Compacting 50256 series

dswarbrick commented 9 years ago

This is running in a VM with four cores @ 2.8 GHz, 16 GB RAM and SAN storage capped at 2500 IOPS (hence the WAL on tmpfs).

iostat shows the DB storage device is very quiet when the DB is small, and the WAL seems to be doing a good job of batching the writes into periodic, large sequential writes.

InfluxDB config is all default, out of the box config, with the exception of collectd batch writer config

batch-size = 5000
batch-timeout = "10s"

pauldix commented 9 years ago

Can you lower the the wal-ready-series-size to 1024? See here: https://github.com/influxdb/influxdb/blob/master/etc/config.sample.toml#L52

You can also cut the wal-partition-size-threshold in half.

Do you know how many unique series you have in your set?

dswarbrick commented 9 years ago

Ok, set wal-ready-series-size = 1024 and wal-partition-size-threshold = 10485760.

I've actually been looking for how to show the count of unique series, but did not find this in the docs. Any hints greatly appreciated.

beckettsean commented 9 years ago

@dswarbrick it's not elegant but you can use the CLI plus some shell to get the series count:

influx -execute 'show series' -database 'mydb' | grep _key | wc -l

See https://influxdb.com/docs/v0.9/tools/shell.html for more on the CLI.

dswarbrick commented 9 years ago

Wouldn't show measurements be a more efficient way of getting that?

$ influx -execute 'show series' -database 'collectd' | grep _key | wc -l
8

or

$ influx -execute 'show measurements' -database 'collectd'
name: measurements
------------------
name
interface_rx
interface_tx
iolatency_read
iolatency_value
iolatency_write
load_longterm
load_midterm
load_shortterm

Maybe this would be more useful to know:

$ influx -execute 'show series' -database 'collectd' | grep host | wc -l
251396

So for those 8 collectd plugin types, there are quite a few plugin instances, types, and type instances.

beckettsean commented 9 years ago

@dswarbrick In InfluxDB, a measurement is a logical container of related series, and contains potentially millions of series. For performance considerations, the number of series is what matters. If you have no tags at all, then the number of measurements equals the number of series. Otherwise, you have many series per measurement (one series for each unique tag set.)

As your numbers show, you have 8 measurements, but 251396 series. The WAL flushes based on series behavior, so that's the interesting number for evaluating this issue.

beckettsean commented 9 years ago

@dswarbrick If you don't have the host tag key on every single point, then the influx -execute 'show series' -database 'collectd' | grep host | wc -l command could be missing some series. Is the number the same as if you ran influx -execute 'show series' -database 'collectd' | grep _key | wc -l?

dswarbrick commented 9 years ago

@beckettsean The host tag is present on every point, since these are received by the collectd plugin, and that data is always included. There is nothing else feeding into this same database, and no other databases on this server.

Those numbers again:

$ influx -execute 'show series' -database 'collectd' | grep host | wc -l
251396
$ influx -execute 'show series' -database 'collectd' | grep _key | wc -l
8

dswarbrick commented 9 years ago

@pauldix With the settings you recommended, it ran for several hours before again slowing down, falling behind, and eventually filling the 2 GB tmpfs that the WAL directory points to. This seems to pretty consistently happen when the database size reaches 11 GB.

[retention] 2015/08/29 05:48:15 retention policy enforcement check commencing
[retention] 2015/08/29 05:48:15 retention policy shard deletion check commencing
[wal] 2015/08/29 05:53:43 compaction of partition 5 took 52m48.93020078s
[wal] 2015/08/29 05:53:43 Flush due to memory. Flushing 49861 series with 128238055 bytes from partition 5. Compacting 234 series
[wal] 2015/08/29 05:53:47 compaction of partition 4 took 47m50.318560195s
[wal] 2015/08/29 05:53:48 Flush due to memory. Flushing 49976 series with 128735475 bytes from partition 4. Compacting 371 series
[wal] 2015/08/29 05:53:51 compaction of partition 3 took 52m49.102693155s
[wal] 2015/08/29 05:53:52 Flush due to memory. Flushing 49790 series with 154260176 bytes from partition 3. Compacting 328 series
[wal] 2015/08/29 05:54:04 Metadata flush took 45m47.598172728s
[wal] 2015/08/29 05:54:04 Flushing 0 measurements and 105 series to index
[retention] 2015/08/29 05:58:15 retention policy enforcement check commencing
[retention] 2015/08/29 05:58:15 retention policy shard deletion check commencing
[wal] 2015/08/29 06:07:52 compaction of partition 2 took 40m33.427834634s
[wal] 2015/08/29 06:07:53 Flush due to memory. Flushing 50108 series with 97255946 bytes from partition 2. Compacting 280 series
[retention] 2015/08/29 06:08:15 retention policy enforcement check commencing
[retention] 2015/08/29 06:08:15 retention policy shard deletion check commencing
[retention] 2015/08/29 06:18:15 retention policy enforcement check commencing
[retention] 2015/08/29 06:18:15 retention policy shard deletion check commencing
[retention] 2015/08/29 06:28:15 retention policy shard deletion check commencing
[retention] 2015/08/29 06:28:15 retention policy enforcement check commencing
[retention] 2015/08/29 06:38:15 retention policy enforcement check commencing
[retention] 2015/08/29 06:38:15 retention policy shard deletion check commencing
[retention] 2015/08/29 06:48:15 retention policy shard deletion check commencing
[retention] 2015/08/29 06:48:15 retention policy enforcement check commencing
[write] 2015/08/29 06:53:53 write failed for shard 1 on node 1: engine: write points: write /var/opt/influxdb/wal/collectd/default/1/02.001537.wal: no space left on device
[collectd] 2015/08/29 06:53:53 failed to write batch: write failed: engine: write points: write /var/opt/influxdb/wal/collectd/default/1/02.001537.wal: no space left on device

pauldix commented 9 years ago

@dswarbrick We're going to update the code to start rejecting writes once memory pressure gets to high. I think that the IOPS on the Bolt backed drive just aren't enough to keep up with the ingestion rate. What's your sampling rate? Curious how many points/sec are going in.

One last thing to try is to update the wal-ready-series-size to 4096.

dswarbrick commented 9 years ago

@pauldix Increased wal-ready-series-size to 4096 as you suggest. Btw, I'm nuking the DB and WAL directory each time I change a setting, so we have no nasty remnants of previous failed attempts. I also installed a new nightly, 0.9.4-nightly-0a80d8f.

We currently have approximately 300 packets per second of collectd data arriving, and each packet contains on average about 20 points. So, we're inserting about 6000 points per second.

What exactly is the relationship between points per second inserted, and maximum sustained IOPS of the storage? I assumed that with the WAL on tmpfs, InfluxDB would be batch-writing the points in large chunks to BoltDB, and we could keep the heavy IO shuffling on the tmpfs.

pauldix commented 9 years ago

From my testing, the WAL is actually much lower on the IOPS scale than what happens when you flush the data to Bolt. The primary driver is the number of unique series you have. Much more so than the number of data points. If you were inserting 20k/sec into a single series or even dozens, it wouldn't break a sweat. If you put both on the same drive and track the IOPS, you'll see a big spike every time a partition in the WAL flushes to the index.

Just out of curiosity, do you have your collectd instances sampling at once a minute?

6k/sec doesn't seem like that much to me and I would hope that the hardware you're running on would be enough for that load.

What do the IOPS against the SAN look like over time? Particularly when the DB gets large?

dswarbrick commented 9 years ago

By default, collectd uses a sampling interval of 10s, but we have bumped that to 20s for the plugins that generate the most data (i.e., our "iolatency" plugin, which collects read/write IO latency stats for a large number of logical volumes on each host, sorted into buckets of 8, 16, 32ms etc, up to 512ms - this obviously produces a lot of data points).

When the DB is small, iostat in the InfluxDB shows periodic spikes of up to 2500 wr/sec when the WAL flushes, and these last for several seconds, before subsiding to zero. Wash, rinse, repeat.

Once the DB hits about 11 GB, we no longer see the high write IOPS spikes, and instead see up to about 100 reads per sec, with the %util column showing very high values, despite apparently very little going on. Svctm column shows up to about 8ms.

dswarbrick commented 9 years ago

@pauldix My DB is now 9.8 GB since last nuking it (about 8 hours ago), and I can see the WAL directory now starting to exceed 400 MB. Partition compaction is now taking 18m, whereas before it was no more than 1m, and usually 5 - 20s.

iostat shows constant 4KB reads on the BoltDB device (vdb), interspersed with the occasional write.

Device:         rrqm/s   wrqm/s     r/s     w/s    rMB/s    wMB/s avgrq-sz avgqu-sz   await r_await w_await  svctm  %util
vda               0.00     0.00    0.00    0.00     0.00     0.00     0.00     0.00    0.00    0.00    0.00   0.00   0.00
vdb               0.00     0.00   73.60    0.00     0.29     0.00     8.00     0.98   13.34   13.34    0.00  13.36  98.32

avg-cpu:  %user   %nice %system %iowait  %steal   %idle
          42.93    0.00    2.92   15.93    0.00   38.22

Device:         rrqm/s   wrqm/s     r/s     w/s    rMB/s    wMB/s avgrq-sz avgqu-sz   await r_await w_await  svctm  %util
vda               0.00     0.00    0.00    0.00     0.00     0.00     0.00     0.00    0.00    0.00    0.00   0.00   0.00
vdb               0.00     0.00   66.20    0.00     0.26     0.00     8.00     0.93   14.09   14.09    0.00  14.08  93.20

avg-cpu:  %user   %nice %system %iowait  %steal   %idle
          34.24    0.00    6.11   14.78    0.00   44.87

Device:         rrqm/s   wrqm/s     r/s     w/s    rMB/s    wMB/s avgrq-sz avgqu-sz   await r_await w_await  svctm  %util
vda               0.00     0.00    0.00    0.00     0.00     0.00     0.00     0.00    0.00    0.00    0.00   0.00   0.00
vdb               0.00     0.00   16.00 1203.80     0.06     4.78     8.12    74.60   59.63   16.30   60.21   0.69  84.56

avg-cpu:  %user   %nice %system %iowait  %steal   %idle
           8.90    0.00    2.56   21.70    0.00   66.84

Device:         rrqm/s   wrqm/s     r/s     w/s    rMB/s    wMB/s avgrq-sz avgqu-sz   await r_await w_await  svctm  %util
vda               0.00     0.00    0.80    0.00     0.00     0.00     8.00     0.02   22.00   22.00    0.00  20.00   1.60
vdb               0.00     0.00    0.00 1996.00     0.00     8.13     8.35   126.56   63.24    0.00   63.24   0.50  99.92

This is the pattern that seems to consistently occur when the DB hits 10+ GB, followed shortly afterwards by unbounded memory consumption, swap thrashing, and eventually an OOM kill.

dswarbrick commented 9 years ago

I have strung together two storage devices in raid-0 on the VM, with a chunk size of 4 KB, which can burst to over 6500 write IOPS, and so far things are running more or less OK with a DB of 13 GB.

However, the log is still showing the the partition compaction is slowing down, the tmpfs is growing, and so is memory utilization. In iostat, I'm seeing alternating bursts of almost constant small writes, followed by small reads from the bolt DB storage. The following shows where it transitions from writes to reads:

Device:         rrqm/s   wrqm/s     r/s     w/s    rMB/s    wMB/s avgrq-sz avgqu-sz   await r_await w_await  svctm  %util
vda               0.00     0.00    0.00    0.00     0.00     0.00     0.00     0.00    0.00    0.00    0.00   0.00   0.00
vdb               0.00     0.00    0.00 3299.50     0.00    12.89     8.00   106.32   32.36    0.00   32.36   0.30  98.00
vdc               0.00     0.00    0.00 3249.50     0.00    12.69     8.00    87.06   26.75    0.00   26.75   0.27  88.20
md0               0.00     0.00    0.00 6524.00     0.00    25.48     8.00     0.00    0.00    0.00    0.00   0.00   0.00

avg-cpu:  %user   %nice %system %iowait  %steal   %idle
          14.58    0.00    4.49   12.71    0.76   67.46

Device:         rrqm/s   wrqm/s     r/s     w/s    rMB/s    wMB/s avgrq-sz avgqu-sz   await r_await w_await  svctm  %util
vda               0.00     0.00   72.50    0.00     0.28     0.00     8.00     0.05    0.72    0.72    0.00   0.55   4.00
vdb               0.00   348.00  431.00 1757.50     1.68     8.22     9.27    39.91   18.52    0.45   22.95   0.28  61.40
vdc               0.00   348.00  444.50 1786.00     1.74     8.33     9.24    37.24   17.36    0.50   21.55   0.28  62.60
md0               0.00     0.00  876.00 4144.50     3.42    16.19     8.00     0.00    0.00    0.00    0.00   0.00   0.00

avg-cpu:  %user   %nice %system %iowait  %steal   %idle
          15.05    0.00    2.28   10.74    0.42   71.51

Device:         rrqm/s   wrqm/s     r/s     w/s    rMB/s    wMB/s avgrq-sz avgqu-sz   await r_await w_await  svctm  %util
vda               0.00     0.00   54.50    0.00     0.21     0.00     8.00     0.12    2.20    2.20    0.00   1.61   8.80
vdb               0.00     0.00  852.50    0.00     3.33     0.00     8.00     0.35    0.42    0.42    0.00   0.41  35.00
vdc               0.00     0.00  853.00    0.00     3.33     0.00     8.00     0.39    0.45    0.45    0.00   0.46  39.00
md0               0.00     0.00 1705.50    0.00     6.66     0.00     8.00     0.00    0.00    0.00    0.00   0.00   0.00

I'm curious, and quite concerned why InfluxDB is reading / writing so inefficiently on the disk. I would have thought that a time series database would be able to write almost purely sequentially, with a file pointer moving in a forwards-only direction, perhaps with only occasional backwards seeks to update an index. From these results however, I get the impression that InfluxDB (or probably more accurately, BoltDB) is seeking all over the disk like crazy, writing 4 KB here and there. This is an absolute performance killer, and means that people are pretty much going to need fast SSDs, or perhaps even FusionIO or NVMe storage to scale this to larger DBs.

As mentioned earlier in this ticket, we are inserting only about 6000 points per second! Is there something inherently inefficient in the way that the collectd input plugin transforms collectd data into InfluxDB data points?

I expect the current test run to also fail at some point when the DB slows down to the point of the WAL falling behind again. It would be good to run a pure benchmark on a fresh DB after that, such as what the developers use for testing it. Can such a benchmark tool be included in the package?

dswarbrick commented 9 years ago

DB is now 20 GB, but this morning when I checked the log, these were the last few entries, followed by a stack trace:

[collectd] 2015/08/31 07:01:32 failed to write batch: timeout
[collectd] 2015/08/31 07:01:42 failed to write batch: timeout
[wal] 2015/08/31 07:01:47 compaction of partition 5 took 17m43.451613246s
[collectd] 2015/08/31 07:01:52 failed to write batch: timeout
[wal] 2015/08/31 07:02:01 Flush due to memory. Flushing 5 series with 149311 bytes from partition 5. Compacting 49275 series
fatal error: runtime: out of memory

No OOM kill visible in dmesg however. WAL tmpfs was using about 1.1 GB out of 2 GB at the time of the crash. RAM is 16 GB, nothing else is running on this host.

cannium commented 9 years ago

Similar issue here, some "compaction of partition" took more than 2 hours, and the current DB file sizes are:

# ll -h  /var/opt/influxdb/data/graphite/default/
total 11G
-rw-r--r-- 1 root root  10G Aug 31 15:42 3
-rw-r--r-- 1 root root 2.0G Aug 31 15:55 4

(Could someone explain the 3 and 4 for me?)

And iostat shows very high read load:

Device:         rrqm/s   wrqm/s     r/s     w/s   rsec/s   wsec/s avgrq-sz avgqu-sz   await  svctm  %util
sdc               0.00     0.00  110.00    0.00   880.00     0.00     8.00     1.00    8.95   9.06  99.70
sdd               0.00     0.00    0.00   10.00     0.00   720.00    72.00     0.00    0.20   0.20   0.20

where sdc is the data partition and sdd is the WAL partition.

pauldix commented 9 years ago

Are you guys running large queries as the writes are happening? That is, are you doing queries that count all the points in a series? Wondering if there's some contention against the BoltDB when it's trying to flush.

dswarbrick commented 9 years ago

I'm only occasionally firing up Grafana to check for gaps in the graph (indicating potential UDP packet receive errors). Other than that, my setup does not have any read queries at all hitting it. Also no CQ's defined.

tlopo commented 9 years ago

I am having the same problem not a single query being fired up. Trying to import a large amount of data from v.0.8.8. It imports fine for ~ 6hrs when influxdb-0.9.3 crashes, sometimes OOM.

pauldix commented 9 years ago

@tlopo and can you lower the thresholds on the wal settings that I have listed above? Also, can you set the throttling on the import to something like 1k/sec?

benbjohnson commented 9 years ago

@tlopo can you SIGQUIT (CTRL-) once influxd starts getting really slow? It feels like it's blocking on something and a stack trace might help to show what's going on.

jhorwit2 commented 9 years ago

@pauldix i'm only writing during timeouts. After a restart my large database (22GB) works for a couple writes then 500s every time. I tried adjusting some settings to lower them and raise them, but both didn't fix anything.

gerrickw commented 9 years ago

As a note, the increasing compaction also happened on my side over the weekend, which then ended up in 500 timeouts and a stack trace. Ticket with more details #3912 . I am unsure if these issues are related, but thought I'd mention I see the same compaction problem listed before the stack. At the beginning with a fresh db it was seconds and slowly grew to around an hour.

[wal] 2015/08/30 12:47:46 compaction of partition 5 took 37m22.467069689s
[wal] 2015/08/30 12:56:38 compaction of partition 4 took 36m8.441883512s
[wal] 2015/08/30 13:11:27 compaction of partition 1 took 50m49.309000495s
[wal] 2015/08/30 13:19:52 compaction of partition 3 took 45m59.070747721s
[wal] 2015/08/30 13:28:11 compaction of partition 2 took 41m0.589866679s
[wal] 2015/08/30 13:33:43 compaction of partition 5 took 45m54.36298263s
[wal] 2015/08/30 13:41:32 compaction of partition 4 took 44m53.27918068s
[wal] 2015/08/30 13:46:56 compaction of partition 1 took 35m29.168643031s
[wal] 2015/08/30 13:58:49 compaction of partition 3 took 38m57.082587597s
[wal] 2015/08/30 14:07:09 compaction of partition 2 took 38m58.128631244s
[wal] 2015/08/30 14:19:53 compaction of partition 5 took 46m9.322582282s
[wal] 2015/08/30 14:28:38 compaction of partition 4 took 47m6.254735665s
[wal] 2015/08/30 14:41:52 compaction of partition 1 took 54m54.974908204s
[wal] 2015/08/30 14:50:08 compaction of partition 3 took 51m18.504416788s
[wal] 2015/08/30 14:59:04 compaction of partition 2 took 51m54.465275712s
[wal] 2015/08/30 15:03:50 compaction of partition 5 took 43m57.248803548s
[wal] 2015/08/30 15:12:35 compaction of partition 4 took 43m56.32162067s
[wal] 2015/08/30 15:17:50 compaction of partition 1 took 35m57.630192748s
[wal] 2015/08/30 15:30:49 compaction of partition 3 took 40m39.712896895s
[wal] 2015/08/30 15:39:46 compaction of partition 2 took 40m41.203958139s
[wal] 2015/08/30 15:53:28 compaction of partition 5 took 49m37.626901884s
[wal] 2015/08/30 16:05:12 compaction of partition 4 took 52m36.439897255s
[wal] 2015/08/30 16:21:09 compaction of partition 1 took 1h3m18.723455291s

jhorwit2 commented 9 years ago

Should the WAL flush and be empty after a period of time? My WAL was full of 1 database and I stopped writing to it for over 24hrs yet the WAL never changed size.

huhongbo commented 9 years ago

I have the same problem when compaction

tlopo commented 9 years ago

@benbjohnson I didn't have the chance to send a SIGQUIT as influxdb crashed:

[wal] 2015/09/01 11:34:52 compaction of partition 3 took 472.033906ms
panic: runtime error: invalid memory address or nil pointer dereference
[signal 0xb code=0x1 addr=0x58 pc=0x5db656]

But it printed the stack trace for me, which can be found here.

pauldix commented 9 years ago

I just added more detailed logging during compactions to see if it's the WAL or the index that is causing the slowdown. See #3925. Will be in the nightly tonight or you can build from master. Thanks for working with us to track this down.

benbjohnson commented 9 years ago

@tlopo What settings did you use when you created your database and retention policy? From your stack trace it looks like you have about 3,000 shards.

tlopo commented 9 years ago

@benbjohnson I am using default settings for retention, but I have no more than 3 months worth data, ~ 8.5K series. Series are being updated @ 1minute interval. but not all of them are being updated, actually most of them are not.

pauldix commented 9 years ago

odd that should only be about 12 shards if you're using the default retention policy that has infinite retention. Have you queried the data? Do the timestamps look correct? I'm wondering if some very old timestamps were written in and it tried to create shards for all the other times.

On Tue, Sep 1, 2015 at 1:19 PM, tlopo notifications@github.com wrote:

@benbjohnson https://github.com/benbjohnson I am using default settings for retention, but I have no more than 3 months worth data, ~ 8.5K series. Series are being updated @ 1minute interval. but not all of them are being updated, actually most of them are not.

— Reply to this email directly or view it on GitHub https://github.com/influxdb/influxdb/issues/3885#issuecomment-136801505.

tlopo commented 9 years ago

@pauldix sorry it's actually 5 months worth data:

[root@TIAGO-TEST1 test]# node query-csv.js  'select * from "AWS.PASSTHRU.PASSTHRU02.web.hits" group by time(1m) limit 1 order desc' | sed -e '1d' |  ftable 
+--------------------------+---------------+-----------------+-------+
|           ftime          |      time     | sequence_number | value |
+--------------------------+---------------+-----------------+-------+
| 2015-09-01T18:35:00.000Z | 1441132500000 |        1        |  649  |  
+--------------------------+---------------+-----------------+-------+
[root@TIAGO-TEST1 test]# node query-csv.js  'select * from "AWS.PASSTHRU.PASSTHRU02.web.hits" group by time(1m) limit 1 order asc' | sed -e '1d' |  ftable 
+--------------------------+---------------+-----------------+-------+
|           ftime          |      time     | sequence_number | value |
+--------------------------+---------------+-----------------+-------+
| 2015-05-01T00:00:00.000Z | 1430438400000 |        1        |  450  |  
+--------------------------+---------------+-----------------+-------+

tlopo commented 9 years ago

@pauldix , I don't know if the files in the filesystem reflects the number of shards but, that's what I have on v.0.8.8:

# find /mnt/influxdb/db/shard_db_v2/ -type d 
/mnt/influxdb/db/shard_db_v2/
/mnt/influxdb/db/shard_db_v2/00001
/mnt/influxdb/db/shard_db_v2/00002
/mnt/influxdb/db/shard_db_v2/00003
/mnt/influxdb/db/shard_db_v2/00004
/mnt/influxdb/db/shard_db_v2/00005
/mnt/influxdb/db/shard_db_v2/00006
/mnt/influxdb/db/shard_db_v2/00007
/mnt/influxdb/db/shard_db_v2/00008
/mnt/influxdb/db/shard_db_v2/00009
/mnt/influxdb/db/shard_db_v2/00010
/mnt/influxdb/db/shard_db_v2/00011
/mnt/influxdb/db/shard_db_v2/00012
/mnt/influxdb/db/shard_db_v2/00013
/mnt/influxdb/db/shard_db_v2/00014
/mnt/influxdb/db/shard_db_v2/00015
/mnt/influxdb/db/shard_db_v2/00016
/mnt/influxdb/db/shard_db_v2/00017
/mnt/influxdb/db/shard_db_v2/00018

# find /mnt/influxdb/db/shard_db_v2/ -type f | wc -l 
4573

Number of series:

# node query-csv.js  'list series' | sed -e 1d  | wc -l
8689

dswarbrick commented 9 years ago

I just installed 0.9.4-nightly-14c04eb but it fails to start with:

[monitor] 2015/09/02 08:39:48 starting monitor service for cluster 0, host fra-influxdb
[monitor] 2015/09/02 08:39:48 'runtime:map[]' registered for monitoring
[monitor] 2015/09/02 08:39:48 storing in http://127.0.0.1, database '_internal', interval 1m0s
[monitor] 2015/09/02 08:39:48 ensuring database _internal exists on http://127.0.0.1
run: create server: failed to create monitoring database on http://127.0.0.1, received code: 404

I tried reverting to version 0.9.3, creating the _internal database by hand, and then upgrading again, but it still does not work. For now I've had to disable the new monitoring, which sorta defeats the purpose of running this version.

pauldix commented 9 years ago

@otoolep, can you have a look at the monitoring issue that cropped up with last night's build?

otoolep commented 9 years ago

Absolutely will look into it, that shouldn't happen.

In the meantime add store-enabled=false to the config, section named monitor, to prevent creation.

https://github.com/influxdb/influxdb/blob/master/etc/config.sample.toml#L96

That should unblock you.

On Wednesday, September 2, 2015, Paul Dix notifications@github.com wrote:

@otoolep https://github.com/otoolep, can you have a look at the monitoring issue that cropped up with last night's build?

— Reply to this email directly or view it on GitHub https://github.com/influxdb/influxdb/issues/3885#issuecomment-137045304.

otoolep commented 9 years ago

@dswarbrick -- oh, I see you actually disabled monitoring to unblock yourself. Good. There is nothing in this build monitoring-wise which will help you, so don't worry too much about disabling it.

pauldix commented 9 years ago

Created a new issue to solve this problem so I'm closing this out to consolidate things. Please track it at #4086.

influxdata / influxdb

[0.9.3] WAL gets progressively slower as DB size increases #3885