scotloach closed this issue 5 years ago.
I've spent a lot more time trying to understand what's happening here. I've also downgraded back to 1.6.4 and the same thing happens, so I suspect that after the upgrade one of the shards got corrupted in some way. What seems to be happening is that when InfluxDB starts up, everything is fine. At some point after that, the read I/O load suddenly maxes out and never stops until InfluxDB is restarted. I'm attaching a profile I took while it was in this state (this is 1.6.4). profiles.tar.gz
I have encountered the same problem. The read load continues to be high.
Same problem here: a visible increase in memory usage after the upgrade, and query performance dropped dramatically. The platform is borderline unusable at the moment. No changes other than upgrading to 1.7.
@scotloach @cxhuawei @nktl thanks for your reports. We will prioritise an investigation into this issue today.
In the meantime, if you have any more information on your setups that would be very helpful. For example:

- the output of `SHOW SERIES CARDINALITY`
- a set of profiles: `curl -o profiles.tar.gz "http://localhost:8086/debug/pprof/all?cpu=true"` (the request takes 30s to complete)

@scotloach thanks for your 1.6.4 profiles. When you downgraded back to 1.6.4, would you describe the system behaviour as being exactly the same as on 1.7.0 (same heap usage, CPU and IO), or would you say there were differences in the performance characteristics?
I will look at your profiles this morning.
All of the following comment was wrong: I analyzed the profile using the wrong binary. Corrected version here.
Just for context, I took a quick look at the heap profile that @scotloach provided; here are the top heap-consuming calls.
Total: 88992.1 MB
33940.0 38.1% 38.1% 33991.0 38.2% github.com/influxdata/influxdb/prometheus/remote..Unmarshal
14590.3 16.4% 54.5% 14590.3 16.4% runtime.getitab
7334.0 8.2% 62.8% 9060.1 10.2% runtime..sysAlloc
4771.5 5.4% 68.1% 5255.3 5.9% runtime..Error
2462.0 2.8% 70.9% 2462.0 2.8% github.com/influxdata/influxdb/query..Close
2262.0 2.5% 73.4% 3940.5 4.4% text/tabwriter..append
1790.8 2.0% 75.5% 4249.8 4.8% github.com/influxdata/influxdb/query..Next
1662.8 1.9% 77.3% 1665.8 1.9% github.com/influxdata/influxdb/vendor/collectd.org/network.parseValues
1508.5 1.7% 79.0% 1567.5 1.8% bytes..String
1369.5 1.5% 80.6% 13209.1 14.8% type..eq.github.com/influxdata/influxdb/vendor/github.com/BurntSushi/toml.item
Attached the profile image too.
Could you please also provide some data from the output of the following:
influx_inspect dumptsm -index path/to/tsm/file/in/shard/being/compacted/0000000xy-00000000z.tsm
You don't need to provide all the output (there could be a lot), we're just interested in seeing if the min time and max time are correct on the TSM index data.
@fntlnz I'm not sure you're using the right binary. When I look at the heap profile I see all the heap being used by the decoders for a compaction:
Showing top 10 nodes out of 130
flat flat% sum% cum cum%
33.57GB 37.40% 37.40% 33.57GB 37.40% github.com/influxdata/influxdb/tsdb/engine/tsm1.DecodeFloatBlock
19.43GB 21.65% 59.05% 27.74GB 30.91% github.com/influxdata/influxdb/tsdb/index/tsi1.(*LogFile).execSeriesEntry
7.07GB 7.88% 66.92% 7.79GB 8.68% github.com/influxdata/influxdb/tsdb/index/tsi1.(*logTagValue).addSeriesID
2.66GB 2.96% 69.89% 2.66GB 2.96% github.com/influxdata/influxdb/pkg/rhh.(*HashMap).insert
2.50GB 2.79% 72.68% 6.14GB 6.84% github.com/influxdata/influxdb/tsdb/index/inmem.(*Index).CreateSeriesListIfNotExists
2.24GB 2.49% 75.17% 2.24GB 2.49% github.com/influxdata/influxdb/models.Tags.Clone
1.92GB 2.14% 77.31% 6.01GB 6.69% github.com/influxdata/influxdb/tsdb/engine/tsm1.(*Engine).WritePoints
1.63GB 1.82% 79.13% 1.63GB 1.82% github.com/influxdata/influxdb/tsdb/engine/tsm1.(*entry).add
1.57GB 1.75% 80.88% 3.24GB 3.61% github.com/influxdata/influxdb/pkg/rhh.(*HashMap).grow
1.30GB 1.44% 82.32% 5.52GB 6.15% github.com/influxdata/influxdb/tsdb.(*SeriesIndex).execEntry
This issue looks very similar to the one we recently fixed, which is somewhat confusing.
Thanks!
Regarding your questions:
- What index are you using (TSI / inmem)? index-version = "tsi1"
- How many shards do you have? Around 10,000.
- There are roughly 100 databases; the 'worst' one has a cardinality of around 40,000.
- I'm getting a 404 on that influx endpoint for some reason.
@nktl thanks. Do you have pprof-enabled = true in the http section of your config? It defaults to true, but maybe you disabled it?
Yes, that was it! Unfortunately I cannot restart influx at the moment to apply this; it would have to be a bit later today or tomorrow.
Sorry for the confusion; uploading the profile output with the right binary from the downloads page. Here are the top calls:
Dropped 549 nodes (cum <= 0.45GB)
Showing top 10 nodes out of 130
flat flat% sum% cum cum%
33.57GB 37.40% 37.40% 33.57GB 37.40% github.com/influxdata/influxdb/tsdb/engine/tsm1.DecodeFloatBlock
19.43GB 21.65% 59.05% 27.74GB 30.91% github.com/influxdata/influxdb/tsdb/index/tsi1.(*LogFile).execSeriesEntry
7.07GB 7.88% 66.92% 7.79GB 8.68% github.com/influxdata/influxdb/tsdb/index/tsi1.(*logTagValue).addSeriesID
2.66GB 2.96% 69.89% 2.66GB 2.96% github.com/influxdata/influxdb/pkg/rhh.(*HashMap).insert
2.50GB 2.79% 72.68% 6.14GB 6.84% github.com/influxdata/influxdb/tsdb/index/inmem.(*Index).CreateSeriesListIfNotExists
2.24GB 2.49% 75.17% 2.24GB 2.49% github.com/influxdata/influxdb/models.Tags.Clone
1.92GB 2.14% 77.31% 6.01GB 6.69% github.com/influxdata/influxdb/tsdb/engine/tsm1.(*Engine).WritePoints
1.63GB 1.82% 79.13% 1.63GB 1.82% github.com/influxdata/influxdb/tsdb/engine/tsm1.(*entry).add
1.57GB 1.75% 80.88% 3.24GB 3.61% github.com/influxdata/influxdb/pkg/rhh.(*HashMap).grow
1.30GB 1.44% 82.32% 5.52GB 6.15% github.com/influxdata/influxdb/tsdb.(*SeriesIndex).execEntry
@nktl ok thanks. In the meantime, if you can identify a shard that you think is being continually compacted do you think you could send it to us? We have a secure SFTP site for uploading it. Feel free to email me edd
at the product name dot com. It may help us get to the bottom of the issue more quickly.
For my system, I'm using tsi, about 6K shards, 70 databases. Largest DB has about 8M series.
I use 7-day shards for the most part, so they all rolled over last evening, and it's been stable and looking good on the 1.6.4 release since then. I'm happy to help you out with your investigation if I can, but I'm not going to put 1.7 back on it anytime soon.
@e-dard, the following line using DecodeFloatBlock I would expect to be for query load, as compaction should be using the new DecodeFloatArrayBlock API:
33.57GB 37.40% 37.40% 33.57GB 37.40% github.com/influxdata/influxdb/tsdb/engine/tsm1.DecodeFloatBlock
Another thing I noticed: the current _internal.monitor shard was growing very fast while this was happening; it churned through about 250GB of disk before I noticed and deleted that shard. I'm not sure what it was and I needed to clean the space up.
That is a really great lead, @scotloach. Perhaps the behavior of the _internal monitoring database has changed.
My instinct is more that something was happening (compaction, I guess?) and got into some internal loop; this process was continually reading from disk and, as a side effect, was updating some state in the monitor db.
@stuartcarnie The profiles are taken from a 1.6.4 release. @scotloach downgraded when they initially saw problems.
My current theory, based on this ticket and #10465, is that there is a compaction bug somewhere, but it's not caused by a problem with encoding bad TSM data.
The symptom seems to be that after upgrading to 1.7.0 a full compaction goes on indefinitely. Profiles from this ticket and #10465 indicate a significant amount of time decoding blocks and checking for overlaps, suggesting those routines are being called many times.
I'm awaiting logs from #10465 to see if there are any clues there.
To clarify, downgrading to 1.6.4 and restarting did not make the issue go away. The issue seems to have gone away now that I'm writing to new shards created by 1.6.4. It's only been about 12 hours since I restarted it post shard-rollover, so it's possible it will happen again, I'm keeping a close eye on it.
Ahh right, the profile is from 1.6.4, my mistake @e-dard
@scotloach sorry, yeah, I think the shard rolling over has something to do with it too.
@scotloach You should be able to tell by checking if the most recent cold shard (the shard that was problematic) is now completely cold. An ls -lah on that shard directory should provide information to help determine that.
I'm not actually sure what the problematic shard was.
@e-dard Hi, regarding your questions: index-version = "tsi1", and profiles are attached.
I use InfluxDB as Prometheus's storage. The write IOPS are normal, but the read IOPS surge periodically.
When I query 30 days of data, InfluxDB has a high probability of getting stuck in a high read load, so that I can only restart it.
thanks. : )
I'm still running into problems with influxdb 1.6.4. I think I can identify problematic files now, if you still need some I can probably provide them.
In my case, after upgrading to InfluxDB 1.7, the CPU usage goes crazy. I use Telegraf and forward to two InfluxDB instances (1.6.4 and 1.7.0).
Reads are done directly against 1.6.4, meaning 1.6.4 does more work, but its CPU usage is 0.04 whereas 1.7.0 constantly uses the full 2 cores I have on the system.
Any guidance on which metrics I can collect to support debugging?
There's a bug related to compaction with 1.7; something happens with current/new shards that causes the compaction process to get stuck and max out CPU and read I/O processing them.
Here is what I did to identify when this is happening: find /var/lib/influxdb -name '*.tsm.tmp'
This will return a list of files currently being compacted by influxdb. If you see the same files here for a long time, and if the size/modtime on these files is not changing, most likely it's in this state.
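If you prefer to script that check, here is a minimal Go sketch of the same idea. The data directory path and the one-hour staleness threshold are my own assumptions, not anything InfluxDB exposes; adjust them for your setup.

```go
package main

import (
	"fmt"
	"os"
	"path/filepath"
	"strings"
	"time"
)

func main() {
	// Assumed data directory; change to match your [data] dir setting.
	root := "/var/lib/influxdb/data"
	// A .tsm.tmp file that hasn't been modified for a while suggests a
	// compaction that is no longer making progress (assumed threshold).
	stale := time.Hour

	filepath.Walk(root, func(path string, info os.FileInfo, err error) error {
		if err != nil || info.IsDir() {
			return nil
		}
		if strings.HasSuffix(path, ".tsm.tmp") && time.Since(info.ModTime()) > stale {
			fmt.Printf("possibly stuck: %s (size %d, modified %s)\n",
				path, info.Size(), info.ModTime().Format(time.RFC3339))
		}
		return nil
	})
}
```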
Our influx instance is almost unusable now, with more than 40 fully occupied CPU cores at the moment. In fact it looks like there are 'stuck' compaction tasks in the telegraf database; the timestamps of the *.tsm.tmp files have not changed since the upgrade a few days ago...
@scotloach The issue seems to have been tracked down and a PR is ready: https://github.com/influxdata/influxdb/pull/10479
So fingers crossed for a new release soon.
Some open questions:
- Will the fix presented in the PR fix shards that have already been corrupted?
- Is there anything that can be done to fix the shards that have gone into a compaction loop? Or can we only wait for the shard duration to be reached?
- Is the data corrupted? I.e. should I delete the shards and restore from backup?

I have 5 shards with a total of ~80 GB that are exhibiting this issue. Already downgraded to 1.6.4 after noticing the issue.
Hi everyone. Here is an update on the issue: we have a fix ready in #10479. It will very likely be merged in the coming hours. Thanks for your profiles and data. It really helped us track the (very subtle) issue down.
We introduced a bug here. The bug affects how runs of integers (timestamps are stored and compressed as integers in InfluxDB) are encoded with the simple8b encoding scheme.

This bug found its way into the 1.7.0 release (and master). We will be shipping 1.7.1 with a fix soon, hopefully today or tomorrow.
One user sent us their shard, totalling around 2.7GB of TSM data, and with some tooling I wrote I determined that fewer than 0.013% (~1 in 7,600) of their blocks were affected.
The bug is quite difficult to trigger. The encoder would need to be given 119 or 239 continuous 1 values followed by another value (the 120th or 240th) that is not a 1. When this happens, unfortunately the 120th or 240th value would be encoded as a 1 and not as the true value.
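To make the trigger pattern concrete, here is a deliberately simplified, self-contained Go illustration of this class of off-by-one. It is not the actual simple8b code that was fixed in the PR; it is just a sketch of how a zero-bit "120 ones" selector that validates one value too few silently turns the 120th value into a 1.

```go
package main

import "fmt"

// buggyCanPack120Ones decides whether a batch of 120 values can be stored
// with the zero-bit "120 ones" selector. The off-by-one: it only checks the
// first 119 values, so a non-1 value in position 120 is silently accepted.
// (Illustrative only; the real encoder's check is more involved.)
func buggyCanPack120Ones(vals []uint64) bool {
	for i := 0; i < 119; i++ { // should be i < 120
		if vals[i] != 1 {
			return false
		}
	}
	return true
}

func main() {
	// 119 ones followed by a 2 (e.g. a timestamp delta that is one interval late).
	vals := make([]uint64, 120)
	for i := range vals {
		vals[i] = 1
	}
	vals[119] = 2

	if buggyCanPack120Ones(vals) {
		// The "120 ones" selector stores no payload bits, so decoding
		// reproduces 120 ones: the final 2 has effectively become a 1.
		fmt.Println("encoded as a run of 120 ones; the 120th value (2) is lost")
	}
}
```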
We use simple8b for two TSM data types in InfluxDB: integer field values and timestamps. Any field values of another type (remember the default field value type is float) are unaffected. As you might imagine, it's unlikely that integer field values would follow the pattern needed to result in an incorrectly encoded 120th or 240th value in a run.

Timestamps, though, are more likely to trigger this bug, and the impact of them triggering it is a broken invariant in the compaction engine, which is what's been causing the compaction issues.
In Influx, timestamps are heavily compressed. We use a couple of encoding schemes: run-length encoding and simple8b. The first thing we do for both encoding schemes is to reduce the set of timestamps we're encoding to deltas, that is, differences between each subsequent timestamp (we keep the first timestamp in its original format as the "base"). Next, we look to see if all of those deltas are the same. This would occur if we had perfectly spaced timestamps, i.e., a series of points with equally spaced timestamps were written to the shard. In that case we can use run-length encoding, storing the base value, the delta and how many times we need to repeat it. This scenario means the aforementioned bug is not going to be triggered.

Sometimes though, data is not perfectly spaced, for example if you use Telegraf and it slows down, or there is network traffic, etc. In that case you might generally get evenly spaced timestamps, but every so often a timestamp is one millisecond or second or unit interval off from where it should be. In this case we can't use RLE, so we use simple8b. First we look for a common divisor in the timestamp deltas. Since we store in nanoseconds, and users often send data every ms or second or 10 seconds, we often find the common divisor of the deltas is 1000, 10000, 100000 etc., and we can reduce our deltas even further to a stream of 1s with the occasional 2 or 3 where timestamps are slightly late in arriving (these are the stray timestamps).
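As a rough, self-contained Go sketch of that reduction (not the actual tsm1 encoder; the 1ms interval and the single late point are made up for illustration): convert the timestamps to deltas, check whether they are all equal (the RLE case), and otherwise divide by their greatest common divisor so a mostly regular series collapses to a stream of 1s with the occasional 2.

```go
package main

import "fmt"

func gcd(a, b uint64) uint64 {
	for b != 0 {
		a, b = b, a%b
	}
	return a
}

func main() {
	// Nanosecond timestamps roughly every 1ms, with one point arriving a full
	// interval late, so one delta is 2ms instead of 1ms.
	ts := []uint64{0, 1_000_000, 2_000_000, 4_000_000, 5_000_000, 6_000_000}

	// Reduce to deltas; the first timestamp is kept as the "base".
	deltas := make([]uint64, 0, len(ts)-1)
	allEqual := true
	for i := 1; i < len(ts); i++ {
		d := ts[i] - ts[i-1]
		deltas = append(deltas, d)
		if d != deltas[0] {
			allEqual = false
		}
	}

	if allEqual {
		fmt.Println("perfectly spaced: run-length encode base + delta + count")
		return
	}

	// Otherwise find a common divisor and scale the deltas down, so a mostly
	// regular series becomes 1,1,...,2,... ready for simple8b packing.
	div := deltas[0]
	for _, d := range deltas[1:] {
		div = gcd(div, d)
	}
	for i := range deltas {
		deltas[i] /= div
	}
	fmt.Printf("divisor=%d scaled deltas=%v\n", div, deltas) // divisor=1000000 scaled deltas=[1 1 2 1 1]
}
```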
Now, if that stray timestamp (where the delta is not 1) happens to be the 120th or 240th timestamp in a run of timestamps, then the timestamp is unfortunately going to be incorrectly encoded. But since we're encoding the deltas of the timestamps and not the timestamps themselves, the error is small, and the timestamp is encoded close to its true value. In the shard we examined, typically the timestamps were encoded 1ms away from where they were supposed to be.
If the timestamp that has been incorrectly encoded is the last timestamp in a block, then the block's entry in the TSM index (at the end of each TSM file) is going to disagree with the block about what the max time of the block is. In the problem cases we examined in the provided shard, the TSM index would say the max time for the block is x, and the final timestamp in the block would be x - 1ms.

Due to this, the compaction engine would get stuck because it kept trying to read more of the block up to the max timestamp, but there was no matching timestamp data in the block. Essentially it would get to the end of the block and be unable to read a timestamp of x.
Because the bug resulted in some incorrect timestamps being encoded into some TSM blocks, rolling back to 1.6.4 would not fix this issue.
Aside from actually ensuring the correct encoding of values using simple8b, the fix in #10479 also detects when this bug has impacted the final timestamp in a block, and corrects the timestamp by using the TSM index entry as the source of truth. The entry was stored before the timestamp was encoded into the block, so the timestamp ends up corrected properly.
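A hedged sketch of that second part of the fix (the function and variable names below are invented for illustration; the real change lives in the tsm1 decoders in #10479): after decoding a block's timestamps, if the final one disagrees with the max time recorded in the block's TSM index entry, restore it from the index entry, since the entry was written before the buggy encoding happened.

```go
package main

import "fmt"

// fixLastTimestamp is a hypothetical helper illustrating the repair step:
// the TSM index entry's max time was recorded from the original values, so
// if the decoded block ends short of it, the final timestamp was one of the
// incorrectly encoded ones and can be restored from the index.
func fixLastTimestamp(decoded []int64, indexMaxTime int64) []int64 {
	if n := len(decoded); n > 0 && decoded[n-1] != indexMaxTime {
		decoded[n-1] = indexMaxTime
	}
	return decoded
}

func main() {
	// A block whose final timestamp was encoded 1ms (1,000,000 ns) short of its true value.
	block := []int64{0, 10_000_000, 19_000_000}
	indexMaxTime := int64(20_000_000)

	fmt.Println(fixLastTimestamp(block, indexMaxTime)) // [0 10000000 20000000]
}
```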
We have tested this on the problem shard provided, and we're also testing it on some internal systems.
@hallvar your questions:
Will the fix presented in the PR fix shards that have already been corrupted?
The PR will fix blocks where the final timestamp value was incorrectly encoded, by replacing it with the max time from the TSM index entry. This will correct the timestamp.
It is possible that other timestamps may be incorrectly encoded if they are the 120th/240th timestamp in a run of 120/240 perfectly spaced timestamps, and the 120th/240th timestamp does not follow the perfect spacing (i.e., it is "slow" or "fast"). In that case, the erroneous timestamp would be incorrectly encoded according to the "perfect" run so to speak (it would be adjusted by the incorrect delta).
Is there anything that can be done to fix the shards that have gone into a compaction loop?
A couple of suggestions:
(1) wait for 1.7.1 - we're hopeful to release it today or tomorrow.
(2) run tomorrow's nightly. Please note, nightlies are not intended to be used on production loads.
(3) if you have enough CPU cores, reduce max-concurrent-compactions to a low number so that you don't tie up all your cores.
(4) if you have a backup you're willing to restore from, then you can restore it and run 1.6.4 until 1.7.1 is released.
Or can we only wait for the shard duration to be reached?
The final full compaction could still trigger this.
Is the data corrupted? I.e. should I delete the shards and restore from backup?
Max timestamps can be corrupted but will be correctly restored in 1.7.1 by using the TSM index. It is possible, but we believe unlikely, that other timestamps are corrupted. In the case that they are corrupted, it is likely they are corrupted in a way that adjusts them slightly, e.g., by 1ms or 1 second, depending on your timestamp intervals between points.
If you have a backup then certainly you can restore and run 1.6.4 until 1.7.1 is released.
Thank you for taking the time to explain the issue and answering my questions, highly appreciated! Awaiting 1.7.1 and will post the results here.
About the uneven time distribution which prevents RLE and forces simple8b: it's not necessarily the fault of Telegraf. We also collect httpd access log information using the logparser plugin; each collected value (request size, response size, response time) has its unique timestamp (as logged by the service). That might explain why the second instance of this issue (which happened after I extracted some measurements into different databases) was in a shard in the logs database.
I just installed 1.7.1; everything went back to normal within a minute after startup.
@scotloach @cxhuawei @nktl @v9n @hallvar 1.7.1 is now available. Please upgrade at your earliest convenience. I'm going to close this issue, but will re-open if there are continued problems.
@e-dard Thank you. When will the 1.7.1 docker image be available on Docker Hub?
@brava-vinh we have submitted a PR for the Docker Hub image to be updated.
Running 1.7.1 since ~Nov 16 02:00:00 UTC 2018 (from the official docker image) but still having .tsm.tmp files that seem stuck:
$ date; sudo find . -name "*.tsm.tmp"|xargs sudo ls -lh
Fri Nov 16 18:48:01 UTC 2018
-rw-r--r-- 1 root root 523M Nov 16 09:41 ./asd/downsampled/1002/000013067-000000004.tsm.tmp
-rw-r--r-- 1 root root 312M Nov 16 09:41 ./asd/full_resolution/1001/000010126-000000002.tsm.tmp
-rw-r--r-- 1 root root 0 Nov 16 09:41 ./asd/full_resolution_calculations/1005/000008481-000000003.tsm.tmp
-rw-r--r-- 1 root root 0 Nov 16 09:41 ./asd/full_resolution_calculations/1030/000000033-000000003.tsm.tmp
Exact same result at Fri Nov 16 19:18:38 UTC 2018
The shard sizes are: ./asd/full_resolution/1001/: 41 GB; ./asd/downsampled/1002/: 24 GB; ./asd/full_resolution_calculations/1005/: 1.6 GB; ./asd/full_resolution_calculations/1030/: 929 MB.
The container was manually restarted at 2018-11-16T09:36:13.006788691Z to set max-concurrent-compactions to 4. It has been running since then.
I don't really know if the files behave as expected or not. However InfluxDB is heavily using all 16 CPU cores.
Could it be that this is a corrupt shard from 1.7.0 that is not being fixed by 1.7.1?
My history has been:
I did a curl -o profile6.pprof http://localhost:8086/debug/pprof/profile and got the results:
go tool pprof -top influxd-1.7.1 profile6.pprof
File: influxd-1.7.1
Type: cpu
Time: Nov 16, 2018 at 8:27pm (CET)
Duration: 30.17s, Total samples = 96.34s (319.31%)
Showing nodes accounting for 87.61s, 90.94% of 96.34s total
Dropped 240 nodes (cum <= 0.48s)
flat flat% sum% cum cum%
11.01s 11.43% 11.43% 11.01s 11.43% strconv.ParseUint
5.65s 5.86% 17.29% 18.84s 19.56% github.com/influxdata/influxdb/tsdb/engine/tsm1.FloatArrayDecodeAll
5.42s 5.63% 22.92% 5.42s 5.63% runtime.memclrNoHeapPointers
4.96s 5.15% 28.07% 6.18s 6.41% path/filepath.Base
4.59s 4.76% 32.83% 9.75s 10.12% runtime.scanobject
3.61s 3.75% 36.58% 26.44s 27.44% github.com/influxdata/influxdb/tsdb/engine/tsm1.DefaultParseFileName
3.58s 3.72% 40.29% 4.05s 4.20% runtime.(*mheap).freeSpanLocked
3.10s 3.22% 43.51% 13.11s 13.61% runtime.sweepone
2.85s 2.96% 46.47% 2.85s 2.96% indexbytebody
2.84s 2.95% 49.42% 3.18s 3.30% runtime.findObject
2.83s 2.94% 52.36% 2.83s 2.94% runtime.futex
2.36s 2.45% 54.81% 5.64s 5.85% strings.Index
2.12s 2.20% 57.01% 13.34s 13.85% github.com/influxdata/influxdb/tsdb/engine/tsm1.timeBatchDecodeAllSimple
1.90s 1.97% 58.98% 5.53s 5.74% runtime.lock
1.89s 1.96% 60.94% 33s 34.25% github.com/influxdata/influxdb/tsdb/engine/tsm1.(*tsmGeneration).level
1.79s 1.86% 62.80% 1.79s 1.86% runtime.procyield
1.71s 1.77% 64.57% 28.15s 29.22% github.com/influxdata/influxdb/tsdb/engine/tsm1.(*FileStore).ParseFileName
1.61s 1.67% 66.24% 29.76s 30.89% github.com/influxdata/influxdb/tsdb/engine/tsm1.(*DefaultPlanner).ParseFileName
1.45s 1.51% 67.75% 3.32s 3.45% runtime.unlock
1.35s 1.40% 69.15% 31.11s 32.29% github.com/influxdata/influxdb/tsdb/engine/tsm1.(*DefaultPlanner).ParseFileName-fm
1.32s 1.37% 70.52% 34.27s 35.57% github.com/influxdata/influxdb/tsdb/engine/tsm1.tsmGenerations.level
1.22s 1.27% 71.79% 1.22s 1.27% os.IsPathSeparator (inline)
1.20s 1.25% 73.03% 1.20s 1.25% runtime.(*mSpanList).remove
1.10s 1.14% 74.17% 1.10s 1.14% github.com/influxdata/influxdb/tsdb/engine/tsm1.(*block).overlapsTimeRange (inline)
1.07s 1.11% 75.29% 9.96s 10.34% runtime.(*mspan).sweep
1.02s 1.06% 76.34% 19.04s 19.76% runtime.(*mcentral).cacheSpan
0.98s 1.02% 77.36% 3.51s 3.64% github.com/influxdata/influxdb/pkg/encoding/simple8b.DecodeBytesBigEndian
0.95s 0.99% 78.35% 0.95s 0.99% runtime.(*gcBitsArena).tryAlloc (inline)
0.93s 0.97% 79.31% 0.93s 0.97% runtime.osyield
0.90s 0.93% 80.25% 0.90s 0.93% runtime.memmove
0.88s 0.91% 81.16% 1.84s 1.91% runtime.greyobject
0.76s 0.79% 81.95% 0.76s 0.79% runtime.markBits.isMarked (inline)
0.75s 0.78% 82.73% 0.75s 0.78% github.com/influxdata/influxdb/tsdb/engine/tsm1.ZigZagDecode (inline)
0.71s 0.74% 83.46% 0.85s 0.88% runtime.(*gcSweepBuf).pop (inline)
0.70s 0.73% 84.19% 6.34s 6.58% github.com/influxdata/influxdb/tsdb/engine/tsm1.integerBatchDecodeAllSimple
0.70s 0.73% 84.92% 0.70s 0.73% runtime.(*fixalloc).alloc
0.70s 0.73% 85.64% 7.46s 7.74% runtime.(*mcentral).freeSpan
0.56s 0.58% 86.23% 0.56s 0.58% github.com/influxdata/influxdb/pkg/encoding/simple8b.unpack240
0.56s 0.58% 86.81% 29.40s 30.52% github.com/influxdata/influxdb/tsdb/engine/tsm1.(*tsmBatchKeyIterator).combineFloat
0.52s 0.54% 87.35% 23.16s 24.04% runtime.mallocgc
0.51s 0.53% 87.88% 13.20s 13.70% runtime.growslice
0.40s 0.42% 88.29% 5.54s 5.75% runtime.(*mheap).alloc_m
0.39s 0.4% 88.70% 2s 2.08% runtime.(*mheap).allocSpanLocked
0.37s 0.38% 89.08% 0.53s 0.55% runtime.markrootSpans
0.21s 0.22% 89.30% 0.51s 0.53% runtime.gcmarknewobject
0.17s 0.18% 89.47% 1.27s 1.32% runtime.newMarkBits
0.15s 0.16% 89.63% 0.49s 0.51% runtime.gentraceback
0.14s 0.15% 89.78% 4.90s 5.09% runtime.deductSweepCredit
0.13s 0.13% 89.91% 0.82s 0.85% runtime.heapBits.initSpan
0.12s 0.12% 90.04% 9.45s 9.81% runtime.(*mcentral).grow
0.11s 0.11% 90.15% 5.98s 6.21% runtime.(*mheap).freeSpan.func1
0.07s 0.073% 90.22% 27.55s 28.60% github.com/influxdata/influxdb/tsdb/engine/tsm1.DecodeFloatArrayBlock
0.07s 0.073% 90.29% 19.26s 19.99% runtime.(*mcache).nextFree
0.07s 0.073% 90.37% 10.10s 10.48% runtime.gcDrain
0.06s 0.062% 90.43% 11.63s 12.07% github.com/influxdata/influxdb/tsdb/engine/tsm1.(*tsmBatchKeyIterator).combineInteger
0.06s 0.062% 90.49% 0.55s 0.57% runtime.schedule
0.05s 0.052% 90.54% 13.39s 13.90% github.com/influxdata/influxdb/tsdb/engine/tsm1.TimeArrayDecodeAll
0.05s 0.052% 90.60% 19.09s 19.82% runtime.(*mcache).refill
0.05s 0.052% 90.65% 12.29s 12.76% runtime.makeslice
0.05s 0.052% 90.70% 37.77s 39.20% runtime.systemstack
0.04s 0.042% 90.74% 19.13s 19.86% runtime.(*mcache).nextFree.func1
0.03s 0.031% 90.77% 6.37s 6.61% github.com/influxdata/influxdb/tsdb/engine/tsm1.IntegerArrayDecodeAll
0.03s 0.031% 90.80% 0.98s 1.02% runtime.futexsleep
0.03s 0.031% 90.83% 12.19s 12.65% runtime.gosweepone
0.02s 0.021% 90.86% 1.90s 1.97% runtime.futexwakeup
0.02s 0.021% 90.88% 12.16s 12.62% runtime.gosweepone.func1
0.01s 0.01% 90.89% 25.43s 26.40% github.com/influxdata/influxdb/tsdb/engine/tsm1.(*DefaultPlanner).PlanLevel
0.01s 0.01% 90.90% 11.46s 11.90% github.com/influxdata/influxdb/tsdb/engine/tsm1.DecodeIntegerArrayBlock
0.01s 0.01% 90.91% 8.49s 8.81% runtime.(*mheap).alloc
0.01s 0.01% 90.92% 5.55s 5.76% runtime.(*mheap).alloc.func1
0.01s 0.01% 90.93% 5.99s 6.22% runtime.(*mheap).freeSpan
0.01s 0.01% 90.94% 0.53s 0.55% runtime.mProf_Malloc
0 0% 90.94% 41.03s 42.59% github.com/influxdata/influxdb/tsdb/engine/tsm1.(*Compactor).CompactFull
0 0% 90.94% 41.03s 42.59% github.com/influxdata/influxdb/tsdb/engine/tsm1.(*Compactor).compact
0 0% 90.94% 41.03s 42.59% github.com/influxdata/influxdb/tsdb/engine/tsm1.(*Compactor).write
0 0% 90.94% 41.03s 42.59% github.com/influxdata/influxdb/tsdb/engine/tsm1.(*Compactor).writeNewFiles
0 0% 90.94% 9.07s 9.41% github.com/influxdata/influxdb/tsdb/engine/tsm1.(*DefaultPlanner).PlanOptimize
0 0% 90.94% 34.57s 35.88% github.com/influxdata/influxdb/tsdb/engine/tsm1.(*Engine).compact
0 0% 90.94% 21.16s 21.96% github.com/influxdata/influxdb/tsdb/engine/tsm1.(*Engine).compactFull.func1
0 0% 90.94% 19.87s 20.62% github.com/influxdata/influxdb/tsdb/engine/tsm1.(*Engine).compactHiPriorityLevel.func1
0 0% 90.94% 34.57s 35.88% github.com/influxdata/influxdb/tsdb/engine/tsm1.(*Engine).enableLevelCompactions.func1
0 0% 90.94% 41.03s 42.59% github.com/influxdata/influxdb/tsdb/engine/tsm1.(*compactionStrategy).Apply
0 0% 90.94% 41.03s 42.59% github.com/influxdata/influxdb/tsdb/engine/tsm1.(*compactionStrategy).compactGroup
0 0% 90.94% 41.03s 42.59% github.com/influxdata/influxdb/tsdb/engine/tsm1.(*tsmBatchKeyIterator).Next
0 0% 90.94% 41.03s 42.59% github.com/influxdata/influxdb/tsdb/engine/tsm1.(*tsmBatchKeyIterator).merge
0 0% 90.94% 29.40s 30.52% github.com/influxdata/influxdb/tsdb/engine/tsm1.(*tsmBatchKeyIterator).mergeFloat
0 0% 90.94% 11.63s 12.07% github.com/influxdata/influxdb/tsdb/engine/tsm1.(*tsmBatchKeyIterator).mergeInteger
0 0% 90.94% 1.07s 1.11% runtime.(*mheap).reclaim
0 0% 90.94% 7.44s 7.72% runtime.bgsweep
0 0% 90.94% 10.12s 10.50% runtime.gcBgMarkWorker
0 0% 90.94% 10.09s 10.47% runtime.gcBgMarkWorker.func2
0 0% 90.94% 0.70s 0.73% runtime.goschedImpl
0 0% 90.94% 0.69s 0.72% runtime.gosched_m
0 0% 90.94% 0.62s 0.64% runtime.markroot
0 0% 90.94% 0.75s 0.78% runtime.mcall
0 0% 90.94% 0.55s 0.57% runtime.profilealloc
go tool pprof -gif influxd-1.7.1 profile6.pprof gives me the attached call-graph image.
Is there anything else I can provide here or on e-mail to help sort this issue?
I see the same thing as @hallvar. I ended up just removing all the DBs, since they were new and I was double-writing from 1.6.4, and it's been good since then.
Indeed from the log of InfluxDB I can see an entry for starting the compaction process of ./asd/full_resolution/1001/, but never an entry for it completing.
@hallvar a few questions: have the .tsm.tmp files stopped growing in size, suggesting the compactor is no longer making progress?

@hallvar also, do you see a log entry for a compaction for full_resolution_calculations/1005 that starts but does not stop?
Some more details about full_resolution/1001
$ ls -l
total 42339144
-rw-r--r-- 1 root root 2147486635 Nov 6 00:47 000001280-000000005.tsm
-rw-r--r-- 1 root root 2147488291 Nov 6 00:49 000001280-000000006.tsm
-rw-r--r-- 1 root root 1364751604 Nov 6 00:50 000001280-000000007.tsm
-rw-r--r-- 1 root root 2147488268 Nov 6 18:39 000002176-000000005.tsm
-rw-r--r-- 1 root root 2147488779 Nov 6 18:42 000002176-000000006.tsm
-rw-r--r-- 1 root root 537029393 Nov 6 18:43 000002176-000000007.tsm
-rw-r--r-- 1 root root 2147486794 Nov 7 17:26 000003456-000000005.tsm
-rw-r--r-- 1 root root 2147487880 Nov 7 17:27 000003456-000000006.tsm
-rw-r--r-- 1 root root 1487047140 Nov 7 17:28 000003456-000000007.tsm
-rw-r--r-- 1 root root 2147486948 Nov 8 15:41 000004736-000000005.tsm
-rw-r--r-- 1 root root 2147487245 Nov 8 15:42 000004736-000000006.tsm
-rw-r--r-- 1 root root 1514402097 Nov 8 15:43 000004736-000000007.tsm
-rw-r--r-- 1 root root 2147485943 Nov 9 14:25 000006016-000000005.tsm
-rw-r--r-- 1 root root 2147486486 Nov 9 14:26 000006016-000000006.tsm
-rw-r--r-- 1 root root 1630170600 Nov 9 14:28 000006016-000000007.tsm
-rw-r--r-- 1 root root 2147486552 Nov 10 13:48 000007296-000000005.tsm
-rw-r--r-- 1 root root 2147486659 Nov 10 13:49 000007296-000000006.tsm
-rw-r--r-- 1 root root 1552881210 Nov 10 13:51 000007296-000000007.tsm
-rw-r--r-- 1 root root 2147488256 Nov 11 10:34 000008192-000000005.tsm
-rw-r--r-- 1 root root 2147489347 Nov 11 10:37 000008192-000000006.tsm
-rw-r--r-- 1 root root 74587847 Nov 11 10:37 000008192-000000007.tsm
-rw-r--r-- 1 root root 2147489288 Nov 11 23:28 000008704-000000005.tsm
-rw-r--r-- 1 root root 463276511 Nov 11 23:29 000008704-000000006.tsm
-rw-r--r-- 1 root root 633100657 Nov 12 02:15 000008833-000000004.tsm
-rw-r--r-- 1 root root 470640594 Nov 12 06:50 000008962-000000004.tsm
-rw-r--r-- 1 root root 139974092 Nov 12 12:36 000009099-000000004.tsm
-rw-r--r-- 1 root root 156140475 Nov 13 17:45 000009233-000000004.tsm
-rw-r--r-- 1 root root 110167094 Nov 13 17:45 000009361-000000004.tsm
-rw-r--r-- 1 root root 111120955 Nov 14 02:01 000009490-000000004.tsm
-rw-r--r-- 1 root root 108211921 Nov 14 02:01 000009618-000000004.tsm
-rw-r--r-- 1 root root 118685885 Nov 16 01:22 000009748-000000004.tsm
-rw-r--r-- 1 root root 134496994 Nov 16 01:23 000009876-000000004.tsm
-rw-r--r-- 1 root root 109944892 Nov 16 01:25 000010004-000000004.tsm
-rw-r--r-- 1 root root 25862631 Nov 16 01:22 000010036-000000003.tsm
-rw-r--r-- 1 root root 26852524 Nov 16 01:22 000010068-000000003.tsm
-rw-r--r-- 1 root root 26196516 Nov 16 01:23 000010100-000000003.tsm
-rw-r--r-- 1 root root 5690656 Nov 16 01:21 000010108-000000002.tsm
-rw-r--r-- 1 root root 6081352 Nov 16 01:22 000010116-000000002.tsm
-rw-r--r-- 1 root root 6690825 Nov 16 01:23 000010124-000000002.tsm
-rw-r--r-- 1 root root 775892 Nov 14 16:36 000010125-000000001.tsm
-rw-r--r-- 1 root root 780584 Nov 14 16:39 000010126-000000001.tsm
-rw-r--r-- 1 root root 327155712 Nov 16 20:21 000010126-000000002.tsm.tmp
-rw-r--r-- 1 root root 25684 Nov 8 12:16 fields.idx
drwxr-xr-x 10 root root 4096 Nov 16 00:06 index
$ influx_inspect dumptsm 000001280-000000005.tsm
Summary:
File: 000001280-000000005.tsm
Time Range: 2018-11-05T00:00:00Z - 2018-11-05T22:29:59.9Z
Duration: 22h29m59.9s Series: 1996 File Size: 2147486635
Statistics
Blocks:
Total: 1021620 Size: 2118552906 Min: 39 Max: 5866 Avg: 2073
Index:
Total: 1021620 Size: 28933716
Points:
Total: 1020426448
Encoding:
Timestamp: none: 0 (0%) s8b: 975933 (95%) rle: 45687 (4%)
Float: none: 0 (0%) gor: 1021620 (100%)
Compression:
Per block: 2.08 bytes/point
Total: 2.10 bytes/point
$ influx_inspect dumptsm 000010126-000000001.tsm
Summary:
File: 000010126-000000001.tsm
Time Range: 2018-11-11T19:00:00Z - 2018-11-11T19:29:59.9Z
Duration: 29m59.9s Series: 86 File Size: 780584
Statistics
Blocks:
Total: 1634 Size: 717326 Min: 30 Max: 1847 Avg: 439
Index:
Total: 1634 Size: 63245
Points:
Total: 1548086
Encoding:
Timestamp: none: 0 (0%) s8b: 516 (31%) rle: 1118 (68%)
Float: none: 0 (0%) gor: 1634 (100%)
Compression:
Per block: 0.46 bytes/point
Total: 0.50 bytes/point
$ influx_inspect dumptsm 000010126-000000002.tsm.tmp
dumptsm: Error opening TSM files: indirectIndex: not enough data for max time
The tsm.tmp file has indeed stopped growing in size:
$ date; ls -l 000010126-000000002.tsm.tmp
Fri Nov 16 20:51:50 UTC 2018
-rw-r--r-- 1 root root 327155712 Nov 16 20:21 000010126-000000002.tsm.tmp
$ date; ls -l 000010126-000000002.tsm.tmp
Fri Nov 16 20:52:25 UTC 2018
-rw-r--r-- 1 root root 327155712 Nov 16 20:21 000010126-000000002.tsm.tmp
I can upload this shard somewhere for you to reach it. I'll send the link to your gmail address once it is up.
Unable to say if the influxdb log had a completion entry for full_resolution_calculations/1005. I've been restarting back and forth, and at some point I thought the shards were fixed except 1005, so I unfortunately sacrificed and deleted that shard already.
$ influx_inspect report 1001
DB RP Shard File Series New (est) Min Time Max Time Load Time
*redacted* full_resolution 1001 000001280-000000005.tsm 1996 1997 2018-11-05T00:00:00Z 2018-11-05T22:29:59.9Z 2.127496ms
*redacted* full_resolution 1001 000001280-000000006.tsm 3500 3494 2018-11-05T00:00:00Z 2018-11-06T00:29:59.975Z 114.600155ms
*redacted* full_resolution 1001 000001280-000000007.tsm 2350 2369 2018-11-05T00:00:00Z 2018-11-06T00:29:59.95Z 104.123384ms
*redacted* full_resolution 1001 000002176-000000005.tsm 2193 0 2018-11-05T22:00:00Z 2018-11-06T17:59:59.975Z 106.93131ms
*redacted* full_resolution 1001 000002176-000000006.tsm 4771 1 2018-11-05T22:00:00.05Z 2018-11-06T18:29:59.975Z 111.297607ms
*redacted* full_resolution 1001 000002176-000000007.tsm 937 60 2018-11-05T22:30:00.05Z 2018-11-06T18:29:59.975Z 81.33326ms
*redacted* full_resolution 1001 000003456-000000005.tsm 1921 0 2018-11-06T18:00:00.025Z 2018-11-07T16:59:59.975Z 101.851983ms
*redacted* full_resolution 1001 000003456-000000006.tsm 3111 2 2018-11-06T17:30:00.075Z 2018-11-07T16:59:59.975Z 110.449569ms
*redacted* full_resolution 1001 000003456-000000007.tsm 2810 0 2018-11-06T18:00:00Z 2018-11-07T16:59:59.975Z 94.000429ms
*redacted* full_resolution 1001 000004736-000000005.tsm 2543 547 2018-11-06T11:48:00.475Z 2018-11-08T14:59:59.95Z 104.866018ms
*redacted* full_resolution 1001 000004736-000000006.tsm 3019 0 2018-11-07T13:00:00.025Z 2018-11-08T15:29:59.975Z 97.934706ms
*redacted* full_resolution 1001 000004736-000000007.tsm 2913 88 2018-11-06T11:48:00.525Z 2018-11-08T15:29:59.975Z 126.143675ms
*redacted* full_resolution 1001 000006016-000000005.tsm 1936 0 2018-11-08T15:00:00.05Z 2018-11-09T13:59:59.975Z 89.927246ms
*redacted* full_resolution 1001 000006016-000000006.tsm 2810 0 2018-11-08T12:00:00Z 2018-11-09T13:59:59.975Z 91.292107ms
*redacted* full_resolution 1001 000006016-000000007.tsm 3092 0 2018-11-08T15:30:00Z 2018-11-09T13:59:59.975Z 61.211254ms
*redacted* full_resolution 1001 000007296-000000005.tsm 1912 0 2018-11-09T13:30:00.15Z 2018-11-10T13:29:59.95Z 43.394452ms
*redacted* full_resolution 1001 000007296-000000006.tsm 2843 0 2018-11-09T13:30:00Z 2018-11-10T13:29:59.975Z 95.845912ms
*redacted* full_resolution 1001 000007296-000000007.tsm 3082 0 2018-11-09T14:00:00Z 2018-11-10T13:29:59.95Z 45.762259ms
*redacted* full_resolution 1001 000008192-000000005.tsm 2175 0 2018-11-10T13:00:00Z 2018-11-11T09:59:59.8Z 36.166727ms
*redacted* full_resolution 1001 000008192-000000006.tsm 4163 0 2018-11-10T09:30:00.05Z 2018-11-11T10:29:59.975Z 94.093133ms
*redacted* full_resolution 1001 000008192-000000007.tsm 466 0 2018-11-10T10:00:00.05Z 2018-11-10T13:59:59.925Z 9.267417ms
*redacted* full_resolution 1001 000008704-000000005.tsm 4535 0 2018-11-10T12:30:00Z 2018-11-11T22:59:59.95Z 69.444225ms
*redacted* full_resolution 1001 000008704-000000006.tsm 2268 0 2018-11-10T14:00:00Z 2018-11-11T22:59:59.95Z 33.501507ms
*redacted* full_resolution 1001 000008833-000000004.tsm 6660 0 2018-11-10T14:00:00.025Z 2018-11-11T23:59:59.975Z 15.833312ms
*redacted* full_resolution 1001 000008962-000000004.tsm 857 0 2018-11-10T22:00:00.025Z 2018-11-11T17:59:59.925Z 27.914255ms
*redacted* full_resolution 1001 000009099-000000004.tsm 999 0 2018-11-11T05:30:00.05Z 2018-11-11T20:59:59.95Z 12.381657ms
*redacted* full_resolution 1001 000009233-000000004.tsm 2206 174 2018-11-10T10:00:00Z 2018-11-11T23:59:59.95Z 21.198654ms
*redacted* full_resolution 1001 000009361-000000004.tsm 1426 0 2018-11-10T12:30:00Z 2018-11-11T13:29:59.95Z 27.671145ms
*redacted* full_resolution 1001 000009490-000000004.tsm 1205 0 2018-11-10T17:00:00Z 2018-11-10T21:59:59.975Z 23.743869ms
*redacted* full_resolution 1001 000009618-000000004.tsm 1206 0 2018-11-10T21:30:00Z 2018-11-11T02:29:59.975Z 23.800471ms
*redacted* full_resolution 1001 000009748-000000004.tsm 1499 0 2018-11-11T02:00:00Z 2018-11-11T15:59:59.975Z 24.383997ms
*redacted* full_resolution 1001 000009876-000000004.tsm 1498 0 2018-11-11T06:30:00Z 2018-11-11T22:59:59.975Z 19.937697ms
*redacted* full_resolution 1001 000010004-000000004.tsm 1412 0 2018-11-11T10:00:00Z 2018-11-11T23:59:59.975Z 24.115485ms
*redacted* full_resolution 1001 000010036-000000003.tsm 1118 0 2018-11-11T14:30:00.025Z 2018-11-11T15:59:59.975Z 6.45649ms
*redacted* full_resolution 1001 000010068-000000003.tsm 1118 0 2018-11-11T16:00:00Z 2018-11-11T17:29:59.975Z 12.507463ms
*redacted* full_resolution 1001 000010100-000000003.tsm 1118 0 2018-11-11T17:00:00Z 2018-11-11T18:29:59.975Z 14.894271ms
*redacted* full_resolution 1001 000010108-000000002.tsm 516 0 2018-11-11T18:00:00.075Z 2018-11-11T18:59:59.975Z 13.749419ms
*redacted* full_resolution 1001 000010116-000000002.tsm 688 0 2018-11-11T18:30:00Z 2018-11-11T19:29:59.975Z 7.964258ms
*redacted* full_resolution 1001 000010124-000000002.tsm 688 0 2018-11-11T19:00:00Z 2018-11-11T19:29:59.975Z 6.293983ms
*redacted* full_resolution 1001 000010125-000000001.tsm 86 0 2018-11-11T19:00:00.025Z 2018-11-11T19:29:59.925Z 2.907131ms
*redacted* full_resolution 1001 000010126-000000001.tsm 86 0 2018-11-11T19:00:00Z 2018-11-11T19:29:59.9Z 166.908µs
Summary:
Files: 41
Time Range: 2018-11-05T00:00:00Z - 2018-11-11T23:59:59.975Z
Duration: 167h59m59.975s
Statistics
Series:
- *redacted* (est): 8732 (100%)
Total (est): 8732
Completed in 2.263066813s
Are you sure that you're running 1.7.1? Docker hub shows updates "an hour ago".
It is either that or maybe you are running into a different kind of problem. I'm running on bare metal and installed from rpm https://dl.influxdata.com/influxdb/releases/influxdb-1.7.1.x86_64.rpm
@conet have you had any issues since upgrading to 1.7.1?
@stuartcarnie no, but I just read the backlog and another difference besides packaging is that I haven't switched to tsi yet; I'm using in-memory indexes. The CPU has been idling ever since the upgrade to 1.7.1.
@conet glad to hear you are still stable. The index type was unrelated to this issue.
@hallvar Are you sure that the 4 stuck compaction processes account for the CPU being used? What I noticed is that a single compaction process would only consume one core. Could you run this query on the internal database?
select /.*CompactionsActive/ from tsm1_engine where time > now() - 1m AND id = '1001' group by *
How about queries hitting the DB, aren't they causing the load?
The index type was unrelated to this issue.
@stuartcarnie I was trying to spot differences and that was the only thing I noticed, anyway I don't want to create more noise around this issue 😄
System info: InfluxDB 1.7, upgraded from 1.6.4. Running on the standard Docker image.
Steps to reproduce:
I upgraded a large InfluxDB server to InfluxDB 1.7. Nothing else changed. We are running two InfluxDB servers of a similar size, the other one was left at 1.6.4.
This ran fine for about a day, then it started running into our memory limit and continually OOMing. We upped the memory and restarted. It ran fine for about 4 hours then started using very high disk i/o (read) which caused our stats writing processes to back off.
Please see the timeline on the heap metrics below: