influxdata / influxdb

Scalable datastore for metrics, events, and real-time analytics
https://influxdata.com
Apache License 2.0

InfluxDB 1.7 uses way more memory and disk i/o than 1.6.4 #10468

Closed scotloach closed 5 years ago

scotloach commented 5 years ago

System info: InfluxDB 1.7, upgraded from 1.6.4. Running on the standard Docker image.

Steps to reproduce:

I upgraded a large InfluxDB server to InfluxDB 1.7. Nothing else changed. We are running two InfluxDB servers of a similar size, the other one was left at 1.6.4.

This ran fine for about a day, then it started running into our memory limit and continually OOMing. We upped the memory and restarted. It ran fine for about 4 hours then started using very high disk i/o (read) which caused our stats writing processes to back off.

Please see the timeline on the heap metrics below:

[screenshot: heap metrics timeline]

scotloach commented 5 years ago

I've spent a lot more time trying to understand what's happening here. I've also downgraded back to 1.6.4 and the same thing happens, so I suspect that after the upgrade one of the shards got corrupted in some way. What seems to be happening is that when InfluxDB starts up, everything is fine. At some point after that, the read I/O load suddenly maxes out, and it never seems to stop until InfluxDB is restarted. I'm attaching a profile I took while it was in this state (this is 1.6.4). profiles.tar.gz

chenxu1990 commented 5 years ago

I have encountered the same problem. The read load continues to be high.

[screenshot: read load graph]

nktl commented 5 years ago

Same problem here: a visible increase in memory usage after the upgrade, and query performance dropped dramatically. The platform is borderline unusable at the moment... No changes other than upgrading to 1.7.

[screenshot: memory usage graph]

e-dard commented 5 years ago

@scotloach @cxhuawei @nktl thanks for your reports. We will prioritise an investigation into this issue today.

In the meantime, if you have any more information on your setups, that would be very helpful. For example:

@scotloach thanks for your 1.6.4 profiles. When you downgraded back to 1.6.4, would you describe the system behaviour as exactly the same as 1.7.0 (same heap usage, CPU, and IO), or were there differences in the performance characteristics?

I will look at your profiles this morning.

fntlnz commented 5 years ago

The following comment was wrong: I analyzed the profile using the wrong binary. The corrected version is here.


Just for context, I analyzed the heap profile that @scotloach provided; here are the top heap-consuming calls.

Total: 88992.1 MB
 33940.0  38.1%  38.1%  33991.0  38.2% github.com/influxdata/influxdb/prometheus/remote..Unmarshal
 14590.3  16.4%  54.5%  14590.3  16.4% runtime.getitab
  7334.0   8.2%  62.8%   9060.1  10.2% runtime..sysAlloc
  4771.5   5.4%  68.1%   5255.3   5.9% runtime..Error
  2462.0   2.8%  70.9%   2462.0   2.8% github.com/influxdata/influxdb/query..Close
  2262.0   2.5%  73.4%   3940.5   4.4% text/tabwriter..append
  1790.8   2.0%  75.5%   4249.8   4.8% github.com/influxdata/influxdb/query..Next
  1662.8   1.9%  77.3%   1665.8   1.9% github.com/influxdata/influxdb/vendor/collectd.org/network.parseValues
  1508.5   1.7%  79.0%   1567.5   1.8% bytes..String
  1369.5   1.5%  80.6%  13209.1  14.8% type..eq.github.com/influxdata/influxdb/vendor/github.com/BurntSushi/toml.item

Attached the profile image too.

[profile graph: profile001]

e-dard commented 5 years ago

Could you please also provide some data from the output of the following:

influx_inspect dumptsm -index path/to/tsm/file/in/shard/being/compacted/0000000xy-00000000z.tsm

You don't need to provide all the output (there could be a lot), we're just interested in seeing if the min time and max time are correct on the TSM index data.

e-dard commented 5 years ago

@fntlnz I'm not sure you're using the right binary. When I look at the heap profile I see all the heap being used by the decoders for a compaction:

Showing top 10 nodes out of 130
      flat  flat%   sum%        cum   cum%
   33.57GB 37.40% 37.40%    33.57GB 37.40%  github.com/influxdata/influxdb/tsdb/engine/tsm1.DecodeFloatBlock
   19.43GB 21.65% 59.05%    27.74GB 30.91%  github.com/influxdata/influxdb/tsdb/index/tsi1.(*LogFile).execSeriesEntry
    7.07GB  7.88% 66.92%     7.79GB  8.68%  github.com/influxdata/influxdb/tsdb/index/tsi1.(*logTagValue).addSeriesID
    2.66GB  2.96% 69.89%     2.66GB  2.96%  github.com/influxdata/influxdb/pkg/rhh.(*HashMap).insert
    2.50GB  2.79% 72.68%     6.14GB  6.84%  github.com/influxdata/influxdb/tsdb/index/inmem.(*Index).CreateSeriesListIfNotExists
    2.24GB  2.49% 75.17%     2.24GB  2.49%  github.com/influxdata/influxdb/models.Tags.Clone
    1.92GB  2.14% 77.31%     6.01GB  6.69%  github.com/influxdata/influxdb/tsdb/engine/tsm1.(*Engine).WritePoints
    1.63GB  1.82% 79.13%     1.63GB  1.82%  github.com/influxdata/influxdb/tsdb/engine/tsm1.(*entry).add
    1.57GB  1.75% 80.88%     3.24GB  3.61%  github.com/influxdata/influxdb/pkg/rhh.(*HashMap).grow
    1.30GB  1.44% 82.32%     5.52GB  6.15%  github.com/influxdata/influxdb/tsdb.(*SeriesIndex).execEntry

This issue looks very similar to the one we recently fixed, which is somewhat confusing.

nktl commented 5 years ago

Thanks!

Regarding your questions:

Around 10000

There are roughly 100 databases; the 'worst' one has a cardinality of around 40000

Getting 404 on this influx endpoint for some reason

e-dard commented 5 years ago

@nktl thanks. Do you have pprof-enabled = true in the http section of your config? It defaults to true, but maybe you disabled it?

nktl commented 5 years ago

> @nktl thanks. Do you have pprof-enabled = true in the http section of your config? It defaults to true, but maybe you disabled it?

Yes, that was it! Unfortunately I cannot restart Influx at the moment to apply this; it will have to be a bit later today or tomorrow.

fntlnz commented 5 years ago

Sorry for the confusion, uploading the profile output with the right binary from the downloads page.

[profile graph: profile002]

and here are the top calls.

Dropped 549 nodes (cum <= 0.45GB)
Showing top 10 nodes out of 130
      flat  flat%   sum%        cum   cum%
   33.57GB 37.40% 37.40%    33.57GB 37.40%  github.com/influxdata/influxdb/tsdb/engine/tsm1.DecodeFloatBlock
   19.43GB 21.65% 59.05%    27.74GB 30.91%  github.com/influxdata/influxdb/tsdb/index/tsi1.(*LogFile).execSeriesEntry
    7.07GB  7.88% 66.92%     7.79GB  8.68%  github.com/influxdata/influxdb/tsdb/index/tsi1.(*logTagValue).addSeriesID
    2.66GB  2.96% 69.89%     2.66GB  2.96%  github.com/influxdata/influxdb/pkg/rhh.(*HashMap).insert
    2.50GB  2.79% 72.68%     6.14GB  6.84%  github.com/influxdata/influxdb/tsdb/index/inmem.(*Index).CreateSeriesListIfNotExists
    2.24GB  2.49% 75.17%     2.24GB  2.49%  github.com/influxdata/influxdb/models.Tags.Clone
    1.92GB  2.14% 77.31%     6.01GB  6.69%  github.com/influxdata/influxdb/tsdb/engine/tsm1.(*Engine).WritePoints
    1.63GB  1.82% 79.13%     1.63GB  1.82%  github.com/influxdata/influxdb/tsdb/engine/tsm1.(*entry).add
    1.57GB  1.75% 80.88%     3.24GB  3.61%  github.com/influxdata/influxdb/pkg/rhh.(*HashMap).grow
    1.30GB  1.44% 82.32%     5.52GB  6.15%  github.com/influxdata/influxdb/tsdb.(*SeriesIndex).execEntry

e-dard commented 5 years ago

@nktl ok thanks. In the meantime, if you can identify a shard that you think is being continually compacted do you think you could send it to us? We have a secure SFTP site for uploading it. Feel free to email me edd at the product name dot com. It may help us get to the bottom of the issue more quickly.

scotloach commented 5 years ago

For my system, I'm using tsi, about 6K shards, 70 databases. Largest DB has about 8M series.

I use 7 day shards for the most part, so they all rolled over last evening, and it's been stable and looking good on the 1.6.4 release since then. I'm happy to help with your investigation if I can, but I'm not going to put 1.7 back on it anytime soon.

stuartcarnie commented 5 years ago

@e-dard, I would expect the following line using DecodeFloatBlock to be from query load, as compaction should be using the new DecodeFloatArrayBlock API:

33.57GB 37.40% 37.40%    33.57GB 37.40%  github.com/influxdata/influxdb/tsdb/engine/tsm1.DecodeFloatBlock

scotloach commented 5 years ago

Another thing I noticed: the current _internal.monitor shard was growing very fast when this was happening; it churned through about 250GB of disk before I noticed and deleted that shard. I'm not sure what it was, and I needed to free up the space.

stuartcarnie commented 5 years ago

That is a really great lead, @scotloach. Perhaps the behavior of the _internal monitoring database has changed.

scotloach commented 5 years ago

My instinct is more that something (compaction, I guess?) got into some internal loop; that process was continually reading from disk and, as a side effect, was updating some state in the monitor db.

e-dard commented 5 years ago

@stuartcarnie The profiles are taken from a 1.6.4 release. @scotloach downgraded when they initially saw problems.

e-dard commented 5 years ago

My current theory, based on this ticket and #10465, is that there is a compaction bug somewhere, but it's not caused by a problem with encoding bad TSM data.

The symptom seems to be that after upgrading to 1.7.0, a full compaction goes on indefinitely. Profiles from this ticket and #10465 indicate a significant amount of time decoding blocks and checking for overlaps, suggesting the routines are being called many times.

I'm awaiting logs from #10465 to see if there are any clues there.

scotloach commented 5 years ago

To clarify, downgrading to 1.6.4 and restarting did not make the issue go away. The issue seems to have gone away now that I'm writing to new shards created by 1.6.4. It's only been about 12 hours since I restarted it post shard-rollover, so it's possible it will happen again; I'm keeping a close eye on it.

stuartcarnie commented 5 years ago

Ah right, the profile is from 1.6.4; my mistake @e-dard

e-dard commented 5 years ago

@scotloach sorry, yeah, I think the shard rolling over has something to do with it too.

e-dard commented 5 years ago

@scotloach You should be able to tell by checking if the most recent cold shard (the shard that was problematic) is now completely cold. An ls -lah on that shard directory should provide information to help determine that.

scotloach commented 5 years ago

I'm not actually sure what the problematic shard was.

chenxu1990 commented 5 years ago

@e-dard Hi , Regarding your questions:

index-version = "tsi1"

and profiles

profiles.tar.gz

I use InfluxDB as Prometheus's storage. The write IOPS are normal, but the read IOPS periodically surge.

[screenshot: write vs. read IOPS]

When I query 30 days of data, InfluxDB has a high probability of falling into a sustained high read load, and the only way out is to restart InfluxDB.

[screenshot: read load]

thanks. : )

scotloach commented 5 years ago

I'm still running into problems with InfluxDB 1.6.4. I think I can identify problematic files now; if you still need some, I can probably provide them.

v9n commented 5 years ago

In my case, after upgrading to InfluxDB 1.7, the CPU usage goes crazy. I use Telegraf and forward to two InfluxDB instances (1.6.4 and 1.7.0).

Reads are done directly on 1.6.4, meaning 1.6.4 does more work, but its CPU usage is 0.04 whereas 1.7.0 constantly uses the full 2 cores (all I have on the system).

Any guidance on how I can collect particular metrics to support debugging?

scotloach commented 5 years ago

There's a bug related to compaction with 1.7; something happens with current/new shards that causes the compaction process to get stuck and max out CPU and read I/O processing them.

Here is what I did to identify when this is happening: find /var/lib/influxdb -name '*.tsm.tmp'

This will return a list of files currently being compacted by influxdb. If you see the same files here for a long time, and if the size/modtime on these files is not changing, most likely it's in this state.
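
For anyone who wants to automate that check, here is a small Go sketch (not an official tool; the data directory path and the 30-minute staleness threshold are assumptions to adjust for your install) that walks the data directory and flags *.tsm.tmp files whose modification time has not advanced for a while:

package main

import (
    "fmt"
    "os"
    "path/filepath"
    "strings"
    "time"
)

func main() {
    // Assumed data directory and staleness threshold -- adjust for your setup.
    dataDir := "/var/lib/influxdb/data"
    stale := 30 * time.Minute

    err := filepath.Walk(dataDir, func(path string, info os.FileInfo, err error) error {
        if err != nil {
            return err
        }
        // Compactions write into *.tsm.tmp files and rename them when they
        // finish, so a .tmp file whose mtime stops advancing is suspicious.
        if info.IsDir() || !strings.HasSuffix(path, ".tsm.tmp") {
            return nil
        }
        if age := time.Since(info.ModTime()); age > stale {
            fmt.Printf("possibly stuck: %s (%d bytes, unchanged for %s)\n",
                path, info.Size(), age.Round(time.Minute))
        }
        return nil
    })
    if err != nil {
        fmt.Fprintln(os.Stderr, "walk error:", err)
    }
}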

nktl commented 5 years ago

Our Influx instance is almost unusable now, with more than 40 fully occupied CPU cores at the moment, and in fact it looks like there are 'stuck' compaction tasks in the telegraf database; timestamps of *.tsm.tmp files have not changed since the upgrade a few days ago...

[screenshot: stuck *.tsm.tmp files]

v9n commented 5 years ago

@scotloach The issue seems to have been tracked down and a PR is ready: https://github.com/influxdata/influxdb/pull/10479

So fingers crossed for a new release soon.

hallvar commented 5 years ago

Some open questions:

I have 5 shards with a total of ~80 GB that are exhibiting this issue. I already downgraded to 1.6.4 after noticing the issue.

e-dard commented 5 years ago

Update on Issue

Hi everyone. Here is an update on the issue: we have a fix ready in #10479. It will very likely be merged in the coming hours. Thanks for your profiles and data. It really helped us track the (very subtle) issue down.

We introduced a bug here. The bug affects how runs of integers (timestamps are stored and compressed as integers in InfluxDB) are encoded with the simple8b encoding scheme.

This bug found its way into the 1.7.0 release (and master). We will be shipping 1.7.1 with a fix soon. Hopefully today or tomorrow.

One user sent us their shard, totalling around 2.7GB of TSM data, and, with some tooling I wrote, I determined that roughly 1 in 7,600 of their blocks (about 0.013%) were affected.

The bug is quite difficult to trigger. The encoder would need to be given 119 or 239 consecutive 1 values followed by another value (the 120th or 240th) that is not a 1. When this happens, unfortunately the 120th or 240th value would be encoded as a 1 and not as the true value.
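
To make the trigger condition concrete, here is an illustrative Go sketch (not the InfluxDB encoder itself) that builds the shape described above, a run of 119 consecutive 1 values followed by a non-1 value, and a helper that flags whether a stream of values contains that pattern:

package main

import "fmt"

// hasRiskyRun reports whether the stream contains a run of exactly 119 or 239
// consecutive 1s immediately followed by a non-1 value -- the shape described
// above that the buggy encoder mis-handled (the 120th/240th value came back
// as a 1). This is an illustration of the description, not the real encoder.
func hasRiskyRun(values []uint64) bool {
    run := 0
    for _, v := range values {
        if v == 1 {
            run++
            continue
        }
        if run == 119 || run == 239 {
            return true
        }
        run = 0
    }
    return false
}

func main() {
    // 119 consecutive 1 values...
    values := make([]uint64, 0, 120)
    for i := 0; i < 119; i++ {
        values = append(values, 1)
    }
    // ...followed by a 120th value that is not a 1.
    values = append(values, 2)

    fmt.Println("matches the trigger pattern:", hasRiskyRun(values)) // true
}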

Background

We use simple8b for two TSM data types in InfluxDB: integer field values, and timestamps. Any field values of other types (remember the default field value type is float) are unaffected. As you might imagine, it's unlikely that integer field values would follow the pattern needed to result in an incorrectly encoded 120th or 240th value in a run.

Timestamps, though, are more likely to trigger this bug, and the impact of triggering it is a broken invariant in the compaction engine, which is what has been causing the compaction issues.

In Influx, timestamps are heavily compressed. We use a couple of encoding schemes: run-length encoding and simple8b. The first thing we do for both encoding schemes is to reduce the set of timestamps we're encoding to deltas, that is, differences between each subsequent timestamp (we keep the first timestamp in its original format as the "base"). Next, we look to see if all of those deltas are the same. This would occur if we had perfectly spaced timestamps, i.e., a series of points with equally spaced timestamps were written to the shard. In that case we can use run-length encoding, storing the base value, the delta, and how many times we need to repeat it. This scenario means the aforementioned bug is not going to be triggered.

Sometimes though, data is not perfectly spaced. For example, if you use Telegraf and it slows down or there is network traffic etc., you might generally get evenly spaced timestamps, but then every so often your timestamp is one millisecond or second or unit interval off from where it should be. In this case we can't use RLE, so we use simple8b. First we look for a common divisor in the timestamp deltas. Since we store in nanoseconds, and users often send data every ms or second or 10 seconds, we often find the common divisor to the deltas is 1000, 10000, 100000 etc, and we can reduce our deltas even further to a stream of 1s with the occasional 2 or 3 where timestamps are slightly late in arriving (these are the stray timestamps).

Now, if that stray timestamp (where the delta is not 1) happens to be the 120th or 240th timestamp in a run of timestamps, then the timestamp is unfortunately going to be incorrectly encoded. But, since we're encoding the deltas of the timestamps and not the timestamps themselves, the error is small, and the timestamp is encoded close to its true value. In the shard we examined, typically the timestamps were encoded 1ms away from where they were supposed to be.
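
As a worked example of the reduction just described (a sketch of the idea only, using a plain GCD where the real encoder's divisor search may differ), here is how a millisecond-spaced series with one point arriving 1ms late turns into a stream of 1s with a stray 2:

package main

import "fmt"

// gcd returns the greatest common divisor of two positive deltas.
func gcd(a, b int64) int64 {
    for b != 0 {
        a, b = b, a%b
    }
    return a
}

func main() {
    // Timestamps in nanoseconds, one point per millisecond, except the point
    // that should land at 4ms arrives at 5ms (a "stray" timestamp).
    ts := []int64{0, 1000000, 2000000, 3000000, 5000000, 6000000}

    // Keep the first timestamp as the base; reduce the rest to deltas.
    base := ts[0]
    deltas := make([]int64, 0, len(ts)-1)
    for i := 1; i < len(ts); i++ {
        deltas = append(deltas, ts[i]-ts[i-1])
    }

    // The deltas are not all equal (one is 2ms), so RLE is out and simple8b
    // is used. Divide out the common divisor of the deltas (1ms here).
    div := deltas[0]
    for _, d := range deltas[1:] {
        div = gcd(div, d)
    }
    reduced := make([]int64, len(deltas))
    for i, d := range deltas {
        reduced[i] = d / div
    }

    fmt.Println("base:", base)       // 0
    fmt.Println("divisor:", div)     // 1000000 (1ms in nanoseconds)
    fmt.Println("reduced:", reduced) // [1 1 1 2 1] -- mostly 1s with a stray 2
}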

Impact on compaction

If the timestamp that has been incorrectly encoded is the last timestamp in a block, then the block's entry in the TSM index (at the end of each TSM file) is going to disagree with the block about what the max-time of the block is. In the problem cases we examined in the provided shard, the TSM index would say the max time for the block is x, and the final timestamp in the block would be x - 1ms.

Due to this, the compaction engine would get stuck because it kept trying to read more of the block up to the max timestamp, but there was no matching timestamp data in the block. Essentially it would get to the end of the block and be unable to read a timestamp of x.

Because the bug resulted in some incorrect timestamps being encoded into some TSM blocks, rolling back to 1.6.4 would not fix this issue.

The Fix

Aside from actually ensuring the correct encoding of values using simple8b, the fix in #10479 also detects when this bug has impacted the final timestamp in a block, and corrects the timestamp by using the TSM index entry as the source of truth. The entry was stored before the timestamp was encoded into the block, so the timestamp is corrected properly.

We have tested this on the problem shard provided, and we're also testing on some internal system.
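
For intuition, here is a minimal Go sketch of the repair idea described above (not the actual code in #10479; the decoded timestamps and the index max time below are hypothetical inputs): if the last decoded timestamp in a block disagrees with the block's TSM index entry, the index entry is treated as the source of truth.

package main

import "fmt"

// fixBlockMaxTime illustrates the repair: if the final timestamp decoded from
// a block does not match the max time recorded in the block's TSM index entry,
// rewrite it from the index entry. This is a sketch, not the tsm1 code.
func fixBlockMaxTime(timestamps []int64, indexMaxTime int64) bool {
    if len(timestamps) == 0 {
        return false
    }
    last := len(timestamps) - 1
    if timestamps[last] != indexMaxTime {
        timestamps[last] = indexMaxTime
        return true // the block had been affected by the encoding bug
    }
    return false
}

func main() {
    // Hypothetical decoded block: the final delta was mis-encoded as 1ms
    // instead of 2ms, so the block ends at 4ms while the index says 5ms.
    decoded := []int64{0, 1000000, 2000000, 3000000, 4000000}
    indexMaxTime := int64(5000000)

    if fixBlockMaxTime(decoded, indexMaxTime) {
        fmt.Println("corrected final timestamp to", decoded[len(decoded)-1]) // 5000000
    }
}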

Questions

@hallvar your questions:

Will the fix presented in the PR fix shards that have already been corrupted?

The PR will fix blocks where the final timestamp value was incorrectly encoded, by replacing it with the max time from the TSM index entry. This will correct the timestamp.

It is possible that other timestamps may be incorrectly encoded if they are the 120th/240th timestamp in a run of 120/240 perfectly spaced timestamps, and the 120th/240th timestamp does not follow the perfect spacing (i.e., it is "slow" or "fast"). In that case, the erroneous timestamp would be incorrectly encoded according to the "perfect" run so to speak (it would be adjusted by the incorrect delta).

Is there anything that can be done to fix the shards that have gone into a compaction loop?

A few suggestions:

  1. Wait for 1.7.1 - we're hopeful to release it today or tomorrow.
  2. Run tomorrow's nightly. Please note, nightlies are not intended to be used on production loads.
  3. If you have enough CPU cores, reduce max-concurrent-compactions to a low number so that you don't tie up all your cores.
  4. If you have a backup you're willing to restore from, you can restore it and run 1.6.4 until 1.7.1 is released.

Or can we only wait for the shard duration to be reached?

The final full compaction could still trigger this.

Is the data corrupted? I.e. should I delete the shards and restore from backup?

Max timestamps can be corrupted but will be correctly restored in 1.7.1 by using the TSM index. It is possible, but we believe unlikely, that other timestamps are corrupted. In the case that they are corrupted, it is likely they are corrupted in a way that adjusts them slightly, e.g., by 1ms, a second, etc., depending on the timestamp intervals between your points.

If you have a backup then certainly you can restore and run 1.6.4 until 1.7.1 is released.

hallvar commented 5 years ago

Thank you for taking the time to explain the issue and answer my questions, highly appreciated! Awaiting 1.7.1 and will post the results here.

conet commented 5 years ago

About the uneven time distribution which prevents RLE and forces simple8b: it's not necessarily Telegraf's fault. We also collect httpd access log information using the logparser plugin, and each collected value (request size, response size, response time) has its own unique timestamp (as logged by the service). That might explain why the second instance of this issue (which happened after I extracted some measurements into different databases) was in a shard in the logs database.

conet commented 5 years ago

I just installed 1.7.1; everything went back to normal within a minute after startup.

e-dard commented 5 years ago

@scotloach @cxhuawei @nktl @v9n @hallvar 1.7.1 is now available. Please upgrade at your earliest convenience. I'm going to close this issue, but will re-open if there are continued problems.

brava-vinh commented 5 years ago

@e-dard Thank you. When will the 1.7.1 Docker image be available on Docker Hub?

e-dard commented 5 years ago

@brava-vinh we have submitted a PR for the docker hub image to be updated

hallvar commented 5 years ago

Running 1.7.1 since ~Nov 16 02:00:00 UTC 2018 (from the official docker image) but still having .tsm.tmp files that seem stuck:

$ date; sudo find . -name "*.tsm.tmp"|xargs sudo ls -lh 
Fri Nov 16 18:48:01 UTC 2018
-rw-r--r-- 1 root root 523M Nov 16 09:41 ./asd/downsampled/1002/000013067-000000004.tsm.tmp
-rw-r--r-- 1 root root 312M Nov 16 09:41 ./asd/full_resolution/1001/000010126-000000002.tsm.tmp
-rw-r--r-- 1 root root    0 Nov 16 09:41 ./asd/full_resolution_calculations/1005/000008481-000000003.tsm.tmp
-rw-r--r-- 1 root root    0 Nov 16 09:41 ./asd/full_resolution_calculations/1030/000000033-000000003.tsm.tmp

Exact same result at Fri Nov 16 19:18:38 UTC 2018

The shard sizes are: ./asd/full_resolution/1001/: 41 GB, ./asd/downsampled/1002/: 24 GB, ./asd/full_resolution_calculations/1005/: 1.6 GB, ./asd/full_resolution_calculations/1030/: 929 MB.

The container was manually restarted at 2018-11-16T09:36:13.006788691Z to set max-concurrent-compactions to 4. It has been running since then.

I don't really know if the files are behaving as expected or not. However, InfluxDB is heavily using all 16 CPU cores.

[screenshot taken 2018-11-16 20:04:02]

Could it be that this is a corrupt shard from 1.7.0 that is not being fixed by 1.7.1?

My history has been:

  1. Running 1.6.4.
  2. Stop 1.6.4
  3. Run buildtsi using 1.7.0 client
  4. Run on 1.7.0
  5. Major issues with CPU/mem/io
  6. Downgrade to 1.6.4 (not rebuilding indexes)
  7. Set max_concurrent_compactions to 1
  8. Run for ~1-2 days
  9. Stop 1.6.4
  10. Run buildtsi using 1.7.1 client
  11. Run on 1.7.1 with max_concurrent_compactions=0
  12. Running for a few hours, noticing reduced performance/high CPU/high IO
  13. Restart 1.7.1 with max_concurrent_compactions=4
  14. Still noticing reduced performance
  15. File this report :-)

I did a curl -o profile6.pprof http://localhost:8086/debug/pprof/profile

and got the results:

go tool pprof -top influxd-1.7.1 profile6.pprof


File: influxd-1.7.1
Type: cpu
Time: Nov 16, 2018 at 8:27pm (CET)
Duration: 30.17s, Total samples = 96.34s (319.31%)
Showing nodes accounting for 87.61s, 90.94% of 96.34s total
Dropped 240 nodes (cum <= 0.48s)
      flat  flat%   sum%        cum   cum%
    11.01s 11.43% 11.43%     11.01s 11.43%  strconv.ParseUint
     5.65s  5.86% 17.29%     18.84s 19.56%  github.com/influxdata/influxdb/tsdb/engine/tsm1.FloatArrayDecodeAll
     5.42s  5.63% 22.92%      5.42s  5.63%  runtime.memclrNoHeapPointers
     4.96s  5.15% 28.07%      6.18s  6.41%  path/filepath.Base
     4.59s  4.76% 32.83%      9.75s 10.12%  runtime.scanobject
     3.61s  3.75% 36.58%     26.44s 27.44%  github.com/influxdata/influxdb/tsdb/engine/tsm1.DefaultParseFileName
     3.58s  3.72% 40.29%      4.05s  4.20%  runtime.(*mheap).freeSpanLocked
     3.10s  3.22% 43.51%     13.11s 13.61%  runtime.sweepone
     2.85s  2.96% 46.47%      2.85s  2.96%  indexbytebody
     2.84s  2.95% 49.42%      3.18s  3.30%  runtime.findObject
     2.83s  2.94% 52.36%      2.83s  2.94%  runtime.futex
     2.36s  2.45% 54.81%      5.64s  5.85%  strings.Index
     2.12s  2.20% 57.01%     13.34s 13.85%  github.com/influxdata/influxdb/tsdb/engine/tsm1.timeBatchDecodeAllSimple
     1.90s  1.97% 58.98%      5.53s  5.74%  runtime.lock
     1.89s  1.96% 60.94%        33s 34.25%  github.com/influxdata/influxdb/tsdb/engine/tsm1.(*tsmGeneration).level
     1.79s  1.86% 62.80%      1.79s  1.86%  runtime.procyield
     1.71s  1.77% 64.57%     28.15s 29.22%  github.com/influxdata/influxdb/tsdb/engine/tsm1.(*FileStore).ParseFileName
     1.61s  1.67% 66.24%     29.76s 30.89%  github.com/influxdata/influxdb/tsdb/engine/tsm1.(*DefaultPlanner).ParseFileName
     1.45s  1.51% 67.75%      3.32s  3.45%  runtime.unlock
     1.35s  1.40% 69.15%     31.11s 32.29%  github.com/influxdata/influxdb/tsdb/engine/tsm1.(*DefaultPlanner).ParseFileName-fm
     1.32s  1.37% 70.52%     34.27s 35.57%  github.com/influxdata/influxdb/tsdb/engine/tsm1.tsmGenerations.level
     1.22s  1.27% 71.79%      1.22s  1.27%  os.IsPathSeparator (inline)
     1.20s  1.25% 73.03%      1.20s  1.25%  runtime.(*mSpanList).remove
     1.10s  1.14% 74.17%      1.10s  1.14%  github.com/influxdata/influxdb/tsdb/engine/tsm1.(*block).overlapsTimeRange (inline)
     1.07s  1.11% 75.29%      9.96s 10.34%  runtime.(*mspan).sweep
     1.02s  1.06% 76.34%     19.04s 19.76%  runtime.(*mcentral).cacheSpan
     0.98s  1.02% 77.36%      3.51s  3.64%  github.com/influxdata/influxdb/pkg/encoding/simple8b.DecodeBytesBigEndian
     0.95s  0.99% 78.35%      0.95s  0.99%  runtime.(*gcBitsArena).tryAlloc (inline)
     0.93s  0.97% 79.31%      0.93s  0.97%  runtime.osyield
     0.90s  0.93% 80.25%      0.90s  0.93%  runtime.memmove
     0.88s  0.91% 81.16%      1.84s  1.91%  runtime.greyobject
     0.76s  0.79% 81.95%      0.76s  0.79%  runtime.markBits.isMarked (inline)
     0.75s  0.78% 82.73%      0.75s  0.78%  github.com/influxdata/influxdb/tsdb/engine/tsm1.ZigZagDecode (inline)
     0.71s  0.74% 83.46%      0.85s  0.88%  runtime.(*gcSweepBuf).pop (inline)
     0.70s  0.73% 84.19%      6.34s  6.58%  github.com/influxdata/influxdb/tsdb/engine/tsm1.integerBatchDecodeAllSimple
     0.70s  0.73% 84.92%      0.70s  0.73%  runtime.(*fixalloc).alloc
     0.70s  0.73% 85.64%      7.46s  7.74%  runtime.(*mcentral).freeSpan
     0.56s  0.58% 86.23%      0.56s  0.58%  github.com/influxdata/influxdb/pkg/encoding/simple8b.unpack240
     0.56s  0.58% 86.81%     29.40s 30.52%  github.com/influxdata/influxdb/tsdb/engine/tsm1.(*tsmBatchKeyIterator).combineFloat
     0.52s  0.54% 87.35%     23.16s 24.04%  runtime.mallocgc
     0.51s  0.53% 87.88%     13.20s 13.70%  runtime.growslice
     0.40s  0.42% 88.29%      5.54s  5.75%  runtime.(*mheap).alloc_m
     0.39s   0.4% 88.70%         2s  2.08%  runtime.(*mheap).allocSpanLocked
     0.37s  0.38% 89.08%      0.53s  0.55%  runtime.markrootSpans
     0.21s  0.22% 89.30%      0.51s  0.53%  runtime.gcmarknewobject
     0.17s  0.18% 89.47%      1.27s  1.32%  runtime.newMarkBits
     0.15s  0.16% 89.63%      0.49s  0.51%  runtime.gentraceback
     0.14s  0.15% 89.78%      4.90s  5.09%  runtime.deductSweepCredit
     0.13s  0.13% 89.91%      0.82s  0.85%  runtime.heapBits.initSpan
     0.12s  0.12% 90.04%      9.45s  9.81%  runtime.(*mcentral).grow
     0.11s  0.11% 90.15%      5.98s  6.21%  runtime.(*mheap).freeSpan.func1
     0.07s 0.073% 90.22%     27.55s 28.60%  github.com/influxdata/influxdb/tsdb/engine/tsm1.DecodeFloatArrayBlock
     0.07s 0.073% 90.29%     19.26s 19.99%  runtime.(*mcache).nextFree
     0.07s 0.073% 90.37%     10.10s 10.48%  runtime.gcDrain
     0.06s 0.062% 90.43%     11.63s 12.07%  github.com/influxdata/influxdb/tsdb/engine/tsm1.(*tsmBatchKeyIterator).combineInteger
     0.06s 0.062% 90.49%      0.55s  0.57%  runtime.schedule
     0.05s 0.052% 90.54%     13.39s 13.90%  github.com/influxdata/influxdb/tsdb/engine/tsm1.TimeArrayDecodeAll
     0.05s 0.052% 90.60%     19.09s 19.82%  runtime.(*mcache).refill
     0.05s 0.052% 90.65%     12.29s 12.76%  runtime.makeslice
     0.05s 0.052% 90.70%     37.77s 39.20%  runtime.systemstack
     0.04s 0.042% 90.74%     19.13s 19.86%  runtime.(*mcache).nextFree.func1
     0.03s 0.031% 90.77%      6.37s  6.61%  github.com/influxdata/influxdb/tsdb/engine/tsm1.IntegerArrayDecodeAll
     0.03s 0.031% 90.80%      0.98s  1.02%  runtime.futexsleep
     0.03s 0.031% 90.83%     12.19s 12.65%  runtime.gosweepone
     0.02s 0.021% 90.86%      1.90s  1.97%  runtime.futexwakeup
     0.02s 0.021% 90.88%     12.16s 12.62%  runtime.gosweepone.func1
     0.01s  0.01% 90.89%     25.43s 26.40%  github.com/influxdata/influxdb/tsdb/engine/tsm1.(*DefaultPlanner).PlanLevel
     0.01s  0.01% 90.90%     11.46s 11.90%  github.com/influxdata/influxdb/tsdb/engine/tsm1.DecodeIntegerArrayBlock
     0.01s  0.01% 90.91%      8.49s  8.81%  runtime.(*mheap).alloc
     0.01s  0.01% 90.92%      5.55s  5.76%  runtime.(*mheap).alloc.func1
     0.01s  0.01% 90.93%      5.99s  6.22%  runtime.(*mheap).freeSpan
     0.01s  0.01% 90.94%      0.53s  0.55%  runtime.mProf_Malloc
         0     0% 90.94%     41.03s 42.59%  github.com/influxdata/influxdb/tsdb/engine/tsm1.(*Compactor).CompactFull
         0     0% 90.94%     41.03s 42.59%  github.com/influxdata/influxdb/tsdb/engine/tsm1.(*Compactor).compact
         0     0% 90.94%     41.03s 42.59%  github.com/influxdata/influxdb/tsdb/engine/tsm1.(*Compactor).write
         0     0% 90.94%     41.03s 42.59%  github.com/influxdata/influxdb/tsdb/engine/tsm1.(*Compactor).writeNewFiles
         0     0% 90.94%      9.07s  9.41%  github.com/influxdata/influxdb/tsdb/engine/tsm1.(*DefaultPlanner).PlanOptimize
         0     0% 90.94%     34.57s 35.88%  github.com/influxdata/influxdb/tsdb/engine/tsm1.(*Engine).compact
         0     0% 90.94%     21.16s 21.96%  github.com/influxdata/influxdb/tsdb/engine/tsm1.(*Engine).compactFull.func1
         0     0% 90.94%     19.87s 20.62%  github.com/influxdata/influxdb/tsdb/engine/tsm1.(*Engine).compactHiPriorityLevel.func1
         0     0% 90.94%     34.57s 35.88%  github.com/influxdata/influxdb/tsdb/engine/tsm1.(*Engine).enableLevelCompactions.func1
         0     0% 90.94%     41.03s 42.59%  github.com/influxdata/influxdb/tsdb/engine/tsm1.(*compactionStrategy).Apply
         0     0% 90.94%     41.03s 42.59%  github.com/influxdata/influxdb/tsdb/engine/tsm1.(*compactionStrategy).compactGroup
         0     0% 90.94%     41.03s 42.59%  github.com/influxdata/influxdb/tsdb/engine/tsm1.(*tsmBatchKeyIterator).Next
         0     0% 90.94%     41.03s 42.59%  github.com/influxdata/influxdb/tsdb/engine/tsm1.(*tsmBatchKeyIterator).merge
         0     0% 90.94%     29.40s 30.52%  github.com/influxdata/influxdb/tsdb/engine/tsm1.(*tsmBatchKeyIterator).mergeFloat
         0     0% 90.94%     11.63s 12.07%  github.com/influxdata/influxdb/tsdb/engine/tsm1.(*tsmBatchKeyIterator).mergeInteger
         0     0% 90.94%      1.07s  1.11%  runtime.(*mheap).reclaim
         0     0% 90.94%      7.44s  7.72%  runtime.bgsweep
         0     0% 90.94%     10.12s 10.50%  runtime.gcBgMarkWorker
         0     0% 90.94%     10.09s 10.47%  runtime.gcBgMarkWorker.func2
         0     0% 90.94%      0.70s  0.73%  runtime.goschedImpl
         0     0% 90.94%      0.69s  0.72%  runtime.gosched_m
         0     0% 90.94%      0.62s  0.64%  runtime.markroot
         0     0% 90.94%      0.75s  0.78%  runtime.mcall
         0     0% 90.94%      0.55s  0.57%  runtime.profilealloc

go tool pprof -gif influxd-1.7.1 profile6.pprof gives me

[profile graph: profile002]

Is there anything else I can provide here or on e-mail to help sort this issue?

v9n commented 5 years ago

I see the same thing as @hallvar. I ended up just removing the whole DB since it was a new DB and I was doing double writes from 1.6.4, and it has been good since then.

hallvar commented 5 years ago

Indeed, in the InfluxDB log I can see an entry for the compaction of ./asd/full_resolution/1001/ starting, but never an entry for it completing.

stuartcarnie commented 5 years ago

@hallvar a few questions:

stuartcarnie commented 5 years ago

@hallvar also, do you see a log entry for a compaction for full_resolution_calculations/1005 that starts but does not stop?

hallvar commented 5 years ago

Some more details about full_resolution/1001

$ ls -l
total 42339144
-rw-r--r--  1 root root 2147486635 Nov  6 00:47 000001280-000000005.tsm
-rw-r--r--  1 root root 2147488291 Nov  6 00:49 000001280-000000006.tsm
-rw-r--r--  1 root root 1364751604 Nov  6 00:50 000001280-000000007.tsm
-rw-r--r--  1 root root 2147488268 Nov  6 18:39 000002176-000000005.tsm
-rw-r--r--  1 root root 2147488779 Nov  6 18:42 000002176-000000006.tsm
-rw-r--r--  1 root root  537029393 Nov  6 18:43 000002176-000000007.tsm
-rw-r--r--  1 root root 2147486794 Nov  7 17:26 000003456-000000005.tsm
-rw-r--r--  1 root root 2147487880 Nov  7 17:27 000003456-000000006.tsm
-rw-r--r--  1 root root 1487047140 Nov  7 17:28 000003456-000000007.tsm
-rw-r--r--  1 root root 2147486948 Nov  8 15:41 000004736-000000005.tsm
-rw-r--r--  1 root root 2147487245 Nov  8 15:42 000004736-000000006.tsm
-rw-r--r--  1 root root 1514402097 Nov  8 15:43 000004736-000000007.tsm
-rw-r--r--  1 root root 2147485943 Nov  9 14:25 000006016-000000005.tsm
-rw-r--r--  1 root root 2147486486 Nov  9 14:26 000006016-000000006.tsm
-rw-r--r--  1 root root 1630170600 Nov  9 14:28 000006016-000000007.tsm
-rw-r--r--  1 root root 2147486552 Nov 10 13:48 000007296-000000005.tsm
-rw-r--r--  1 root root 2147486659 Nov 10 13:49 000007296-000000006.tsm
-rw-r--r--  1 root root 1552881210 Nov 10 13:51 000007296-000000007.tsm
-rw-r--r--  1 root root 2147488256 Nov 11 10:34 000008192-000000005.tsm
-rw-r--r--  1 root root 2147489347 Nov 11 10:37 000008192-000000006.tsm
-rw-r--r--  1 root root   74587847 Nov 11 10:37 000008192-000000007.tsm
-rw-r--r--  1 root root 2147489288 Nov 11 23:28 000008704-000000005.tsm
-rw-r--r--  1 root root  463276511 Nov 11 23:29 000008704-000000006.tsm
-rw-r--r--  1 root root  633100657 Nov 12 02:15 000008833-000000004.tsm
-rw-r--r--  1 root root  470640594 Nov 12 06:50 000008962-000000004.tsm
-rw-r--r--  1 root root  139974092 Nov 12 12:36 000009099-000000004.tsm
-rw-r--r--  1 root root  156140475 Nov 13 17:45 000009233-000000004.tsm
-rw-r--r--  1 root root  110167094 Nov 13 17:45 000009361-000000004.tsm
-rw-r--r--  1 root root  111120955 Nov 14 02:01 000009490-000000004.tsm
-rw-r--r--  1 root root  108211921 Nov 14 02:01 000009618-000000004.tsm
-rw-r--r--  1 root root  118685885 Nov 16 01:22 000009748-000000004.tsm
-rw-r--r--  1 root root  134496994 Nov 16 01:23 000009876-000000004.tsm
-rw-r--r--  1 root root  109944892 Nov 16 01:25 000010004-000000004.tsm
-rw-r--r--  1 root root   25862631 Nov 16 01:22 000010036-000000003.tsm
-rw-r--r--  1 root root   26852524 Nov 16 01:22 000010068-000000003.tsm
-rw-r--r--  1 root root   26196516 Nov 16 01:23 000010100-000000003.tsm
-rw-r--r--  1 root root    5690656 Nov 16 01:21 000010108-000000002.tsm
-rw-r--r--  1 root root    6081352 Nov 16 01:22 000010116-000000002.tsm
-rw-r--r--  1 root root    6690825 Nov 16 01:23 000010124-000000002.tsm
-rw-r--r--  1 root root     775892 Nov 14 16:36 000010125-000000001.tsm
-rw-r--r--  1 root root     780584 Nov 14 16:39 000010126-000000001.tsm
-rw-r--r--  1 root root  327155712 Nov 16 20:21 000010126-000000002.tsm.tmp
-rw-r--r--  1 root root      25684 Nov  8 12:16 fields.idx
drwxr-xr-x 10 root root       4096 Nov 16 00:06 index
$ influx_inspect dumptsm 000001280-000000005.tsm
Summary:
  File: 000001280-000000005.tsm
  Time Range: 2018-11-05T00:00:00Z - 2018-11-05T22:29:59.9Z
  Duration: 22h29m59.9s   Series: 1996   File Size: 2147486635

Statistics
  Blocks:
    Total: 1021620 Size: 2118552906 Min: 39 Max: 5866 Avg: 2073
  Index:
    Total: 1021620 Size: 28933716
  Points:
    Total: 1020426448
  Encoding:
    Timestamp:  none: 0 (0%)    s8b: 975933 (95%)   rle: 45687 (4%) 
    Float:  none: 0 (0%)    gor: 1021620 (100%) 
  Compression:
    Per block: 2.08 bytes/point
    Total: 2.10 bytes/point
$ influx_inspect dumptsm 000010126-000000001.tsm
Summary:
  File: 000010126-000000001.tsm
  Time Range: 2018-11-11T19:00:00Z - 2018-11-11T19:29:59.9Z
  Duration: 29m59.9s   Series: 86   File Size: 780584

Statistics
  Blocks:
    Total: 1634 Size: 717326 Min: 30 Max: 1847 Avg: 439
  Index:
    Total: 1634 Size: 63245
  Points:
    Total: 1548086
  Encoding:
    Timestamp:  none: 0 (0%)    s8b: 516 (31%)  rle: 1118 (68%) 
    Float:  none: 0 (0%)    gor: 1634 (100%) 
  Compression:
    Per block: 0.46 bytes/point
    Total: 0.50 bytes/point
$ influx_inspect dumptsm 000010126-000000002.tsm.tmp
dumptsm: Error opening TSM files: indirectIndex: not enough data for max time

The tsm.tmp file has indeed stopped growing in size:

$ date; ls -l 000010126-000000002.tsm.tmp
Fri Nov 16 20:51:50 UTC 2018
-rw-r--r-- 1 root root 327155712 Nov 16 20:21 000010126-000000002.tsm.tmp

$ date; ls -l 000010126-000000002.tsm.tmp
Fri Nov 16 20:52:25 UTC 2018
-rw-r--r-- 1 root root 327155712 Nov 16 20:21 000010126-000000002.tsm.tmp

I can upload this shard somewhere for you to retrieve it. I'll send the link to your Gmail address once it is up.

I'm unable to say whether the InfluxDB log had a completion entry for full_resolution_calculations/1005. I've been restarting back and forth, and at some point I thought the shards were fixed except 1005, so unfortunately I sacrificed and deleted that shard already.

hallvar commented 5 years ago

$ influx_inspect report 1001
DB      RP              Shard   File                    Series  New (est) Min Time                 Max Time                 Load Time
*redacted*     full_resolution 1001    000001280-000000005.tsm 1996    1997      2018-11-05T00:00:00Z     2018-11-05T22:29:59.9Z   2.127496ms
*redacted*     full_resolution 1001    000001280-000000006.tsm 3500    3494      2018-11-05T00:00:00Z     2018-11-06T00:29:59.975Z 114.600155ms
*redacted*     full_resolution 1001    000001280-000000007.tsm 2350    2369      2018-11-05T00:00:00Z     2018-11-06T00:29:59.95Z  104.123384ms
*redacted*     full_resolution 1001    000002176-000000005.tsm 2193    0         2018-11-05T22:00:00Z     2018-11-06T17:59:59.975Z 106.93131ms
*redacted*     full_resolution 1001    000002176-000000006.tsm 4771    1         2018-11-05T22:00:00.05Z  2018-11-06T18:29:59.975Z 111.297607ms
*redacted*     full_resolution 1001    000002176-000000007.tsm 937     60        2018-11-05T22:30:00.05Z  2018-11-06T18:29:59.975Z 81.33326ms
*redacted*     full_resolution 1001    000003456-000000005.tsm 1921    0         2018-11-06T18:00:00.025Z 2018-11-07T16:59:59.975Z 101.851983ms
*redacted*     full_resolution 1001    000003456-000000006.tsm 3111    2         2018-11-06T17:30:00.075Z 2018-11-07T16:59:59.975Z 110.449569ms
*redacted*     full_resolution 1001    000003456-000000007.tsm 2810    0         2018-11-06T18:00:00Z     2018-11-07T16:59:59.975Z 94.000429ms
*redacted*     full_resolution 1001    000004736-000000005.tsm 2543    547       2018-11-06T11:48:00.475Z 2018-11-08T14:59:59.95Z  104.866018ms
*redacted*     full_resolution 1001    000004736-000000006.tsm 3019    0         2018-11-07T13:00:00.025Z 2018-11-08T15:29:59.975Z 97.934706ms
*redacted*     full_resolution 1001    000004736-000000007.tsm 2913    88        2018-11-06T11:48:00.525Z 2018-11-08T15:29:59.975Z 126.143675ms
*redacted*     full_resolution 1001    000006016-000000005.tsm 1936    0         2018-11-08T15:00:00.05Z  2018-11-09T13:59:59.975Z 89.927246ms
*redacted*     full_resolution 1001    000006016-000000006.tsm 2810    0         2018-11-08T12:00:00Z     2018-11-09T13:59:59.975Z 91.292107ms
*redacted*     full_resolution 1001    000006016-000000007.tsm 3092    0         2018-11-08T15:30:00Z     2018-11-09T13:59:59.975Z 61.211254ms
*redacted*     full_resolution 1001    000007296-000000005.tsm 1912    0         2018-11-09T13:30:00.15Z  2018-11-10T13:29:59.95Z  43.394452ms
*redacted*     full_resolution 1001    000007296-000000006.tsm 2843    0         2018-11-09T13:30:00Z     2018-11-10T13:29:59.975Z 95.845912ms
*redacted*     full_resolution 1001    000007296-000000007.tsm 3082    0         2018-11-09T14:00:00Z     2018-11-10T13:29:59.95Z  45.762259ms
*redacted*     full_resolution 1001    000008192-000000005.tsm 2175    0         2018-11-10T13:00:00Z     2018-11-11T09:59:59.8Z   36.166727ms
*redacted*     full_resolution 1001    000008192-000000006.tsm 4163    0         2018-11-10T09:30:00.05Z  2018-11-11T10:29:59.975Z 94.093133ms
*redacted*     full_resolution 1001    000008192-000000007.tsm 466     0         2018-11-10T10:00:00.05Z  2018-11-10T13:59:59.925Z 9.267417ms
*redacted*     full_resolution 1001    000008704-000000005.tsm 4535    0         2018-11-10T12:30:00Z     2018-11-11T22:59:59.95Z  69.444225ms
*redacted*     full_resolution 1001    000008704-000000006.tsm 2268    0         2018-11-10T14:00:00Z     2018-11-11T22:59:59.95Z  33.501507ms
*redacted*     full_resolution 1001    000008833-000000004.tsm 6660    0         2018-11-10T14:00:00.025Z 2018-11-11T23:59:59.975Z 15.833312ms
*redacted*     full_resolution 1001    000008962-000000004.tsm 857     0         2018-11-10T22:00:00.025Z 2018-11-11T17:59:59.925Z 27.914255ms
*redacted*     full_resolution 1001    000009099-000000004.tsm 999     0         2018-11-11T05:30:00.05Z  2018-11-11T20:59:59.95Z  12.381657ms
*redacted*     full_resolution 1001    000009233-000000004.tsm 2206    174       2018-11-10T10:00:00Z     2018-11-11T23:59:59.95Z  21.198654ms
*redacted*     full_resolution 1001    000009361-000000004.tsm 1426    0         2018-11-10T12:30:00Z     2018-11-11T13:29:59.95Z  27.671145ms
*redacted*     full_resolution 1001    000009490-000000004.tsm 1205    0         2018-11-10T17:00:00Z     2018-11-10T21:59:59.975Z 23.743869ms
*redacted*     full_resolution 1001    000009618-000000004.tsm 1206    0         2018-11-10T21:30:00Z     2018-11-11T02:29:59.975Z 23.800471ms
*redacted*     full_resolution 1001    000009748-000000004.tsm 1499    0         2018-11-11T02:00:00Z     2018-11-11T15:59:59.975Z 24.383997ms
*redacted*     full_resolution 1001    000009876-000000004.tsm 1498    0         2018-11-11T06:30:00Z     2018-11-11T22:59:59.975Z 19.937697ms
*redacted*     full_resolution 1001    000010004-000000004.tsm 1412    0         2018-11-11T10:00:00Z     2018-11-11T23:59:59.975Z 24.115485ms
*redacted*     full_resolution 1001    000010036-000000003.tsm 1118    0         2018-11-11T14:30:00.025Z 2018-11-11T15:59:59.975Z 6.45649ms
*redacted*     full_resolution 1001    000010068-000000003.tsm 1118    0         2018-11-11T16:00:00Z     2018-11-11T17:29:59.975Z 12.507463ms
*redacted*     full_resolution 1001    000010100-000000003.tsm 1118    0         2018-11-11T17:00:00Z     2018-11-11T18:29:59.975Z 14.894271ms
*redacted*     full_resolution 1001    000010108-000000002.tsm 516     0         2018-11-11T18:00:00.075Z 2018-11-11T18:59:59.975Z 13.749419ms
*redacted*     full_resolution 1001    000010116-000000002.tsm 688     0         2018-11-11T18:30:00Z     2018-11-11T19:29:59.975Z 7.964258ms
*redacted*     full_resolution 1001    000010124-000000002.tsm 688     0         2018-11-11T19:00:00Z     2018-11-11T19:29:59.975Z 6.293983ms
*redacted*     full_resolution 1001    000010125-000000001.tsm 86      0         2018-11-11T19:00:00.025Z 2018-11-11T19:29:59.925Z 2.907131ms
*redacted*     full_resolution 1001    000010126-000000001.tsm 86      0         2018-11-11T19:00:00Z     2018-11-11T19:29:59.9Z   166.908µs

Summary:
  Files: 41
  Time Range: 2018-11-05T00:00:00Z - 2018-11-11T23:59:59.975Z
  Duration: 167h59m59.975s 

Statistics
  Series:
     - *redacted* (est): 8732 (100%)
  Total (est): 8732
Completed in 2.263066813s

conet commented 5 years ago

Are you sure that you're running 1.7.1? Docker hub shows updates "an hour ago".

[screenshot: Docker Hub showing images updated an hour ago]

It is either that or maybe you are running into a different kind of problem. I'm running on bare metal and installed from rpm https://dl.influxdata.com/influxdb/releases/influxdb-1.7.1.x86_64.rpm

stuartcarnie commented 5 years ago

@conet have you had any issues since upgrading to 1.7.1?

conet commented 5 years ago

@stuartcarnie no, but I just read the backlog, and another difference besides packaging is that I haven't switched to tsi yet; I'm using in-memory indexes. The CPU has been idling ever since the upgrade to 1.7.1.

stuartcarnie commented 5 years ago

@conet glad to hear you are still stable. The index type was unrelated to this issue.

conet commented 5 years ago

@hallvar Are you sure that the 4 stuck compaction processes account for all the CPU being used? What I noticed is that a single compaction process would only consume 1 core. Could you run this query on the _internal database?

select /.*CompactionsActive/ from tsm1_engine where time > now() - 1m AND id = '1001' group by *

How about queries hitting the DB? Aren't they causing the load?

> The index type was unrelated to this issue.

@stuartcarnie I was trying to spot differences and that was the only thing I noticed; anyway, I don't want to create more noise around this issue 😄