influxdata / influxdb

Scalable datastore for metrics, events, and real-time analytics
https://influxdata.com
Apache License 2.0

TSI index version is ignored #10035

Closed NocturnalShadow closed 5 years ago

NocturnalShadow commented 6 years ago

Hello, I've changed the index_version property under the [data] section in influxdb.conf from inmem to tsi1. Then I cleared the meta, data and wal directories and restarted InfluxDB.
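For reference, a minimal sketch of the relevant influxdb.conf fragment (paths are examples; note that the 1.x TOML key is spelled index-version, with a hyphen):

```toml
# influxdb.conf (InfluxDB 1.x), example paths
[data]
  dir = "/var/lib/influxdb/data"
  wal-dir = "/var/lib/influxdb/wal"
  index-version = "tsi1"   # the default is "inmem"
```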

Expected behavior: InfluxDB uses tsi1 engine.

Actual behavior: tsm1 engine is still used. tsi1 option is ignored.

That is what I get when I insert a data point: [image]

This is crucial, since in the real environment the whole index is loaded into RAM and takes 14+ GB, which is fatal in my case.

I'm using InfluxDB 1.5.4, but I had the same issue with earlier versions.

Here are some related issues: https://github.com/influxdata/influxdb/issues/9817 and https://github.com/influxdata/influxdb/issues/9540. People there have a similar problem, but for some reason nobody answered why tsm1 is still reported when tsi1 was specified.

Thank you

NocturnalShadow commented 6 years ago

OK, here is the deal: it seems that, despite there being no mention of TSI indexing in the logs, InfluxDB will actually use TSI indexing if the following criteria are met: [data] -> index_version set to tsi1, and all index directories recursively deleted under influxdb/data after successfully running influx_inspect buildtsi ...
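For context, a sketch of the rebuild step being referred to (paths are examples; the tool should be run while influxd is stopped):

```sh
# Rebuild TSI index files from existing TSM data (example paths),
# then restart the InfluxDB service.
influx_inspect buildtsi \
  -datadir /var/lib/influxdb/data \
  -waldir /var/lib/influxdb/wal
```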

And the reason I know TSI is being used instead of in-mem is that in my environment InfluxDB's RAM usage dropped from >15GB to a stable ~6GB. However, disk I/O increased immensely (docker stats shows 5GB / 380GB (I/O) for the InfluxDB container). Also, some queries became significantly slower and some simply fail with a Go panic.
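For reference, a one-shot way to read those container numbers (the container name here is an example):

```sh
# Print a single snapshot of CPU, memory and block I/O for the container
docker stats --no-stream influxdb
```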

phmaes commented 6 years ago

The engine mentioned in your logs is the storage engine. It is composed of a number of components that each serve a particular role.

From http://docs.influxdata.com/influxdb/v1.5/concepts/storage_engine/:

TSM files contain sorted, compressed series data.

TSM Files - TSM files store compressed series data in a columnar format.

FileStore - The FileStore mediates access to all TSM files on disk. It ensures that TSM files are installed atomically when existing ones are replaced, as well as removing TSM files that are no longer used.

In-Memory Index - The in-memory index is a shared index across shards that provides the quick access to measurements, tags, and series. The index is used by the engine, but is not specific to the storage engine itself.

From http://docs.influxdata.com/influxdb/v1.5/concepts/time-series-index/:

Up until TSI, the inverted index was an in-memory data structure that was built during startup of the database based on the data in TSM.

In order to check that TSI is enabled, go to one of the shard directories; there should be a folder named index. Inside this folder there should be multiple folders containing *.tsl files.
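As a concrete example, assuming the default directory layout (the database name, retention policy, and shard id below are placeholders):

```sh
# List the TSI partition directories for one shard, i.e.
# <data-dir>/<database>/<retention-policy>/<shard-id>/index
ls /var/lib/influxdb/data/mydb/autogen/1/index/
# Each numbered partition directory should contain *.tsl (and, after
# compaction, *.tsi) files when the shard uses index-version = "tsi1".
```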

I put these explanations here because it was also very confusing for me at first. Please correct me if I'm wrong, of course.

NocturnalShadow commented 6 years ago

@phmaes, at first I thought it works exactly as you've described (and I suspect it was supposed to work this way). However, strange things happen in a real environment with TSI enabled: if I convert all the indexes to TSI with influx_inspect, it creates an index folder for each shard. But when I start InfluxDB, within a minute it fills all available RAM (>15GB) and then the container fails, because I've specified a hard memory limit. This is strange, since I expected TSI to do exactly the opposite: to solve the immense RAM usage problem. At the same time, disk I/O is minimal. Here is what I get from docker stats: [image]

After that, I recursively delete the index folders in the data directory and restart InfluxDB, still with TSI enabled, and the picture changes dramatically (at first): [image]

However, after one day has passed (InfluxDB was up and running and accepting more metrics), here is what I see: [image]

I see new shards being created in the data directory, new index folders appearing for those shards, and the memory usage rising again, slowly but steadily: [image]

e-dard commented 6 years ago

@NocturnalShadow hi, looks like there are a couple of things here:

1) Yes, you're using the tsi1 index; tsm1 is not an index but the storage engine, as @phmaes points out.

2) TSI has much better memory usage than the inmem index, and it's very likely that the increase in memory utilisation you're seeing is due to the way the index works. The TSI index is a disk-backed index, but of course if we read directly from disk it would be very slow. Instead we mmap the parts of the index that we need into memory as and when we need them (the kernel actually takes care of this). If you want to know the actual memory usage of the database, take a look at `HeapInUse`, which can be found in the first couple of lines of the output of `SHOW STATS`, or by running something like `SELECT mean(HeapInUse) FROM "_internal"."monitor"."runtime" WHERE time > now()-1h GROUP BY time(1m) ORDER BY time DESC`
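As a concrete example, the same check run non-interactively (this assumes the _internal monitoring database is enabled, which is the 1.x default, and the default host/port):

```sh
# Read InfluxDB's own heap usage over the last hour, newest first
influx -database _internal -execute \
  'SELECT mean("HeapInUse") FROM "monitor"."runtime" WHERE time > now() - 1h GROUP BY time(1m) ORDER BY time DESC'
```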

I'll keep this issue open a little while in case you find that those HeapInUse stats are very high and something else might be going on.

NocturnalShadow commented 6 years ago

I've figured out what was going on. I had the same problem as the people in the issues referenced above: TSI index RAM usage is just immense. I thought that I was using in-mem indexing and not TSI because of their official descriptions: in-mem was supposed to consume a lot of RAM in the case of high series cardinality, while TSI was supposed to solve this issue by consuming much less. The reality turned out to be different: in-mem indexing indeed consumes a lot of RAM (~10GB in my case). However, TSI indexing not only doesn't solve the problem, it makes things way worse (when InfluxDB starts it consumes all 15GB of memory and then the container fails, since there is no more RAM left).

The behavior I expected from TSI vs. in-mem was exactly the opposite; that's where the confusion came from.

e-dard commented 6 years ago

@NocturnalShadow

However, TSI indexing not only doesn't solve the problem, it makes things way worse (when InfluxDB starts it consumes all 15GB of memory and then the container fails, since there is no more RAM left).

Are you talking about RSS here or the actual heap usage, as reported via InfluxDB's internal memory stats? I explained how to get those stats in the previous message.

e-dard commented 5 years ago

Closing due to inactivity.

DavidAntliff commented 3 years ago

@NocturnalShadow, you mentioned in your second post (your emphasis):

all index directories recursively deleted under influxdb/data after successfully running influx_inspect buildtsi

Just curious - why are you deleting the index directories after building them? What influx_inspect buildtsi does is build TSI indices for all (or a subset of) existing data; otherwise only newly written data gets a TSI index. If you delete all the index directories right after creating them, then influx_inspect buildtsi is effectively doing nothing.

The upgrade docs mention that you can delete any existing index directories before running influx_inspect buildtsi - I assume that's to make sure the tool starts from a clean slate.
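For completeness, a sketch of the order the upgrade docs describe, with deletion of any stale index directories happening before the rebuild, not after (the service name and paths are examples):

```sh
# Stop the service before touching shard data
sudo systemctl stop influxdb

# Remove any index directories left over from a previous attempt
find /var/lib/influxdb/data -type d -name index -prune -exec rm -rf {} +

# Rebuild TSI indexes for existing shards, then restart
sudo -u influxdb influx_inspect buildtsi \
  -datadir /var/lib/influxdb/data \
  -waldir /var/lib/influxdb/wal
sudo systemctl start influxdb
```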