influxdata / influxdb

Scalable datastore for metrics, events, and real-time analytics
https://influxdata.com
Apache License 2.0

Cannot drop data after changing index to TSI #15004

Open cerberek opened 5 years ago

cerberek commented 5 years ago

Steps to reproduce:

We upgraded from 1.4 to 1.7 and saw an unwelcome change in our Grafana graphs, so I wanted to move from the inmem index to TSI. I ran buildtsi, set the index to tsi1 in the configuration, and restarted InfluxDB. This was done a few months ago. Now I have realized that I cannot drop data. So I removed all index subfolders in the shards and ran buildtsi once again. I also upgraded to the latest InfluxDB and restarted again. The issue persists.

When I run a DROP statement in the influx CLI, I get: ERR: cannot delete data. DB contains shards using both inmem and tsi1 indexes. Please convert all shards to use the same index type to delete data.
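
A quick way to check what buildtsi actually produced is to look for shards that still have no on-disk index. As far as I can tell from the 1.x source, a shard directory that contains an `index` subdirectory is opened as tsi1, and any shard without one falls back to the configured `index-version` (default `inmem`). The sketch below is POSIX sh and assumes the default package layout `/var/lib/influxdb/data/<db>/<rp>/<shard-id>`; `list_inmem_shards` is a hypothetical helper, not an InfluxDB tool.

```shell
# list_inmem_shards [DATADIR]
# Prints every shard directory under DATADIR (<db>/<rp>/<shard-id>) that has
# no "index" subdirectory, i.e. a shard without an on-disk TSI index.
list_inmem_shards() {
  datadir=${1:-/var/lib/influxdb/data}
  for shard in "$datadir"/*/*/*; do
    [ -d "$shard" ] || continue
    # <db>/_series/<partition> has the same depth but is the series file.
    case $shard in */_series/*) continue ;; esac
    # Shard directories are named by their numeric shard ID.
    case ${shard##*/} in ''|*[!0-9]*) continue ;; esac
    [ -d "$shard/index" ] || printf '%s\n' "$shard"
  done
}
```

Usage: `list_inmem_shards /var/lib/influxdb/data`. Any path it prints is a shard that buildtsi did not convert, or one created afterwards while influxd was still configured for inmem.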

Config: index-version = "tsi1"

Logs:

drop measurement nftables_in_443_value
ERR: cannot delete data. DB contains shards using both inmem and tsi1 indexes. Please convert all shards to use the same index type to delete data.

Performance: Generate profiles with the following commands for bugs related to performance, locking, out of memory (OOM), etc.

# Commands should be run while the bug is active.
# Note: This command will run for at least 30 seconds.
curl -o profiles.tar.gz "http://localhost:8086/debug/pprof/all?cpu=true"
curl -o vars.txt "http://localhost:8086/debug/vars"
iostat -xd 1 30 > iostat.txt
# Attach the `profiles.tar.gz`, `vars.txt`, and `iostat.txt` output files.

Attached: profiles.tar.gz, vars.txt, iostat.txt

cerberek commented 4 years ago

Today I checked the logs and saw that the database is writing TSM snapshots. Does this mean that even though I have TSI in the configuration, InfluxDB still uses TSM?

influxd[7505]: ts=2019-09-26T13:02:04.835544Z lvl=info msg="Snapshot for path written" log_id=0Hid5NKl000 engine=tsm1 trace_id=0I7QTjyW000 op_name=tsm1_cache_snapshot path=/var/lib/influxdb/data/collectd/autogen/8746 duration=1325.500ms
influxd[7505]: ts=2019-09-26T13:02:04.835587Z lvl=info msg="Cache snapshot (end)" log_id=0Hid5NKl000 engine=tsm1 trace_id=0I7QTjyW000 op_name=tsm1_cache_snapshot op_event=end op_elapsed=1325.540ms

maanasa commented 4 years ago

+1

How should we drop these measurements? How do we convert the shards to the same index? InfluxDB version 1.7.8

lobocobra commented 4 years ago

+1 influx version 1.7.9

apollotonkosmo commented 4 years ago

+1 influxdb 1.8

gshif commented 4 years ago

Test Environment:

Use Case 1

Use Case 2

"shard:/var/lib/influxdb/data/_internal/monitor/1:1": {"name":"shard","tags":{"database":"_internal","engine":"tsm1","id":"1","indexType":"tsi1","path":"/var/lib/influxdb/data/_internal/monitor/1","retentionPolicy":"monitor","walPath":"/var/lib/influxdb/wal/_internal/monitor/1"},"values":{"diskBytes":145869,"fieldsCreate":0,"seriesCreate":17,"writeBytes":0,"writePointsDropped":0,"writePointsErr":0,"writePointsOk":0,"writeReq":0,"writeReqErr":0,"writeReqOk":0}},
"tsm1_engine:/var/lib/influxdb/data/_internal/monitor/1:1": {"name":"tsm1_engine","tags":{"database":"_internal","engine":"tsm1","id":"1","indexType":"tsi1","path":"/var/lib/influxdb/data/_internal/monitor/1","retentionPolicy":"monitor","walPath":"/var/lib/influxdb/wal/_internal/monitor/1"},"values":{"cacheCompactionDuration":0,"cacheCompactionErr":0,"cacheCompactions":0,"cacheCompactionsActive":0,"tsmFullCompactionDuration":0,"tsmFullCompactionErr":0,"tsmFullCompactionQueue":0,"tsmFullCompactions":0,"tsmFullCompactionsActive":0,"tsmLevel1CompactionDuration":0,"tsmLevel1CompactionErr":0,"tsmLevel1CompactionQueue":0,"tsmLevel1Compactions":0,"tsmLevel1CompactionsActive":0,"tsmLevel2CompactionDuration":0,"tsmLevel2CompactionErr":0,"tsmLevel2CompactionQueue":0,"tsmLevel2Compactions":0,"tsmLevel2CompactionsActive":0,"tsmLevel3CompactionDuration":0,"tsmLevel3CompactionErr":0,"tsmLevel3CompactionQueue":0,"tsmLevel3Compactions":0,"tsmLevel3CompactionsActive":0,"tsmOptimizeCompactionDuration":0,"tsmOptimizeCompactionErr":0,"tsmOptimizeCompactionQueue":0,"tsmOptimizeCompactions":0,"tsmOptimizeCompactionsActive":0}},
"tsm1_cache:/var/lib/influxdb/data/_internal/monitor/1:1": {"name":"tsm1_cache","tags":{"database":"_internal","engine":"tsm1","id":"1","indexType":"tsi1","path":"/var/lib/influxdb/data/_internal/monitor/1","retentionPolicy":"monitor","walPath":"/var/lib/influxdb/wal/_internal/monitor/1"},"values":{"WALCompactionTimeMs":0,"cacheAgeMs":0,"cachedBytes":0,"diskBytes":0,"memBytes":0,"snapshotCount":0,"writeDropped":0,"writeErr":0,"writeOk":0}},
"tsm1_filestore:/var/lib/influxdb/data/_internal/monitor/1:1": {"name":"tsm1_filestore","tags":{"database":"_internal","engine":"tsm1","id":"1","indexType":"tsi1","path":"/var/lib/influxdb/data/_internal/monitor/1","retentionPolicy":"monitor","walPath":"/var/lib/influxdb/wal/_internal/monitor/1"},"values":{"diskBytes":145869,"numFiles":1}},
"tsm1_wal:/var/lib/influxdb/data/_internal/monitor/1:1": {"name":"tsm1_wal","tags":{"database":"_internal","engine":"tsm1","id":"1","indexType":"tsi1","path":"/var/lib/influxdb/data/_internal/monitor/1","retentionPolicy":"monitor","walPath":"/var/lib/influxdb/wal/_internal/monitor/1"},"values":{"currentSegmentDiskBytes":0,"oldSegmentsDiskBytes":0,"writeErr":0,"writeOk":0}},
"shard:/var/lib/influxdb/data/telegraf/autogen/2:2": {"name":"shard","tags":{"database":"telegraf","engine":"tsm1","id":"2","indexType":"inmem","path":"/var/lib/influxdb/data/telegraf/autogen/2","retentionPolicy":"autogen","walPath":"/var/lib/influxdb/wal/telegraf/autogen/2"},"values":{"diskBytes":1212526,"fieldsCreate":0,"seriesCreate":14,"writeBytes":0,"writePointsDropped":0,"writePointsErr":0,"writePointsOk":0,"writeReq":0,"writeReqErr":0,"writeReqOk":0}},
"tsm1_engine:/var/lib/influxdb/data/telegraf/autogen/2:2": {"name":"tsm1_engine","tags":{"database":"telegraf","engine":"tsm1","id":"2","indexType":"inmem","path":"/var/lib/influxdb/data/telegraf/autogen/2","retentionPolicy":"autogen","walPath":"/var/lib/influxdb/wal/telegraf/autogen/2"},"values":{"cacheCompactionDuration":0,"cacheCompactionErr":0,"cacheCompactions":0,"cacheCompactionsActive":0,"tsmFullCompactionDuration":0,"tsmFullCompactionErr":0,"tsmFullCompactionQueue":0,"tsmFullCompactions":0,"tsmFullCompactionsActive":0,"tsmLevel1CompactionDuration":0,"tsmLevel1CompactionErr":0,"tsmLevel1CompactionQueue":0,"tsmLevel1Compactions":0,"tsmLevel1CompactionsActive":0,"tsmLevel2CompactionDuration":0,"tsmLevel2CompactionErr":0,"tsmLevel2CompactionQueue":0,"tsmLevel2Compactions":0,"tsmLevel2CompactionsActive":0,"tsmLevel3CompactionDuration":0,"tsmLevel3CompactionErr":0,"tsmLevel3CompactionQueue":0,"tsmLevel3Compactions":0,"tsmLevel3CompactionsActive":0,"tsmOptimizeCompactionDuration":0,"tsmOptimizeCompactionErr":0,"tsmOptimizeCompactionQueue":0,"tsmOptimizeCompactions":0,"tsmOptimizeCompactionsActive":0}},
"tsm1_cache:/var/lib/influxdb/data/telegraf/autogen/2:2": {"name":"tsm1_cache","tags":{"database":"telegraf","engine":"tsm1","id":"2","indexType":"inmem","path":"/var/lib/influxdb/data/telegraf/autogen/2","retentionPolicy":"autogen","walPath":"/var/lib/influxdb/wal/telegraf/autogen/2"},"values":{"WALCompactionTimeMs":0,"cacheAgeMs":10018,"cachedBytes":0,"diskBytes":0,"memBytes":9930,"snapshotCount":0,"writeDropped":0,"writeErr":0,"writeOk":1}},
"tsm1_filestore:/var/lib/influxdb/data/telegraf/autogen/2:2": {"name":"tsm1_filestore","tags":{"database":"telegraf","engine":"tsm1","id":"2","indexType":"inmem","path":"/var/lib/influxdb/data/telegraf/autogen/2","retentionPolicy":"autogen","walPath":"/var/lib/influxdb/wal/telegraf/autogen/2"},"values":{"diskBytes":1207722,"numFiles":10}},
"tsm1_wal:/var/lib/influxdb/data/telegraf/autogen/2:2": {"name":"tsm1_wal","tags":{"database":"telegraf","engine":"tsm1","id":"2","indexType":"inmem","path":"/var/lib/influxdb/data/telegraf/autogen/2","retentionPolicy":"autogen","walPath":"/var/lib/influxdb/wal/telegraf/autogen/2"},"values":{"currentSegmentDiskBytes":2402,"oldSegmentsDiskBytes":2402,"writeErr":0,"writeOk":0}},
"shard:/var/lib/influxdb/data/_internal/monitor/3:3": {"name":"shard","tags":{"database":"_internal","engine":"tsm1","id":"3","indexType":"inmem","path":"/var/lib/influxdb/data/_internal/monitor/3","retentionPolicy":"monitor","walPath":"/var/lib/influxdb/wal/_internal/monitor/3"},"values":{"diskBytes":1292621,"fieldsCreate":0,"seriesCreate":38,"writeBytes":0,"writePointsDropped":0,"writePointsErr":0,"writePointsOk":0,"writeReq":0,"writeReqErr":0,"writeReqOk":0}},
"tsm1_engine:/var/lib/influxdb/data/_internal/monitor/3:3": {"name":"tsm1_engine","tags":{"database":"_internal","engine":"tsm1","id":"3","indexType":"inmem","path":"/var/lib/influxdb/data/_internal/monitor/3","retentionPolicy":"monitor","walPath":"/var/lib/influxdb/wal/_internal/monitor/3"},"values":{"cacheCompactionDuration":0,"cacheCompactionErr":0,"cacheCompactions":0,"cacheCompactionsActive":0,"tsmFullCompactionDuration":0,"tsmFullCompactionErr":0,"tsmFullCompactionQueue":0,"tsmFullCompactions":0,"tsmFullCompactionsActive":0,"tsmLevel1CompactionDuration":0,"tsmLevel1CompactionErr":0,"tsmLevel1CompactionQueue":0,"tsmLevel1Compactions":0,"tsmLevel1CompactionsActive":0,"tsmLevel2CompactionDuration":0,"tsmLevel2CompactionErr":0,"tsmLevel2CompactionQueue":0,"tsmLevel2Compactions":0,"tsmLevel2CompactionsActive":0,"tsmLevel3CompactionDuration":0,"tsmLevel3CompactionErr":0,"tsmLevel3CompactionQueue":0,"tsmLevel3Compactions":0,"tsmLevel3CompactionsActive":0,"tsmOptimizeCompactionDuration":0,"tsmOptimizeCompactionErr":0,"tsmOptimizeCompactionQueue":0,"tsmOptimizeCompactions":0,"tsmOptimizeCompactionsActive":0}},
"tsm1_cache:/var/lib/influxdb/data/_internal/monitor/3:3": {"name":"tsm1_cache","tags":{"database":"_internal","engine":"tsm1","id":"3","indexType":"inmem","path":"/var/lib/influxdb/data/_internal/monitor/3","retentionPolicy":"monitor","walPath":"/var/lib/influxdb/wal/_internal/monitor/3"},"values":{"WALCompactionTimeMs":0,"cacheAgeMs":10032,"cachedBytes":0,"diskBytes":0,"memBytes":0,"snapshotCount":0,"writeDropped":0,"writeErr":0,"writeOk":0}},
"tsm1_filestore:/var/lib/influxdb/data/_internal/monitor/3:3": {"name":"tsm1_filestore","tags":{"database":"_internal","engine":"tsm1","id":"3","indexType":"inmem","path":"/var/lib/influxdb/data/_internal/monitor/3","retentionPolicy":"monitor","walPath":"/var/lib/influxdb/wal/_internal/monitor/3"},"values":{"diskBytes":1292621,"numFiles":8}},
"tsm1_wal:/var/lib/influxdb/data/_internal/monitor/3:3": {"name":"tsm1_wal","tags":{"database":"_internal","engine":"tsm1","id":"3","indexType":"inmem","path":"/var/lib/influxdb/data/_internal/monitor/3","retentionPolicy":"monitor","walPath":"/var/lib/influxdb/wal/_internal/monitor/3"},"values":{"currentSegmentDiskBytes":0,"oldSegmentsDiskBytes":0,"writeErr":0,"writeOk":0}},

Since I do not know for sure which command the original reporter used to build the TSI indexes, I can only speculate, based on the debug vars output and my own use case, that when buildtsi is run for only a specific database or shard, dropping a measurement afterwards returns this error.
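
The mixed state is visible directly in `/debug/vars`: each `shard` entry carries `database` and `indexType` tags, as in the output above. A rough POSIX sh sketch (the helper name `mixed_index_databases` is hypothetical, and it assumes tag values contain no `}` or tab characters) that lists every database whose shards report more than one index type:

```shell
# mixed_index_databases: read /debug/vars JSON on stdin and print each
# database whose shards report more than one indexType (e.g. inmem and tsi1).
mixed_index_databases() {
  # One match per shard entry: the "tags" object of entries named "shard".
  grep -o '"name":"shard","tags":{[^}]*}' |
    while IFS= read -r entry; do
      db=$(printf '%s\n' "$entry" | sed -n 's/.*"database":"\([^"]*\)".*/\1/p')
      idx=$(printf '%s\n' "$entry" | sed -n 's/.*"indexType":"\([^"]*\)".*/\1/p')
      printf '%s\t%s\n' "$db" "$idx"
    done |
    sort -u |
    # After de-duplication, a database seen twice has two different index types.
    awk -F '\t' '{ seen[$1]++ } END { for (db in seen) if (seen[db] > 1) print db }' |
    sort
}
```

Usage: `curl -s http://localhost:8086/debug/vars | mixed_index_databases`. Fed the shard entries quoted above, it would print `_internal` (shard 1 is tsi1, shard 3 is inmem), while `telegraf`, with only the inmem shard 2, is not listed.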

gshif commented 4 years ago

INMEM TSI

780Farva commented 3 years ago

Can anyone confirm whether the series compaction tool in 1.8 helps in this situation? I saw it recommended by a staff member here back in April, but there hasn't been a follow-up yet.

dgnorton commented 3 years ago

@cerberek @780Farva mixed indexes (some shards using inmem and others using tsi1) are not supported. Is the issue that buildtsi is failing to build an index for all shards, or was it only run on one database?
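
On the second question: buildtsi converts whatever it is pointed at, so running it once over the whole data and WAL directories, without narrowing it to one database, should give every existing shard a TSI index. A minimal sketch, assuming the default package paths and an `influxdb` service user; `rebuild_tsi_all` and its `DRY_RUN` switch are hypothetical. influxd must be stopped first, and the 1.x documentation's TSI rebuild procedure, as I recall it, also removes the existing `index` and `_series` directories beforehand, a destructive step this sketch deliberately leaves out.

```shell
# rebuild_tsi_all [DATADIR] [WALDIR]
# Runs buildtsi once over the whole data directory (no per-database filter),
# as the influxdb user so the new index files get the right ownership.
# Set DRY_RUN=1 to print the command instead of running it.
rebuild_tsi_all() {
  datadir=${1:-/var/lib/influxdb/data}
  waldir=${2:-/var/lib/influxdb/wal}
  run=
  if [ "${DRY_RUN:-0}" = 1 ]; then run=echo; fi
  $run sudo -u influxdb influx_inspect buildtsi -datadir "$datadir" -waldir "$waldir"
}
```

Preview with `DRY_RUN=1 rebuild_tsi_all`, then run it for real with influxd stopped. It only fixes things for good if influxd is also started with `index-version = "tsi1"`; otherwise each new shard is again created with the inmem index and the mixed state comes back.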

780Farva commented 3 years ago

Thanks for following up @dgnorton!

Update on my situation: a small finger-flub when setting up the environment variable that tells influxd to use TSI went undetected, so after the buildtsi step the existing shards were all converted but the instance was still configured for the inmem index. When we looked at the logs we saw index_version reported as tsi1, so we didn't suspect that the instance was misconfigured.

INLFUXDB_DATA_INDEX_VERSION=tsi1 <- This was the offending env var: the prefix is misspelled (INLFUXDB instead of INFLUXDB), which we missed due to raging dyslexia. Correcting that misspelling and redoing the migration steps has kept our system healthy for the last 10 days.
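
The trap here is that influxd simply ignores environment variables it does not recognise, so a typo in the `INFLUXDB_` prefix leaves `index-version` at its config-file value (`inmem` by default) with no error. A rough check, written as a hypothetical POSIX sh helper `check_index_env` that reads `env` output; it only catches near-misses whose name still ends in `INDEX_VERSION`, which would have flagged the variable above.

```shell
# check_index_env: read `env` output (NAME=value lines) on stdin.
# Returns 0 only if INFLUXDB_DATA_INDEX_VERSION is set and no look-alike
# variable ending in INDEX_VERSION is present.
check_index_env() {
  rc=0
  found=0
  while IFS= read -r kv; do
    name=${kv%%=*}
    case $name in
      INFLUXDB_DATA_INDEX_VERSION)
        found=1
        printf 'ok: %s\n' "$kv" ;;
      *INDEX_VERSION)
        # Close to the real name but not equal, so influxd would ignore it.
        rc=1
        printf 'suspicious: %s\n' "$kv" ;;
    esac
  done
  if [ "$found" -eq 0 ]; then
    rc=1
    echo 'INFLUXDB_DATA_INDEX_VERSION is not set; index-version comes from the config file'
  fi
  return "$rc"
}
```

Run it against the environment influxd is actually started with, e.g. `env | check_index_env` in the service's shell.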

Update: It is now late February 2021 and things are still fine.

nanicpc commented 3 years ago

@780Farva I'm also having this issue, but could not find the root cause. Could you specify what your misspelling was? Thanks.

780Farva commented 3 years ago

Updated the above comment! Thanks for pointing out that the misspelling was missing.

wizzfizz94 commented 3 years ago

Is there any way to solve this by converting all tsi1 shards back to inmem? It seems buildtsi only goes from inmem to tsi1, but I'd like to return to a fully inmem index.