influxdata / influxdb

Scalable datastore for metrics, events, and real-time analytics
https://influxdata.com
Apache License 2.0

"SHOW TAG VALUES" returns deleted values #14904

Status: Open. Opened by tktr 5 years ago.

tktr commented 5 years ago

It seems that the "SHOW TAG VALUES" command returns tag values that have already been deleted.

Steps to reproduce:

  1. Run the latest Docker image (influxdb:1.7.7):
    docker run --rm -p 8086:8086 -it influxdb:1.7.7
  2. Insert and delete test data:
    export INFLUX_DB='test'
    export INFLUX_MMT='testdata'
    influx -execute "CREATE DATABASE $INFLUX_DB"
    influx -database "$INFLUX_DB" -execute "INSERT $INFLUX_MMT,foo=bar baz=1"
    influx -database "$INFLUX_DB" -execute "INSERT $INFLUX_MMT,foo=bar2 baz=2"
    influx -database "$INFLUX_DB" -execute "DELETE FROM $INFLUX_MMT WHERE foo='bar2'"
  3. List the tag values:
    influx -database "$INFLUX_DB" -execute "SHOW TAG VALUES FROM $INFLUX_MMT WITH KEY = foo"

Expected output:

name: testdata
--------------
key     value
foo     bar

Actual output:

name: testdata
--------------
key     value
foo     bar
foo     bar2


tktr commented 5 years ago

Is there any possible workaround that does not require rewriting the measurement?

kkdev163 commented 4 years ago

"SHOW TAG VALUES" returns not only manually deleted values but also values that were automatically removed by the retention policy.

The protection mechanism controlled by the 'max-values-per-tag' setting seems to have the same problem: it also counts deleted values toward the limit.

For example, my 'max-values-per-tag' is set to 20000 and my default retention policy is 7d. Recently I have been getting errors like this in the log: (screenshot not included)

When I run

SHOW TAG VALUES from "trace_image" with key = "imageUrl"

it returns 20001 rows.

But if I run

SHOW TAG VALUES from "trace_image" with key = "imageUrl" where AppKey =~/./

it returns only 271 rows.

Then I run

SHOW TAG VALUES from "trace_image" with key = "imageUrl" where AppKey !~/./

and it returns 0 rows.

So it seems that 'SHOW TAG VALUES' returns the correct values when a WHERE condition is given, and returns deleted values otherwise.
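The observation above suggests a way to list the stale values: compare the unfiltered SHOW TAG VALUES output with the output filtered on a tag that every live series carries (AppKey in this example). The following is a minimal POSIX sh sketch of that comparison, not an official tool. It assumes the InfluxDB 1.x influx CLI, whose -format csv output for SHOW TAG VALUES is taken to be a name,key,value header followed by one row per value. The helper names dump_tag_values and stale_tag_values are hypothetical and not part of InfluxDB. The sketch only reports the difference; it does not remove anything from the index.

```shell
#!/bin/sh
# Hypothetical helpers (not part of InfluxDB) for spotting stale tag values:
# values that SHOW TAG VALUES returns without a WHERE clause but that
# disappear once a WHERE filter on an always-present tag is added.

# dump_tag_values DB MEASUREMENT KEY [WHERE_CONDITION]
# Assumption: InfluxDB 1.x CLI, where `-format csv` prints a "name,key,value"
# header followed by one row per value. `cut -f3-` keeps everything after the
# second comma, but CSV quoting of values containing commas is not undone.
dump_tag_values() {
  query="SHOW TAG VALUES FROM \"$2\" WITH KEY = \"$3\""
  if [ -n "$4" ]; then
    query="$query WHERE $4"
  fi
  influx -database "$1" -format csv -execute "$query" | tail -n +2 | cut -d, -f3-
}

# stale_tag_values ALL_FILE LIVE_FILE
# Prints the values listed in ALL_FILE that are missing from LIVE_FILE.
# (POSIX sh has no portable `local`, so the temp-file variables are global.)
stale_tag_values() {
  all_sorted=$(mktemp) || return 1
  live_sorted=$(mktemp) || { rm -f "$all_sorted"; return 1; }
  # comm requires both inputs sorted with the same collation, so force C.
  LC_ALL=C sort -u "$1" > "$all_sorted"
  LC_ALL=C sort -u "$2" > "$live_sorted"
  # -23 hides lines only in LIVE_FILE and lines common to both files.
  LC_ALL=C comm -23 "$all_sorted" "$live_sorted"
  rm -f "$all_sorted" "$live_sorted"
}
```

Example usage with the measurement and tags from the comment above (mydb is a placeholder database name):

    dump_tag_values mydb trace_image imageUrl > all.txt
    dump_tag_values mydb trace_image imageUrl '"AppKey" =~ /./' > live.txt
    stale_tag_values all.txt live.txt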


stale[bot] commented 4 years ago

This issue has been automatically marked as stale because it has not had recent activity. It will be closed if no further activity occurs. Thank you for your contributions.

tktr commented 4 years ago

Keep open.

zhaoxiangchun commented 4 years ago

I have the same problem. This query returns deleted values:

show tag values from vm_cpu with key = brand where time > now() - 1h

What should I do?

Daryes commented 4 years ago

Same for me for the past few days, with InfluxDB 1.8.1 (installed on 2020/07/16) using the "tsi1" index. This is not the first time I have had the problem, and the only way to fix it was to drop all the data. Running influx_inspect buildtsi fixed some deleted series that kept reappearing, but it has no effect on this issue.

For example, this query returns every host (including deleted ones) from all series in the database, not just from this measurement:

SHOW TAG VALUES FROM "ntpq" WITH KEY = "host" 

whereas SHOW SERIES reports only the 2 hosts that actually exist for this measurement. This is not the only affected query; many of my queries have the same problem.

Something that might help: SHOW SERIES FROM "measurement" also returns series from other measurements and even from other databases. This is most noticeable on the "_internal" database, where mongodb series appear in the result. It does not return all existing series, only a few seemingly random ones, and they appear before and after the correct series in the output. I have never seen the invalid entries mixed in among the correct ones.

andphe commented 3 years ago

There is a workaround, tested on 1.8, described here: https://github.com/influxdata/influxdb/issues/10285#issuecomment-444464920

Restarting InfluxDB also works around the problem, though that is not a great fix.