influxdata / influxdb

Scalable datastore for metrics, events, and real-time analytics
https://influxdata.com
Apache License 2.0

Getting unexpected values from diskBytes in the shard stats of the _internal database #24478

Open tjb36 opened 11 months ago

tjb36 commented 11 months ago

Steps to reproduce:

  1. SELECT "diskBytes" FROM "shard" WHERE "database"='_internal'
  2. Note, I have started with a brand new database, and as such there is only a single /wal/ and /data/ directory path (so I know the problem is not due to needing to sum across paths):
    show tag values from "shard" with key = "path" where "database"='_internal'
    name: shard
    key  value
    ---  -----
    path /var/lib/influxdb/data/_internal/monitor/365
    show tag values from "shard" with key = "walPath" where "database"='_internal'
    name: shard
    key     value
    ---     -----
    walPath /var/lib/influxdb/wal/_internal/monitor/365

Expected behaviour: I expected the value returned by this metric to be the sum of the sizes of the /data/_internal and /wal/_internal directories for this database. If this is not the case (due to some in-memory caching in the background?), then I would expect this value to be no smaller than that sum. The reason I say this is that the definition of diskBytes in the docs says "including the size of the data directory and the WAL directory."

Actual behaviour: When examining the diskBytes field from shard (blue curve below), I find that it initially tracks the combined size of the /wal/ and /data/ directories for this _internal database correctly (orange curve, measured using the du command), but then it drops. I found that these drops coincide with a new .wal file being created (and when this happens, the previously active .wal file shrinks).

image

Regardless, this diskBytes does not seem to be a reliable indicator of the disk space used. Am I misunderstanding what diskBytes and shard represent, or are the docs misleading?

Environment info:

davidby-influx commented 11 months ago

I would guess that diskBytes probably tracks the bytes the database actually considers to be live. InfluxDB purges files (after compaction, after writing the cache to disk, etc.), and these purges are asynchronous. I haven't looked into the code to calculate diskBytes, but you can imagine discarded files building up on disk until they are cleaned up, and why we might want to do that. It is far more important to capture incoming data than it is to remove expired/unused data, so writes take precedence over deletes.

tjb36 commented 11 months ago

Hi @davidby-influx. Thanks for the reply. Yes, I can imagine some purging going on (leading to the drops in shard size).

Do you agree that the docs are misleading then? I was expecting to have something which indicated current size on disk.

I am looking for a metric exposed by the _internal InfluxDB database which tells me the current disk usage of both /wal/ and /data/, instead of me having to manually run du on the directories. It seems like such a metric does not exist then?
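For reference, the manual measurement I am trying to avoid (running du on both directories) could be scripted roughly like this in Python. This is only a sketch: the helper name dir_size_bytes is made up for illustration, the two paths are the ones implied by the tag values earlier in this issue, and it sums the apparent sizes of regular files rather than allocated blocks, so the result may differ somewhat from what du reports.

```python
import os
import stat


def dir_size_bytes(root):
    """Sum the apparent sizes of all regular files under root.

    Hypothetical helper for illustration; not part of InfluxDB.
    """
    total = 0
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            path = os.path.join(dirpath, name)
            try:
                st = os.lstat(path)
            except OSError:
                # The file may have been removed (e.g. after a
                # compaction) between listing and stat; skip it.
                continue
            if stat.S_ISREG(st.st_mode):
                total += st.st_size
    return total


if __name__ == "__main__":
    # Paths assumed from the "path" and "walPath" tag values above.
    data_dir = "/var/lib/influxdb/data/_internal"
    wal_dir = "/var/lib/influxdb/wal/_internal"
    print(dir_size_bytes(data_dir) + dir_size_bytes(wal_dir))
```

Running this periodically and writing the result into a separate measurement would give a value comparable to the orange curve, but it is still an external workaround rather than a metric exposed by _internal.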

davidby-influx commented 11 months ago

I couldn't say whether the docs are correct without reading the code to fully understand it. My statement above was not definitive, because, as I wrote, I haven't read the code.