threefoldtech / 0-stor_v2

Apache License 2.0

Better visibility of stored files and their shard status #133

Open scottyeager opened 2 days ago

scottyeager commented 2 days ago

Currently there is no way to list the stored files, and no way to see the health of an individual file in terms of how many of its shards are stored on healthy backends. This makes it difficult to assess whether the system is in a degraded state. It's especially relevant when recovering from a backend failure, where you need to check whether all files have been rebuilt onto the newly supplied backends. A list of stored files would also help with general inspection of the system, without having to run lots of `check` commands and keep a separate list of the files that have been removed from local storage.

So I'm thinking of something like this:

  1. A `list` command that lists the stored files
  2. A way to output the number of shards of a given file that are present on live backends (this could be part of `list`, `check`, or both)
  3. At least one Prometheus metric that shows whether the files are, overall, in a degraded state (do they have the expected number of shards available, and if not, do they at least have the minimum required number available?)
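As a rough sketch of points 2 and 3, assume each file is erasure-coded into `data_shards + parity_shards` shards and that any `data_shards` of them are enough to rebuild it (these parameter names are assumptions, not taken from this issue). Under that assumption a file's health falls into three states, and the per-state counts can be exposed as one gauge. The `StoredFile`, `classify`, `list_files` and `health_metric` names and the metric name `zstor_files_by_health` are all hypothetical. The output follows the Prometheus text exposition format and does not describe any existing 0-stor metric.

```python
from collections import Counter
from dataclasses import dataclass
from enum import Enum


class Health(Enum):
    HEALTHY = "healthy"    # all expected shards are on live backends
    DEGRADED = "degraded"  # fewer than expected, but still enough to rebuild
    LOST = "lost"          # too few shards left to reconstruct the file


@dataclass
class StoredFile:
    path: str
    live_shards: int  # shards found on backends that currently respond


def classify(live_shards: int, data_shards: int, parity_shards: int) -> Health:
    """Classify a file by how many of its shards sit on live backends.

    Assumption: data_shards + parity_shards shards are written per file,
    and any data_shards of them suffice to rebuild it.
    """
    expected = data_shards + parity_shards
    if live_shards >= expected:
        return Health.HEALTHY
    if live_shards >= data_shards:
        return Health.DEGRADED
    return Health.LOST


def list_files(files, data_shards, parity_shards):
    """Lines a hypothetical `list` command could print: path, shards, health."""
    expected = data_shards + parity_shards
    return [
        f"{f.path}\t{f.live_shards}/{expected}\t"
        f"{classify(f.live_shards, data_shards, parity_shards).value}"
        for f in files
    ]


def health_metric(files, data_shards, parity_shards):
    """Render per-state file counts in the Prometheus text exposition format."""
    counts = Counter(classify(f.live_shards, data_shards, parity_shards) for f in files)
    lines = [
        "# HELP zstor_files_by_health Number of stored files per shard-health state.",
        "# TYPE zstor_files_by_health gauge",
    ]
    for state in Health:  # emit every state, even when its count is 0
        lines.append(f'zstor_files_by_health{{state="{state.value}"}} {counts[state]}')
    return "\n".join(lines) + "\n"
```

With 4 data and 2 parity shards, files with 6, 4 and 2 live shards come out as healthy, degraded and lost respectively. Emitting every state, including zero counts, means an alert on `state="lost"` being above 0 does not have to special-case a missing series.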

iwanbk commented 14 hours ago

> This makes it difficult to assess whether the system is in a degraded state.

While I agree that this should be improved, I don't think that listing all stored files is a good way to assess it, for these reasons:

I think exposing the repair/rebuild queue would be enough.

scottyeager commented 14 hours ago

I can do without the list command, though I do think it would be handy for both human and machine consumption under different circumstances.

Exposing info on the repair queue would be fine. One thing I think is important though is that there's a way to get at the info both from CLI and via Prometheus.
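One way to get the same information from both the CLI and Prometheus is to take a single snapshot of the repair queue and render it twice, once for people and once for scraping, so the two views cannot disagree. The sketch below assumes a hypothetical `RepairQueueStatus` snapshot and an invented metric name `zstor_repair_queue_length`. It does not reflect 0-stor's actual repair queue, CLI, or metrics.

```python
from dataclasses import dataclass, field


@dataclass
class RepairQueueStatus:
    """Hypothetical snapshot of the repair/rebuild queue at one point in time."""
    pending: list[str] = field(default_factory=list)      # files waiting for a rebuild
    in_progress: list[str] = field(default_factory=list)  # files being rebuilt now

    def render_cli(self) -> str:
        """Human-readable summary, e.g. for a CLI status command."""
        lines = [
            f"pending repairs: {len(self.pending)}",
            f"repairs in progress: {len(self.in_progress)}",
        ]
        # List each pending file so an operator can see what is still degraded.
        lines += [f"  pending: {path}" for path in self.pending]
        return "\n".join(lines)

    def render_prometheus(self) -> str:
        """The same numbers in the Prometheus text exposition format."""
        return (
            "# HELP zstor_repair_queue_length Files in the repair queue, by state.\n"
            "# TYPE zstor_repair_queue_length gauge\n"
            f'zstor_repair_queue_length{{state="pending"}} {len(self.pending)}\n'
            f'zstor_repair_queue_length{{state="in_progress"}} {len(self.in_progress)}\n'
        )
```

Because both renderers read from one snapshot, a dashboard showing a non-zero `pending` gauge and a CLI call made at the same moment report the same files, which also makes "have all files been rebuilt onto the new backends?" a simple check for an empty queue.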

iwanbk commented 14 hours ago

> One thing I think is important though is that there's a way to get at the info both from CLI and via Prometheus.

Yes, fully agree with this.