[Question] Why is snapshot_hash used instead of snapshot ID?

Enrico204 commented 1 year ago

I was looking at the restic data in my Grafana deployment. After doing a manual backup yesterday, the client appears duplicated in the "Total backup size" item in the provided dashboard. Investigating further, I discovered that the "snapshot hash" is calculated here:

https://github.com/ngosang/restic-exporter/blob/880b47131c76948cffca7beba632c77bfd4a8d2c/restic-exporter.py#L201

My question is, why is the snapshot hash used instead of the snapshot ID (which already is a sort of hash)?

ngosang commented 1 year ago

For each snapshot restic provides several hashes:

"parent":"caf9137c9bccc0d2b266a0fc02be7a71652f3de4916034a234e535a3e9ef3a11",
"tree":"446e26b7a79b7b08f52d2e183cbbc9be948e0e56787a69aae3c2f9e63fbbd07c",
"id":"100c00902e666e25570773885e82200a982c4334268f1d608f138384de629915",
"short_id":"100c0090"

But none of them is useful for grouping all snapshots of the same user. In the first implementation I thought that the "parent" hash was common across all snapshots of the user, but it's not. So I decided to compute my own hash, which means nothing in itself but is useful for grouping snapshots of the same user. I'm not publishing the restic hashes because they are not useful here and they would generate too many time series in Prometheus.
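To illustrate the grouping idea, here is a minimal sketch, not the exporter's actual code. It assumes the key is derived from the snapshot's hostname, username and paths; the function name `grouping_key` and the sample snapshots (including their shortened `id` values) are hypothetical.

```python
import hashlib
from collections import defaultdict


def grouping_key(snapshot: dict) -> str:
    # Assumption: hostname + username + paths identify one "client".
    # The resulting digest has no meaning beyond being a stable label.
    text = snapshot["hostname"] + snapshot["username"] + ",".join(snapshot["paths"])
    return hashlib.sha256(text.encode("utf-8")).hexdigest()


# Hypothetical sample data; real restic ids are 64 hex characters.
snapshots = [
    {"id": "a1", "hostname": "laptop", "username": "alice", "paths": ["/home/alice"]},
    {"id": "b2", "hostname": "laptop", "username": "alice", "paths": ["/home/alice"]},
    {"id": "c3", "hostname": "server", "username": "root", "paths": ["/etc"]},
]

# Group snapshot ids under the derived key.
groups = defaultdict(list)
for snap in snapshots:
    groups[grouping_key(snap)].append(snap["id"])

# Every snapshot has its own id, so labelling by id creates one series per snapshot.
unique_ids = {snap["id"] for snap in snapshots}
```

In this sketch the three snapshots fall into two groups, whereas using the restic `id` as a label would produce three distinct label values, one more for every new backup.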

Enrico204 commented 1 year ago

Ok, that makes sense. Thanks!

I suppose I will investigate why my client appears duplicated, and I will open a new issue or PR if there is any enhancement :-)

ferrarimarco commented 1 year ago

As explained in #16, in my case I get duplicates because of path changes (I added a couple of directories for a given host and user).
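A small sketch of why a path change can show up as a second client. It assumes, as a guess consistent with the behaviour described here, that the grouping hash covers the hostname, the username and the list of backed-up paths; `client_key` and the sample values are hypothetical.

```python
import hashlib


def client_key(hostname: str, username: str, paths: list[str]) -> str:
    # Assumption: the paths are part of the hashed text, so any change
    # to the path list yields a different key.
    text = hostname + username + ",".join(paths)
    return hashlib.sha256(text.encode("utf-8")).hexdigest()


# Same host and user, before and after adding a directory to the backup.
before = client_key("nas", "backup", ["/srv/data"])
after = client_key("nas", "backup", ["/srv/data", "/srv/photos"])

# True: the two keys differ, so the same client appears under two labels.
duplicated = before != after
```

Under that assumption, older snapshots keep the old key and newer ones get the new key, which is why both entries are listed side by side.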

[Question] Why is snapshot_hash used instead of snapshot ID? #8