hansmi / prometheus-lvm-exporter

Prometheus exporter for LVM metrics.
BSD 3-Clause "New" or "Revised" License
23 stars 5 forks source link

Bad use of labels #31

Open raspbeguy opened 1 year ago

raspbeguy commented 1 year ago

Hello,

I see that you defined special metrics lvm_lv_info, lvm_vg_info and lvm_pv_info which useful data is contained in the labels, and which value is always 1. The other metrics label is only the UUID.

This is a bad metric design, as labels should be used to filter metrics, not to carry new information. Also some labels like lv_attr are by nature very volatile (it will change when activating/deactivating the volume for example) which leads to harmful multiplication of timeseries, as this extract from prometheus doc reminds:

CAUTION: Remember that every unique combination of key-value label pairs represents a new time series, which can dramatically increase the amount of data stored. Do not use labels to store dimensions with high cardinality (many different label values), such as user IDs, email addresses, or other unbounded sets of values.

Also, UUID as only label on the other metrics makes data querying very complex.

I suggest dropping the *_info metrics and adding a setting allowing the user to include the needed labels on the other metrics.

hansmi commented 1 year ago

Thank you for your feedback and observation. You're right, stuffing all non-numeric fields into *_info was probably a bad choice in hindsight. The code has been out there for almost two years and such fundamental changes need to be gradual. Start by emitting the new metrics and add a flag to disable the old ones and after some amount of time delete the old labels.

We can add *_info-esque metrics for the non-numeric values:

lvm_lv_info{lv_name="lv_db_backup",lv_uuid="r7Btja-…"} 1
lvm_lv_attr{lv_uuid="r7Btja-…",value="-wi-ao----"} 1

I'd keep a few basic things like the name in _info. What do you think?

Also, UUID as only label on the other metrics makes data querying very complex.

The UUID is the only stable identifier during the lifetime of an object in LVM. Examples from my own rules:

lvm_pv_info{pv_attr=~"..[^-].*"} or on (instance, pv_uuid)
lvm_pv_missing > 0
lvm_vg_info{vg_attr=~"...[^-].*"} or on (instance, vg_uuid)
lvm_vg_missing_pv_count != 0 or on (instance, vg_uuid)
lvm_vg_partial != 0
raspbeguy commented 1 year ago

I understand the non-breaking decision.

My point was not to split metrics into even more label-only metrics. Ideally I wish they were gone, but I understand that for the sake of non breaking things they won't. Having metrics which are always equal to 1 and whose labels play the role of metrics dosn't make any sense to me, as this is poisoning how prometheus manages and stores its metrics. You have even lv_active which has a numeric value (0 or 1) as a label while it should clearly be the value of a dedicated metric, say lvm_lv_active.

Every labels that has a finite number of values (for example lv_permissions) should have its own dedicated metric, and the values matched to integers.

IMHO the only labels to be kept as is are the properties that defines the entity (for example lv_dm_path, lv_uuid, lv_name which are by nature stable and normally won't change (I know you can technically change a lv name, but that's not something that is volatile by nature). Those labels should also be added to all metrics, not only the *_info ones

Optionally, have a setting to let the user add any label needed.