beorn7 opened this issue 5 months ago
Hi @beorn7, I have a couple of questions here.
If you think my understanding is right, I can work on this and create a PR.
I'm not sure about the precise answers to your questions. I guess finding the answers is part of the task here. @juliusv and @SuperQ might be better people to loop in.
All I can say is that we want the best practices worded so that metric names that are in fact fine are covered by them. From my limited understanding, this is not at all about metrics for "failed" things or the _failed suffix (or infix) in particular. I think this is more about defining what a "unit" is in the Prometheus context. Once it is clarified that "truncations" is not seen as a unit in prometheus_tsdb_head_truncations_failed_total, maybe the problem is already solved. Sorting related metrics together, by placing the part of the name where they differ as late as possible, might be worth mentioning in the best practices, but again, this is not specific to _failed.
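To illustrate the sorting point, here is a minimal sketch assuming related metrics are viewed in plain lexicographic order (as in an alphabetical list of metric names). Only prometheus_tsdb_head_truncations_failed_total comes from the discussion; the other names are used for illustration, and prometheus_tsdb_head_failed_truncations_total is a hypothetical alternative spelling. The helper sorted_neighbors is also hypothetical.

```python
def sorted_neighbors(names, a, b):
    """Return True if metric names `a` and `b` are adjacent once `names`
    is sorted lexicographically, e.g. in an alphabetical metric list."""
    ordered = sorted(names)
    return abs(ordered.index(a) - ordered.index(b)) == 1


# The differing part ("failed") is placed as late as possible in the name.
late = [
    "prometheus_tsdb_head_chunks",
    "prometheus_tsdb_head_series",
    "prometheus_tsdb_head_truncations_total",
    "prometheus_tsdb_head_truncations_failed_total",
]

# Hypothetical alternative: the differing part is placed earlier.
early = [
    "prometheus_tsdb_head_chunks",
    "prometheus_tsdb_head_series",
    "prometheus_tsdb_head_truncations_total",
    "prometheus_tsdb_head_failed_truncations_total",
]

print(sorted_neighbors(late, "prometheus_tsdb_head_truncations_total",
                       "prometheus_tsdb_head_truncations_failed_total"))  # True
print(sorted_neighbors(early, "prometheus_tsdb_head_truncations_total",
                       "prometheus_tsdb_head_failed_truncations_total"))  # False
```

With "failed" placed early, the unrelated prometheus_tsdb_head_series sorts between the two truncation counters; placed late, the two counters stay next to each other.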
On a different note, I think we still want to keep "real" units last, even for metrics that have failed in their name. For example, if there is a metric called request_size_bytes_total, and we want the size of failed requests in a separate counter, I could see different ways of naming it:
- failed_request_size_bytes_total, because it reads most naturally.
- request_size_failed_bytes_total, to still have the "unit" last (bytes is explicitly called out as a unit in the best practices).
- request_size_bytes_failed_total, for the best lexicographical sorting experience.
- request_failed_size_bytes_total?!?
My gut feeling right now is not to overregulate beyond "an actual unit (not "truncations") should go last" as the first priority and "take sorting into account as it fits your use case" as the second priority. But as I said, others will have stronger and better-justified opinions.
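The first-priority rule ("an actual unit should go last") can be sketched as a simple name check. This is a hypothetical helper, not part of any Prometheus tooling; it assumes an incomplete subset of base units and treats a trailing _total as the counter suffix rather than part of the name's body.

```python
# Assumed, incomplete subset of Prometheus base units, for illustration only.
BASE_UNITS = {"seconds", "bytes", "meters", "volts", "amperes", "joules", "grams", "celsius"}


def unit_goes_last(name: str) -> bool:
    """Hypothetical check for "an actual unit should go last".

    Ignores a trailing "_total" counter suffix. Names that contain no
    base unit at all (e.g. "truncations" is not a unit) pass trivially.
    """
    parts = name.split("_")
    if parts[-1] == "total":
        parts = parts[:-1]
    unit_positions = [i for i, part in enumerate(parts) if part in BASE_UNITS]
    if not unit_positions:
        return True
    # Exactly one unit, and it must be the last remaining component.
    return unit_positions == [len(parts) - 1]


candidates = [
    "failed_request_size_bytes_total",
    "request_size_failed_bytes_total",
    "request_size_bytes_failed_total",
    "request_failed_size_bytes_total",
    "prometheus_tsdb_head_truncations_failed_total",
]
for name in candidates:
    print(f"{name}: {unit_goes_last(name)}")
```

Under this check, only request_size_bytes_failed_total is rejected: it optimizes for sorting at the expense of the unit position, which matches ranking the unit rule above sorting. prometheus_tsdb_head_truncations_failed_total passes because "truncations" is not treated as a unit.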
@Gopi-eng2202 I'll assign you to this issue, and maybe you could just draft something up in a PR and nominate @SuperQ and @juliusv as reviewers to see what they think.
OK, I got it. I'll work on it. Thanks!
https://github.com/prometheus/prometheus/issues/8718 discusses "misnamed" metrics and concludes that their names are actually fine and that we should improve the metric naming recommendations to match the existing "fine" metric names.
So the task here is to turn the discussion in https://github.com/prometheus/prometheus/issues/8718 into changes of the metric naming best practices page.