We currently have (on Sozu 1.0.5) two kinds of metrics on HTTP status codes:
http.301.redirection, http.400.errors, http.401.errors and the like
http.status.1xx, http.status.2xx and the like
The first category is only incremented for automatic responses generated by Sozu itself. The second is incremented for every response, whether generated by Sozu or forwarded from a backend. This explains why the former uses exact status codes and the latter uses aggregated status classes.
The current names do not make this distinction explicit, so could we change them? Maybe http.generated.301 and http.total.1xx?
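To make the proposal concrete, here is a minimal sketch of which counters one response would increment under the suggested names. The function metric_keys and its parameters are hypothetical and only illustrate the naming; this is not how Sozu currently records metrics.

```rust
/// Returns the metric keys to increment for one response, using the
/// naming proposed above (hypothetical helper, not Sozu's actual code).
fn metric_keys(status: u16, generated_by_sozu: bool) -> Vec<String> {
    let mut keys = Vec::new();
    // Every response, generated or forwarded, counts in its status class.
    keys.push(format!("http.total.{}xx", status / 100));
    // Only responses produced by Sozu itself get an exact-code counter.
    if generated_by_sozu {
        keys.push(format!("http.generated.{}", status));
    }
    keys
}
```

With these names, a redirection generated by Sozu would increment both http.total.3xx and http.generated.301, while a 200 forwarded from a backend would only increment http.total.2xx.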
Additionally, independently of their origin, some of those metrics are stored:
per backend, like http.502.errors and http.status.2xx
per cluster, like http.503.errors and http.status.5xx
per worker, like http.400.errors and http.status.3xx
We can't store all of them with "backend precision", because some responses are generated before the backend is known (a 400 is typically generated before even the cluster is known). Should we aggregate the "more precise" metrics at the cluster and worker levels? This would give quick access to, for example, how many 2xx responses were served in total (currently we have to look into each backend). Should this be done with additional metrics, or with an option of the metrics get command? If we choose the former, should the aggregated metrics be named differently? If the latter, should we specify the exact name of the metric to aggregate or a pattern, or should the command aggregate every metric that can be aggregated?
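As a rough illustration of the second option (aggregating at query time), the sketch below sums the counters of a cluster and of its backends, optionally restricted to a name prefix. The types ClusterMetrics and Counters and the function aggregate are assumptions made for this example; they do not reflect Sozu's actual metrics structures or the metrics get command.

```rust
use std::collections::BTreeMap;

/// Counters keyed by metric name, e.g. "http.status.2xx" -> 42.
type Counters = BTreeMap<String, u64>;

/// Hypothetical view of one cluster: counters recorded at cluster level,
/// plus the counters of each of its backends, keyed by backend id.
struct ClusterMetrics {
    own: Counters,
    backends: BTreeMap<String, Counters>,
}

/// Sums cluster-level and backend-level counters into a single map.
/// If `prefix` is `Some`, only metric names starting with it are kept.
fn aggregate(cluster: &ClusterMetrics, prefix: Option<&str>) -> Counters {
    let mut total = Counters::new();
    let sources = std::iter::once(&cluster.own).chain(cluster.backends.values());
    for counters in sources {
        for (name, value) in counters {
            if prefix.map_or(true, |p| name.starts_with(p)) {
                *total.entry(name.clone()).or_insert(0) += value;
            }
        }
    }
    total
}
```

Passing None corresponds to "aggregate everything that can be aggregated", while Some("http.status.") corresponds to the pattern variant. The same summation could be repeated over clusters to obtain worker-level totals.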