nlopes / actix-web-prom

Actix-web middleware to expose Prometheus metrics
MIT License
89 stars 56 forks source link

feat: configurable cardinality for endpoint pattern label #72

Open lefuturiste opened 1 year ago

lefuturiste commented 1 year ago

Hello, first thanks for this wonderful crate. Like discussed in #20 and implemented in #30 we currently try to get the match_pattern of the request and use that as the label value for endpoint.

This is a feature request to be able to configure the cardinality for the endpoint pattern in the label of the metrics. I started to develop a POC and I will probably do a PR.

This is related to this downstream issue https://github.com/maplibre/martin/issues/773. For the use case of the martin tile server we need to have the details by sources_ids, to be able to tell which source is causing which latencies or which source is being the most requested. The cardinality here is limited since we have a fixed number of sources.

To be clear we need to have a way to transform this:

martin_http_requests_total{endpoint="/{source_ids}/{z}/{x}/{y}",method="GET",status="200"} 31

into:

martin_http_requests_total{endpoint="/parcels/{z}/{x}/{y}",method="GET",status="200"} 11
martin_http_requests_total{endpoint="/address/{z}/{x}/{y}",method="GET",status="200"} 20

What are your take on this?

nlopes commented 1 year ago

Humm I'm not 💯 on this one.

I totally get the need you have, that's clear. I'm just not convinced actix-web-prom should be dealing with the business needs you have.

You can (I think) change your code to have two/however many exceptions as separate route declarations, even if they call the same code (a macro can easily help you do this), which would mean you get this for free in actix-web-prom. You just need a tiny abstraction between your current #[route...] and the one you need.

lefuturiste commented 1 year ago

Humm I'm not 💯 on this one.

I totally get the need you have, that's clear. I'm just not convinced actix-web-prom should be dealing with the business needs you have.

You can (I think) change your code to have two/however many exceptions as separate route declarations, even if they call the same code (a macro can easily help you do this), which would mean you get this for free in actix-web-prom. You just need a tiny abstraction between your current #[route...] and the one you need.

If I understand correctly you saying that I should generate the route declaration to actix on the fly based on user configuration?

nlopes commented 1 year ago

Oooh I think I misunderstood. You're saying that your source_ids is something martin users/consumers will be able to set/pass. If this is the case what I mentioned won't work (of course) but my challenge is: are you sure you'd like to allow your users to potentially explode the cardinality? I'm not sure that's a good idea. And if you want to set a max cardinality number, then only the top N would appear? Would that be useful information though?

lefuturiste commented 1 year ago

Oooh I think I misunderstood. You're saying that your source_ids is something martin users/consumers will be able to set/pass. If this is the case what I mentioned won't work (of course) but my challenge is: are you sure you'd like to allow your users to potentially explode the cardinality? I'm not sure that's a good idea. And if you want to set a max cardinality number, then only the top N would appear? Would that be useful information though?

Disclaimer: I'm not a martin developer But in the case of martin, the users are usually sysadmin or developers. They usually configure in a YAML file N sources that corespond to N tables in a database (or a file).

For my personal use of martin, I have only 3 sources, so that's quite limited. AFAIK, I don't think many users have more than 10 sources. I don't think the "explode" qualifier is appropriate here.

But the issue is that the first argument is not "source_id" but "source_ids" to account for different combinations of sources. We could have a work around that to limit the cardinality.

To mitigate the issue, we will probably turn off the cardinality by default. So only the users that know what their doing can use that.

As for the usefulness, for example, for our team, it would be useful to know which source combination has a performance issue or is the most used.

nlopes commented 1 year ago

As for the usefulness, for example, for our team, it would be useful to know which source combination has a performance issue or is the most used.

I'll happily look at a PR (if you'd like to give it a shot) but I think if that's useful for your limited use-case, then perhaps it would be better to go about it in a different way. Perhaps creating explicit histograms/counters for source_ids you have set up in the config file instead of relying on the default ones we provide.

lefuturiste commented 1 year ago

Also one addition: Like I said, one issue is that the first argument is not "source_id" but "source_ids" to account for different combinations of sources. I have an idea about a workaround, it could be to have the downstream code implement a function hook to choose which "endpoint" string will be used, given the list of parameters.

So for example: /parcels,address/0/1/2 -> /address,parcels/{x}/{y}/{y} /address,parcels/0/1/2 -> /address,parcels/{x}/{y}/{y} /address,parcels,/0/1/2 -> /address,parcels/{x}/{y}/{y}

so we let the martin code handle what cardinality to use with a hook, the hook will sort the source_ids in alphabetical order and then remove the cardinality for x, y and z

lefuturiste commented 1 year ago

Okay I just found out that the match_info() method provided by actix that I wished to get the segments infos from to reconstruct the pattern is actually not accessible and private, so I think I will have to use some crate like the strfmt crate to reconstruct the right pattern by evaluating the {param} that we need to keep.

image

"[actix-web-prom/src/lib.rs:446] req.match_info()" = Path {
    path: Url {
        uri: "/address,parcels/13/4146/2816",
        path: None,
    },
    skip: 29,
    segments: [
        (
            "source_ids",
            Segment(
                1,
                16,
            ),
        ),
        (
            "z",
            Segment(
                17,
                19,
            ),
        ),
        (
            "x",
            Segment(
                20,
                24,
            ),
        ),
        (
            "y",
            Segment(
                25,
                29,
            ),
        ),
    ],
}
lefuturiste commented 1 year ago

I'll happily look at a PR (if you'd like to give it a shot) but I think if that's useful for your limited use-case, then perhaps it would be better to go about it in a different way. Perhaps creating explicit histograms/counters for source_ids you have set up in the config file instead of relying on the default ones we provide.

Yes I see, but the issue for me is when I'm creating a different metric family (with different metric name), I will have to handle a special case just for that in my Grafana dashboard for example, the fact that everything is under a same metric name would simplify the setup for everyone else.