[BUG] carbon-aggregator input regexp match is greedy

hessu commented 4 years ago

Describe the bug When using aggregator to aggregate metrics from nodes with a name such as "mycluster-location1" to "mycluster-location350" using a regexp with "mycluster-\<location>\d+" the rule regexp breaks it out to summaries for "location", "location1", "location2" ... up to "location35" instead of "location".

To Reproduce Configure aggregation-rule of:

aggregate.\<location>.app.requests (60) = sum collectd.mycluster-\<location>\d+.app.requests

Feed data with collectd.mycluster-helsinki150.app.requests, observe aggregated data point come out for aggregate.helsinki15.app.requests.

Expected behavior Should not include the numbers matched by \d+ in the aggregate identiifer.

The issue appears to be in lib/carbon/aggregator/rules.py, regex_part setup, it should use the '?' non-greedy modifier suffix to make the match not greedy and let the user-supplied part on the right side be greedy. I have a tested patch, I will make a pull request shortly.

piotr1212 commented 4 years ago

I'm wondering if there are any use cases where this needs to be greedy and if this will break those. Can't think of any atm.

deniszh commented 4 years ago

Me too. I agreed with @hessu - at least fixed behavior is more logical.

hessu commented 4 years ago

If the right-side length is fixed (\d{3} for example, or a constant string), the patch will not change behaviour. It'll only have an effect when the right-side is variable-length, and there is some ambiguity involved, at which point it'll let the right side match more characters.

The user has control in config over the right side/postfix, but no control over the regexp matching the aggregate identifier.

hessu commented 4 years ago

Thanks for the quick merge! Probably my smallest open source contribution ever, just 2 bytes added. Also one of the quicker merges.

graphite-project / carbon

[BUG] carbon-aggregator input regexp match is greedy #871