thumbor-community / prometheus

MIT License
8 stars 8 forks source link

Thumbor 7 uses identical names for counters and timeseries and it causes exception in Prometheus registry #9

Closed mbarouski closed 2 years ago

mbarouski commented 2 years ago

I opened PR in Thumbor project https://github.com/thumbor/thumbor/pull/1462 to fix this. If it's not merged, then this project should add different postfixes for counters and time series metrics. Hope it will be merged, then we will have to put restriction on Thumbor version like >=7.0.11 (current 7.0.10) or something similar. @savar maybe you have other thoughts to fix this?

savar commented 2 years ago

I think your PR within thumbor shows the issue there. I would not think about changing it here except if this is (for whatever reason) not fixed within thumbor.

If you have time pressure I would probably monkeypatch tc-prometheus and simply rename the metricname in def incr and def timing when you see the keys in question and then call the original def incr and def timing

mbarouski commented 2 years ago

Yes, I agree this should be fixed on Thumbor side. About monkeypatch... thanks and no worries, I will wait for some kind of proper fix on Thumbor side :)

Added: I created the issue in the repo to keep context if somebody faces the same issue.

savar commented 2 years ago

As I can see, you face more than one of these issues and checking the statsd->prometheus exporter this is a known issue: https://github.com/prometheus/statsd_exporter#explicit-metric-type-mapping

So I guess it might make sense for the time being (even though it is not nice) to create a mapping. Additionally it might make sense to support extra config similar to the exporter but that would make it much more complicated and error prone.

image as we can see we have exactly the 3 broken ones here.. none_smart, smart and image.fetch which already is decoded to use labels. So we could treat this one even more special and add to them all in incr() a postfix _total. That would be my only Idea right now. WDYT?

mbarouski commented 2 years ago

@savar I initially thought to add postfixes to both, but timing metrics are not always about latency, so adding of postfix to incr metrics only is nice by me 👍🏻

Should I do this?

mbarouski commented 2 years ago

BTW to do with the postfix changes... original_image.status metric has one more label networklocation (it's optional) that is not in specified in the map in master branch. Having optional label, I think we should make this change as well to prevent index out of range exception:

labels = {}
for index, label in enumerate(self.mapping[name]):
    labels[label] = values[index] if index < len(values) else None
mbarouski commented 2 years ago

@savar I made changes suggested by you here. Please have a look 👀

savar commented 2 years ago

@mbarouski can we close this issue for now or do you want to keep it open?

mbarouski commented 2 years ago

@savar sure, we can close it