prometheus / client_python

Prometheus instrumentation library for Python applications
Apache License 2.0
3.96k stars 798 forks source link

Eagerly convert value to float in *MetricFamily.add_metric() #527

Open mhansen opened 4 years ago

mhansen commented 4 years ago

Hi there, an experience report and improvement suggestion.

I have a pretty simple exporter which queries a backend URL for JSON, grabs the JSON, stuffs it into prometheus metric families, which it then yields.

Sometimes it errors with stacktraces only when the request is finishing, with a stack traces that points at finish_request rather than the code that added the bad float:

TypeError: ("float() argument must be a string or a number, not 'NoneType'", Metric(bom_wind_speed, Wind speed (km/h) from the Bureau of Meterology, gauge, , [Sample(name='bom_wind_speed', labels={'location': 'Sydney Airport'}, value=None, timestamp=None, exemplar=None), Sample(name='bom_wind_speed', labels={'location': 'Sydney - Observatory Hill'}, value=20, timestamp=None, exemplar=None)]))
Traceback (most recent call last):
  File "/usr/local/lib/python3.5/socketserver.py", line 625, in process_request_thread
    self.finish_request(request, client_address)
  File "/usr/local/lib/python3.5/socketserver.py", line 354, in finish_request
    self.RequestHandlerClass(request, client_address, self)
  File "/usr/local/lib/python3.5/socketserver.py", line 681, in __init__
    self.handle()
  File "/usr/local/lib/python3.5/http/server.py", line 422, in handle
    self.handle_one_request()
  File "/usr/local/lib/python3.5/http/server.py", line 410, in handle_one_request
    method()
  File "/usr/local/lib/python3.5/site-packages/prometheus_client/exposition.py", line 152, in do_GET
    output = encoder(registry)
  File "/usr/local/lib/python3.5/site-packages/prometheus_client/openmetrics/exposition.py", line 56, in generate_latest
    floatToGoString(s.value),
  File "/usr/local/lib/python3.5/site-packages/prometheus_client/utils.py", line 8, in floatToGoString
    d = float(d)
TypeError: ("float() argument must be a string or a number, not 'NoneType'", Metric(bom_wind_speed, Wind speed (km/h) from the Bureau of Meterology, gauge, , [Sample(name='bom_wind_speed', labels={'location': 'Sydney Airport'}, value=None, timestamp=None, exemplar=None), Sample(name='bom_wind_speed', labels={'location': 'Sydney - Observatory Hill'}, value=20, timestamp=None, exemplar=None)]))

I've hit this kind of bug a few times in different exporters (I guess it's to be expected to get type errors in Python sometimes).

How about eagerly converting the value passed to add_metric to a float? Then the stack trace would point at the exact cause.

This might be a breaking change - that'd be a reasonable reason to reject this. But any code doing this will likely fail soon after, as soon as an attempt is made to serialize the metric, so maybe it'd be worth the change for better debuggability? What do you think?

brian-brazil commented 4 years ago

That sounds reasonable to me. I don't see how it'd be breaking, you're only changing when we're erroring on bad data.

mhansen commented 4 years ago

It’d only break if someone was creating a metric but never turning it into a string. Probably unlikely, but maybe someone is holding them in memory and not outputting them? Seems a bit unlikely.

On Thu, 12 Mar 2020 at 22:46, Brian Brazil notifications@github.com wrote:

That sounds reasonable to me. I don't see how it'd be breaking, you're only changing when we're erroring on bad data.

— You are receiving this because you authored the thread. Reply to this email directly, view it on GitHub https://github.com/prometheus/client_python/issues/527#issuecomment-598143269, or unsubscribe https://github.com/notifications/unsubscribe-auth/AAAZYOLVOHAXZYMRVE6BX2DRHDDPRANCNFSM4LGJ4OGA .

brian-brazil commented 4 years ago

They're useless without being output at some point, and even if it did break someone I'd be okay with that.

mhansen commented 4 years ago

Having a quick look into this, looks like a common pinchpoint we could add this check would be the Sample constructor.

But it's a namedtuple. In python3 there's typing.NamedTuple, but that doesn't verify types at runtime.

Looks like it's not easy to override the constructor of a namedtuple, except by subclassing or having a factory function wrapper. Subclassing looks like a better win, preserving the Sample function as a type (which some folks might depend on), as written here: https://stackoverflow.com/a/16721002/171898

On Thu, 12 Mar 2020 at 22:52, Brian Brazil notifications@github.com wrote:

They're useless without being output at some point, and even if it did break someone I'd be okay with that.

— You are receiving this because you authored the thread. Reply to this email directly, view it on GitHub https://github.com/prometheus/client_python/issues/527#issuecomment-598145595, or unsubscribe https://github.com/notifications/unsubscribe-auth/AAAZYOO7EABQZQ4NWYNKKDLRHDEH7ANCNFSM4LGJ4OGA .

mhansen commented 4 years ago

Could always write out the entire generated code body of the namedtuple and then change the constructor too! Not ideal for sure.

On Thu, 19 Mar 2020 at 20:42, Mark Hansen mark@markhansen.co.nz wrote:

Having a quick look into this, looks like a common pinchpoint we could add this check would be the Sample constructor.

But it's a namedtuple. In python3 there's typing.NamedTuple, but that doesn't verify types at runtime.

Looks like it's not easy to override the constructor of a namedtuple, except by subclassing or having a factory function wrapper. Subclassing looks like a better win, preserving the Sample function as a type (which some folks might depend on), as written here: https://stackoverflow.com/a/16721002/171898

On Thu, 12 Mar 2020 at 22:52, Brian Brazil notifications@github.com wrote:

They're useless without being output at some point, and even if it did break someone I'd be okay with that.

— You are receiving this because you authored the thread. Reply to this email directly, view it on GitHub https://github.com/prometheus/client_python/issues/527#issuecomment-598145595, or unsubscribe https://github.com/notifications/unsubscribe-auth/AAAZYOO7EABQZQ4NWYNKKDLRHDEH7ANCNFSM4LGJ4OGA .

brian-brazil commented 4 years ago

The quickest win is probably the add_metric function itself, but wrapping the constructor should also work.

mhansen commented 4 years ago

Yeah add_metric would be quick... but looks like there are 8 implementations of add_metric which makes me feel a bit awkward. Good fallback option, trying to wrap the constructor now.

On Thu, 19 Mar 2020 at 20:58, Brian Brazil notifications@github.com wrote:

The quickest win is probably the add_metric function itself, but wrapping the constructor should also work.

— You are receiving this because you authored the thread. Reply to this email directly, view it on GitHub https://github.com/prometheus/client_python/issues/527#issuecomment-601090072, or unsubscribe https://github.com/notifications/unsubscribe-auth/AAAZYOJAXY25KKYQYSAHCGLRIHUD3ANCNFSM4LGJ4OGA .