knyar / nginx-lua-prometheus

Prometheus metric library for Nginx written in Lua
MIT License
1.46k stars 231 forks source link

Allow UTF-8 label values #109

Closed Gerrard-YNWA closed 3 years ago

Gerrard-YNWA commented 3 years ago

Hello, I found the label_values is not fully supported UTF-8.

Here is the minimal case:

# specify shared dict
lua_shared_dict prometheus_metrics 1m;

# init monitor module
prometheus = require("prometheus").init("prometheus_metrics")
metric_counter = prometheus:counter("metric_error", "error test", {"error_code", "error_msg"})

# incr counter 
metric_counter:inc(1, {"1001", "非法的用户Id"})

# collect
prometheus:collect()

got the following output and error_msg was stripped

# HELP metric_errorcode error code test
# TYPE metric_errorcode counter
metric_errorcode{error_code="1001",error_msg="Id"} 1
# HELP nginx_metric_errors_total Number of nginx-lua-prometheus errors
# TYPE nginx_metric_errors_total counter
nginx_metric_errors_total 0

It seems that the full_metric_name https://github.com/knyar/nginx-lua-prometheus/blob/master/prometheus.lua#L94 takes all label values as one byte and strips all the noprintable ascii charachers.

knyar commented 3 years ago

Thanks for the report! The library is deliberately restricting label values to ASCII characters following #21.

I agree that it would be nice to find a way to extend the allowed range of values to all valid UTF-8. Perhaps adapting code from utf8_validator.lua as a sanitizer could work: if provided label values contain invalid utf-8 characters, we'd simply use a truncated value (that could be an empty string).

Did you just want to report this, @Gerrard-YNWA, or are you planning to send a PR?

Gerrard-YNWA commented 3 years ago

Thanks for explaining. Agree with allowing all the valid UTF-8 characters for label values and let me have a try.