Closed: kir4h closed this 4 years ago
Thanks for the PR, looks interesting, will have a deeper look in the coming days. :)
I really like it code-wise, thanks for your effort!
I am a bit unsure about the semantics. Prometheus-wise, such metrics are going to be problematic, aren't they? Having a timestamp as a label might well lead to cardinality explosion.
Can you describe what the use case for this is? Based on that, we can better judge whether to merge this as-is or if there are any other options for achieving your goals.
I agree that something has to be fixed -- the docs are currently wrong. Either the feature gets implemented or the docs will have to be corrected.
Let me know what you think.
> I am a bit unsure about the semantics. Prometheus-wise, such metrics are going to be problematic, aren't they? Having a timestamp as a label might well lead to cardinality explosion.
Indeed! I realized this some days after raising my PR, while making usage of the output :)
> Can you describe what the use case for this is?
Scenario

My journey began with a set of shell scripts I wanted to instrument. I started using Pushgateway for this, by adding a small bit of logic to some of them to push their metrics. But then I realized I shouldn't "contaminate" the scripts with Prometheus awareness, and ended up thinking that "something" should derive that info from the logs and generate those metrics (that's how I ended up using this project).
When running such scripts regularly through cron jobs, a useful metric is the last time the script executed successfully (so that, for instance, I can generate an alert if it doesn't run when it's supposed to).
While using this project, I realized I had lost one of the goodies from Pushgateway: for every pushed metric group, it generates additional metrics (`push_time_seconds` and `push_failure_time_seconds`) that let you know when the metric was last reported.
First approach
In order to overcome this, I thought of using labels for my metric. Whenever a regexp matched in a log file and a corresponding metric was issued, a timestamp label was added. This way, I could read this value in Prometheus/Grafana and act on it.
It was my first contact with metric generation, so only later did I realize the cardinality issue: each metric, having a different label value for `timestamp`, increased cardinality, and this kept happening on every emission. At that point I should have come back and commented on this PR, but it didn't cross my mind, sorry!
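To make the cardinality problem concrete, here is a small illustrative sketch (the metric and label names are hypothetical, the timestamps are made up): every emission with a fresh timestamp label value mints a brand-new time series, so the series count grows without bound.

```shell
#!/bin/sh
# Illustrative only: simulate three emissions of the same metric where the
# timestamp is baked into a label value. Prometheus treats every distinct
# label set as a separate time series.
count=0
seen=""
for ts in 1600000000 1600000060 1600000120; do
  series="duration_seconds{instance=\"myInstance\",timestamp=\"${ts}\"}"
  case "$seen" in
    *"$series"*) ;;                                  # series already exists
    *) seen="$seen $series"; count=$((count + 1)) ;; # brand-new series
  esac
done
# One new series per emission: cardinality grows linearly forever.
echo "distinct series after 3 emissions: $count"
```

With a plain gauge (no timestamp label), the same three emissions would overwrite a single series instead.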
Second approach
My second (and current) approach has been to add a "sidecar" metric to every match that reports this value:
```yaml
logs:
  - path: /logs/my.log
    patterns:
      - match: .*total time execution:\s*(?P<seconds>\d+)\s*seconds
        metric: duration_seconds
        type: gauge
        help: Execution time in seconds
        action: set
        continue: true
        value: $seconds
        labels:
          instance: myInstance
          job: myJobName
      - match: .*total time execution:\s*(?P<seconds>\d+)\s*seconds
        metric: last_execution
        type: gauge
        help: Last execution time in seconds
        action: set
        continue: true
        value: now()
        labels:
          instance: myInstance
          job: myJobName
```
This isn't ideal because, for every match, I need to inject a repetitive block that issues this "sidecar" metric.
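For completeness, here is how the sidecar metric covers the alerting use case as a standard Prometheus alerting rule. This is a hedged sketch: the alert name, threshold, and group name are invented for illustration.

```yaml
groups:
  - name: cron-jobs
    rules:
      - alert: ScriptNotRunRecently
        # last_execution holds a unix timestamp (set via now()),
        # so time() - last_execution is the age of the last run.
        expr: time() - last_execution{job="myJobName"} > 86400
        for: 10m
        labels:
          severity: warning
        annotations:
          summary: "myJobName has not completed successfully in over 24h"
```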
Possible third approach
I was thinking that a possible approach would be for multilog_exporter to mimic Pushgateway behaviour: a boolean flag that, when set, would produce an additional metric reporting the last time the metric was updated for a given group.
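Purely as a sketch of what that flag might look like in the config — the key name `report_last_update` and the derived metric name are invented here, nothing like this exists yet:

```yaml
patterns:
  - match: .*total time execution:\s*(?P<seconds>\d+)\s*seconds
    metric: duration_seconds
    type: gauge
    help: Execution time in seconds
    action: set
    value: $seconds
    # Hypothetical flag: when true, the exporter would also emit
    # duration_seconds_last_update set to the time of the last match,
    # mirroring Pushgateway's push_time_seconds.
    report_last_update: true
    labels:
      instance: myInstance
      job: myJobName
```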
I still have little Prometheus knowledge so I might be missing something, but I guess if Pushgateway does this, there is a legitimate use case for it.
What do you think?
Your second approach would be what I have suggested with the current version, yes. :)
> I was thinking that a possible approach would be multilog_exporter mimicking pushgateway behaviour, having a boolean flag that when set would produce an additional metric reporting the last time the metric was updated for a given group.
This sounds like a useful enhancement. It would probably be nice to be able to control that per metric? Would you want to work on a PR for that one? I guess a new one would make sense.
Closing this in favour of #3
First of all, thanks for your project! While browsing for a way to generate metrics from arbitrary logs, I didn't want to go for complex/grok-based solutions since my requirement was pretty simple. I thought about writing this functionality myself, only to find out someone had already done so :)
According to the documentation, the function `now()` should be valid for label values. This PR makes `now()` evaluated also for labels.

I have split the PR into 3 commits, one of them touching `now()` usage in metric values. Please correct me if I'm wrong or misunderstood the test.

Feel free to drop any commit, or let me know if any changes are required/suggested.