hoffie / multilog_exporter

Watches log files and exposes data as Prometheus metrics
Apache License 2.0

Allow issuing a metric having as value the timestamp of the last update of the metric #3

Open kir4h opened 4 years ago

kir4h commented 4 years ago

Because of the nature of multilog_exporter (parsing logs and issuing metrics based on their contents), the metrics endpoint will keep issuing the same value until the logs trigger the condition again (if the log is generated by a cron job, that might be its next execution).

But there are scenarios where this value might not change:

- the log legitimately has nothing new to report yet (e.g. the cron job simply hasn't run again), or
- the job stopped running or stopped writing to the log, so the condition will never trigger again.

In such scenarios, the multilog_exporter metrics endpoint doesn't give enough visibility; if we are in the second scenario, we will probably want to investigate the root cause.

As discussed in https://github.com/hoffie/multilog_exporter/pull/1#issuecomment-580151266, a possible workaround is to add a sidecar metric to the configuration that reports the timestamp at which the metric is pushed. But given that this seems generally useful, it makes sense to have it built in, so that we can simply tell multilog_exporter to issue that metric.

As a reference, Pushgateway provides this behaviour as follows:

In order to make it easier to alert on failed pushers or those that have not run recently, the Pushgateway will add in the metrics push_time_seconds and push_failure_time_seconds with the Unix timestamp of the last successful and failed POST/PUT to each group. This will override any pushed metric by that name. A value of zero for either metric implies that the group has never seen a successful or failed POST/PUT.

(push_failure_time_seconds doesn't make much sense in this scenario)
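For illustration, this is a minimal sketch of how such a timestamp metric is typically consumed on the alerting side (the rule name and the one-hour threshold are just placeholders, not part of this proposal):

groups:
  - name: pushgateway-staleness
    rules:
      # Fires for any push group that has not seen a successful push in the
      # last hour (threshold picked arbitrarily for the example).
      - alert: PushGroupStale
        expr: time() - push_time_seconds > 3600
        labels:
          severity: warning
        annotations:
          summary: "Push group {{ $labels.job }} has not been updated for over an hour"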

There are several possible approaches (on the user-facing side) to exposing this.

I was thinking of something like

timestamp: false
timestamp_metric: push_time_seconds  
logs:
- path: /logs/subtitles_download.log
  timestamp: false
  timestamp_metric: push_time_seconds
  patterns:
  - match: .*total time execution:\s*(?P<seconds>\d+)\s*seconds

Where:

- timestamp enables or disables issuing the timestamp metric (set globally, and overridable per log entry), and
- timestamp_metric sets the name under which that timestamp metric is exposed.

I can work on this issue; let's just agree on the description.

hoffie commented 4 years ago

Sounds like a plan :)

Maybe a single configuration option would suffice? E.g. when setting timestamp_metric it's rather obvious that the user would want to enable this feature, meaning that we could skip the timestamp: true part, couldn't we?

Or the other way round: If we come up with a useful default, it may be enough to allow timestamp: true and skip user-selectable names? I'm not sure if I like push_time_seconds. I would like the consistency with pushgateway, but as we are talking about logs, "push" is not really a good fit, is it? Another idea would be to take the metric: field and append _updated_seconds or something, although this could lead to rather long metric names...

Supporting it on the path level also sounds useful.

All in all, I don't feel too strongly, but I'd like to keep it simple. Thanks for the initiative!

kir4h commented 4 years ago

I agree with your comments

Or the other way round: If we come up with a useful default, it may be enough to allow timestamp: true and skip user-selectable names?

I prefer this approach: it's simpler for the end user, and I don't see the use case for a configurable metric name here (if anything it should be the other way around; PromQL already offers that flexibility on the query side).

I'm not sure if I like push_time_seconds. I would like the consistency with pushgateway, but as we are talking about logs, "push" is not really a good fit, is it?

Indeed! Push doesn't make sense at all; I didn't even think about it :) Random thoughts: updated_timestamp_seconds? last_timestamp_seconds? last_update_seconds?

Another idea would be to take the metric: field and append _updated_seconds or something, although this could lead to rather long metric names...

Another possible downside (compared to having a predefined name): imagine we have a bunch of scripts monitored this way, where we issue this timestamp for each one.

Having a single name would allow a single query (whatever_time_seconds), and we would get a series for each group (say metric=A and metric=B, or job=A and job=B). Adding new script logs wouldn't even require updating the query; the new metrics would show up automatically.

With different names, one would need to add each log to the query individually, which seems harder to manage (and labels already let us identify each metric even when they share the same name).
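To make that concrete, a rough sketch (assuming we settle on a shared name such as last_update_seconds, one of the candidates above, and that each series carries a distinguishing label like job):

groups:
  - name: multilog-staleness
    rules:
      # A single expression covers every monitored log; adding a new log only
      # adds series, not metric names, so the rule needs no changes.
      - alert: LogMetricStale
        expr: time() - last_update_seconds > 86400
        annotations:
          summary: "{{ $labels.job }} has not updated its metrics in the last day"
      # With per-metric names (foo_updated_seconds, bar_updated_seconds, ...)
      # each log would instead need its own expression:
      #   time() - foo_updated_seconds > 86400
      #   time() - bar_updated_seconds > 86400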

Another question: does it make sense for the timestamp metric (if we keep a common name for it) to carry every label of the original metric, plus an additional metric=<metric name> label? (As a way of linking the two, thinking of the scenario where two different metrics share the same labels, so that their timestamps don't get mixed up.)
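For example (all names and labels here are hypothetical), the /metrics output could then look like this, with the extra metric label keeping the two timestamp series apart even though their original labels are identical:

subtitles_download_seconds{host="nas"} 42
subtitles_cleanup_seconds{host="nas"} 7
# Shared timestamp metric: original labels plus metric="..." so the two
# series don't collide despite having the same host="nas" label.
last_update_seconds{host="nas",metric="subtitles_download_seconds"} 1.580554e+09
last_update_seconds{host="nas",metric="subtitles_cleanup_seconds"} 1.580554e+09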