Closed pmoogi-redhat closed 3 years ago
/hold
get_gauge_or_counter is not working as per expectations, throwing some errors hence leaving it to get_gauge type for total_bytes_collected metric. require some help there. have tested the functionality by looking at prometheus dashboard if total_bytes_collected getting published correctly. it is working fine as per my stand-alone test environment.
/approve
Final names for the metrics:
log_logged_bytes_total: total number of bytes written to a single log file path, accounting for rotations.
log_collected_bytes_total: total number of bytes collected from a single log file.
Both metrics have the label "path" which is the absolute path to the log file.
[APPROVALNOTIFIER] This PR is APPROVED
This pull-request has been approved by: alanconway, pmoogi-redhat
The full list of commands accepted by this bot can be found here.
The pull request process is described here
New changes are detected. LGTM label has been removed.
@jcantrill @pmoogi-redhat I think we can re-factor this code into a single, fully independent plug-in belonging to us, with no patches to fluentd. That will solve the labelling problems. The idea is to follow the model of the fluentd prometheus-tail-monitor: https://github.com/openshift/origin-aggregated-logging/blob/master/fluentd/vendored_gem_src/fluent-plugin-prometheus/lib/fluent/plugin/in_prometheus_tail_monitor.rb#L1 That plugin scans for registered in-tail plugins and uses inside knowledge of the in-tail implementation to grab the read-position and inode values collected by in-tail. It samples on a timer, so it doesn't see every read - but that doesn't matter. In-tail keeps watching the old inode for a grace period of several seconds after the file rolls over, which gives us time to register the last read position -so we should get a full count. I've started writing this out, but I'm not sure we'll have time to get it done and tested for 5.1. I'll try to finish toda so @pmoogi-redhat can review & test it tomorrow.
@pmoogi-redhat: The following tests failed, say /retest
to rerun all failed tests:
Test name | Commit | Details | Rerun command |
---|---|---|---|
ci/prow/unit | 4c81554d63aae6c326df680829dbc0b76fc05f27 | link | /test unit |
ci/prow/elastic-operator-e2e | 4c81554d63aae6c326df680829dbc0b76fc05f27 | link | /test elastic-operator-e2e |
ci/prow/images | 4c81554d63aae6c326df680829dbc0b76fc05f27 | link | /test images |
ci/prow/cluster-logging-operator-e2e | 4c81554d63aae6c326df680829dbc0b76fc05f27 | link | /test cluster-logging-operator-e2e |
ci/prow/clo-functional | 4c81554d63aae6c326df680829dbc0b76fc05f27 | link | /test clo-functional |
ci/prow/smoke | 4c81554d63aae6c326df680829dbc0b76fc05f27 | link | /test smoke |
Full PR test history. Your PR dashboard.
Separate PR is raised to enable this new metric as a new plugin - new_plugin_Metric-for-inbound-log-data-loss-at-the-collector-JIRA-ticket-LOG-1032. Hence closing this PR which reflects in_tail and fluent-prometheus-plugin changes.
Description
Currently in_tail plugin doesn't support publishing of inbound logloss - i.e. difference between total bytes written to disk (logfile) and total bytes collected or read by fluentd. This PR got changes in fluentd/lib/fluent/plugin/in_tail.rb, and fluent-plugin-prometheus/lib/fluent/plugin/in_prometheus_tail_monitor.rb plugins to enable publishing of the below parameters
/cc @alanconway @jcantrill /assign @alanconway
/cherry-pick
Links
Depending on PR(s):
Bugzilla:
Github issue:
JIRA: https://issues.redhat.com/browse/LOG-1213
Enhancement proposal: many rotations getting missed by fluentd, next enhancement proposal is to get fluentd know about all actual rotations done by CRIO/conmon process by reading extra meta data such as. Based on what rotations fluentd could track and which one got missed computing log-loss as [no-of-rotations-those-missed * maxsizeoflogfile + sum over all tracked rotations by fluentd as [totalbytes_logged_in_disk - totalbytes_collected_from_disk]