strimzi / metrics-reporter

Prometheus Metrics Reporter for Apache Kafka server and client compoenents
Apache License 2.0
4 stars 7 forks source link

Test ignoring metric value duplicate key #22

Closed OwenCorrigan76 closed 1 month ago

OwenCorrigan76 commented 2 months ago

We should have a test to exercise what happens if we add a metric that would cause duplicated labels? This is in the KafkaMetricsCollector in the convert method and in the YammerMetricsCollector in the convert method. LOG.warn("Ignoring metric value duplicate key {}", v1);

OwenCorrigan76 commented 2 months ago

@mimaison @k-wall The logic that handles non-numeric metrics is in the DataPointSnapshotBuilder class. Here is where a new label with a metric value is created. Currently we unconditionally add our new label without checking if a label with the same name exists already. Is this the correct desired behaviour or should we also consider either of the following posibilities:

In the previous iteration for version 0.16.0 we had the following warning: LOG.warn("Ignoring metric value duplicate key {}", v1) Should we have something like this again?

Two question I have.

OwenCorrigan76 commented 2 months ago

II've written this to add the warning back in. WDYT?


 public static InfoSnapshot.InfoDataPointSnapshot infoDataPoint(Labels labels, Object value, String metricName) {
        Map<String, String> labelMap = new HashMap<>();
        labels.forEach(label -> labelMap.put(label.getName(), label.getValue()));

        String sanitizedMetricName = PrometheusNaming.sanitizeLabelName(metricName);
        String valueString = String.valueOf(value);

        if (labelMap.containsKey(sanitizedMetricName)) {
            LOG.warning(String.format("Ignoring metric value duplicate key: %s", sanitizedMetricName));
            labelMap.put(sanitizedMetricName, valueString);
        } else {
            labelMap.put(sanitizedMetricName, valueString);
        }

        Labels.Builder newLabelsBuilder = Labels.builder();
        for (Map.Entry<String, String> entry : labelMap.entrySet()) {
            newLabelsBuilder.label(entry.getKey(), entry.getValue());
        }

        Labels newLabels = newLabelsBuilder.build();
        return InfoSnapshot.InfoDataPointSnapshot.builder()
                .labels(newLabels)
                .build();
    }
OwenCorrigan76 commented 2 months ago

Seems to be working as expected:

WARNING: Ignoring metric value duplicate key: name
Ignoring metric value duplicate key: name

java.lang.IllegalArgumentException: Duplicate label name: 'name'.
mimaison commented 2 months ago

I think there are 2 different things.

  1. Existing labels that once sanitized collide: When we sanitize label names, some characters are replaced, so a metric could have 2 labels that only collide when they are sanitized, for example some-label and some_label would collide. Previously we would log a warning in that case: https://github.com/strimzi/metrics-reporter/blob/5d5c3b20c5ec2e9c7c9ef730c9057eb50007f496/src/main/java/io/strimzi/kafka/metrics/MetricFamilySamplesBuilder.java#L82
  2. Causing a collision when we insert a new label for non-numeric metrics. As far as I can tell, previously we would always insert our new label, as seen in https://github.com/strimzi/metrics-reporter/blob/5d5c3b20c5ec2e9c7c9ef730c9057eb50007f496/src/main/java/io/strimzi/kafka/metrics/KafkaMetricsCollector.java#L118 and https://github.com/strimzi/metrics-reporter/blob/5d5c3b20c5ec2e9c7c9ef730c9057eb50007f496/src/main/java/io/strimzi/kafka/metrics/YammerMetricsCollector.java#L132 and not log anything.

It would be good to define the expected behavior in both cases.

mimaison commented 2 months ago

One thing you should have done is check the behavior of Labels. A quick test shows we can't insert labels with the same names, doing so throws IllegalArgumentException.

Running:

Labels.Builder labelsBuilder = Labels.builder();
labelsBuilder.label("k", "v");
labelsBuilder.label("k", "v");
Labels labels = labelsBuilder.build();

produces:

java.lang.IllegalArgumentException: k: duplicate label name
    at io.prometheus.metrics.model.snapshots.Labels.validateNames(Labels.java:174)
    at io.prometheus.metrics.model.snapshots.Labels.sortAndValidate(Labels.java:164)
    at io.prometheus.metrics.model.snapshots.Labels.of(Labels.java:108)
    at io.prometheus.metrics.model.snapshots.Labels$Builder.build(Labels.java:437)

I get the same behavior if I call labels.add("k", "v").

OwenCorrigan76 commented 2 months ago

So it is throwing an exception already if there are duplicate labels? Just not being logged?

Does this not cover the some-label value?

private static void validateNames(String[] names, String[] prometheusNames) {
        for(int i = 0; i < names.length; ++i) {
            if (!PrometheusNaming.isValidLabelName(names[i])) {
                throw new IllegalArgumentException("'" + names[i] + "' is an illegal label name");
            }

            if (i > 0 && prometheusNames[i - 1].equals(prometheusNames[i])) {
                throw new IllegalArgumentException(names[i] + ": duplicate label name");
            }
        }

    }

Does this happen before or after the labels are sanitised?

OwenCorrigan76 commented 2 months ago

Having gone through it, this happens after lebels are sanitized

OwenCorrigan76 commented 2 months ago

Whether there is an existing duplicate label or a newly inserted one, once the duplicate is detected after sanitization, an exception is thrown, right? So if k-1 isn't sanitised, we get the is an illegal label name exception. I've tested this and it does sanitize prohibited labels. When the label is sanitized, then it becomes k_1 and if we have this label already we get the : duplicate label name exception. This terminates the metrics processing right? Does having a warning means the the metrics will still be gathered without crashing the whole thing? Questions still for me are: Do we want a hard stop with an exception if there are duplicate lables? Do we want to issue a warning, don't add the duplicate and carry on? *Do we want to alter the duplicate so that both metrics can be logged (is this even possible or desired)?

OwenCorrigan76 commented 2 months ago

@mimaison Rewriting the infoDataPoint method (below), this catches a duplicate label name after sanitization and logs a warning inststead of throwing an exception. To me, this behaviour makes sense. WDYT? I've also written the test which I can commit and push if we agree this is the desired behaviour?


public static InfoSnapshot.InfoDataPointSnapshot infoDataPoint(Labels labels, Object value, String metricName) {
        Map<String, String> labelsMap = new LinkedHashMap<>();
        labels.forEach(label -> labelsMap.put(PrometheusNaming.sanitizeLabelName(label.getName()), label.getValue()));

        String sanitizedMetricName = PrometheusNaming.sanitizeLabelName(metricName);
        String valueStr = String.valueOf(value);
        if (labelsMap.containsKey(sanitizedMetricName)) {
            LOG.warning("Ignoring metric value duplicate key: " + sanitizedMetricName);
        } else {
            labelsMap.put(sanitizedMetricName, valueStr);
        }

        Labels.Builder newLabelsBuilder = Labels.builder();
        labelsMap.forEach(newLabelsBuilder::label);
        Labels newLabels = newLabelsBuilder.build();
        return InfoSnapshot.InfoDataPointSnapshot.builder()
                .labels(newLabels)
                .build();
    }
mimaison commented 2 months ago

Do we want a hard stop with an exception if there are duplicate lables? I don't think so. If we throw and don't handle the exception, then no metrics will be collected. As monitoring is crucial this is not an option.

Do we want to issue a warning, don't add the duplicate and carry on? Yes we should log something so operators are informed if this happens. A warning with details about the issue seems right.

Do we want to alter the duplicate so that both metrics can be logged (is this even possible or desired)? I'm not sure I understand exactly what you mean. Can you provide an example?

@OwenCorrigan76 If you want feedback on a piece of code it's best if you open a PR or at least commit to a branch in your fork. This allows other participants to comment, provide feedback and easily run the code. It's a bit clunky if for example I have to tell you have one that the 10th lines there's something not quite right.

The Labels class has a contains() method. Why are you not using that to check if a label name already exist? Creating a temporary Map and then creating another Labels instance is unnecessary. Also at this point the label names have already been sanitized. Finally this code only addresses the 2nd point from my comment above.

mimaison commented 1 month ago

Done in https://github.com/strimzi/metrics-reporter/commit/210e4d0273ec81a4e6af0db5aaaa758cc8496968