Closed OwenCorrigan76 closed 4 months ago
@mimaison @k-wall The logic that handles non-numeric metrics is in the DataPointSnapshotBuilder class. Here is where a new label with a metric value is created. Currently we unconditionally add our new label without checking if a label with the same name exists already. Is this the correct desired behaviour or should we also consider either of the following posibilities:
In the previous iteration for version 0.16.0 we had the following warning:
LOG.warn("Ignoring metric value duplicate key {}", v1)
Should we have something like this again?
Two question I have.
II've written this to add the warning back in. WDYT?
public static InfoSnapshot.InfoDataPointSnapshot infoDataPoint(Labels labels, Object value, String metricName) {
Map<String, String> labelMap = new HashMap<>();
labels.forEach(label -> labelMap.put(label.getName(), label.getValue()));
String sanitizedMetricName = PrometheusNaming.sanitizeLabelName(metricName);
String valueString = String.valueOf(value);
if (labelMap.containsKey(sanitizedMetricName)) {
LOG.warning(String.format("Ignoring metric value duplicate key: %s", sanitizedMetricName));
labelMap.put(sanitizedMetricName, valueString);
} else {
labelMap.put(sanitizedMetricName, valueString);
}
Labels.Builder newLabelsBuilder = Labels.builder();
for (Map.Entry<String, String> entry : labelMap.entrySet()) {
newLabelsBuilder.label(entry.getKey(), entry.getValue());
}
Labels newLabels = newLabelsBuilder.build();
return InfoSnapshot.InfoDataPointSnapshot.builder()
.labels(newLabels)
.build();
}
Seems to be working as expected:
WARNING: Ignoring metric value duplicate key: name
Ignoring metric value duplicate key: name
java.lang.IllegalArgumentException: Duplicate label name: 'name'.
I think there are 2 different things.
some-label
and some_label
would collide. Previously we would log a warning in that case: https://github.com/strimzi/metrics-reporter/blob/5d5c3b20c5ec2e9c7c9ef730c9057eb50007f496/src/main/java/io/strimzi/kafka/metrics/MetricFamilySamplesBuilder.java#L82It would be good to define the expected behavior in both cases.
One thing you should have done is check the behavior of Labels
. A quick test shows we can't insert labels with the same names, doing so throws IllegalArgumentException
.
Running:
Labels.Builder labelsBuilder = Labels.builder();
labelsBuilder.label("k", "v");
labelsBuilder.label("k", "v");
Labels labels = labelsBuilder.build();
produces:
java.lang.IllegalArgumentException: k: duplicate label name
at io.prometheus.metrics.model.snapshots.Labels.validateNames(Labels.java:174)
at io.prometheus.metrics.model.snapshots.Labels.sortAndValidate(Labels.java:164)
at io.prometheus.metrics.model.snapshots.Labels.of(Labels.java:108)
at io.prometheus.metrics.model.snapshots.Labels$Builder.build(Labels.java:437)
I get the same behavior if I call labels.add("k", "v")
.
So it is throwing an exception already if there are duplicate labels? Just not being logged?
Does this not cover the some-label
value?
private static void validateNames(String[] names, String[] prometheusNames) {
for(int i = 0; i < names.length; ++i) {
if (!PrometheusNaming.isValidLabelName(names[i])) {
throw new IllegalArgumentException("'" + names[i] + "' is an illegal label name");
}
if (i > 0 && prometheusNames[i - 1].equals(prometheusNames[i])) {
throw new IllegalArgumentException(names[i] + ": duplicate label name");
}
}
}
Does this happen before or after the labels are sanitised?
Having gone through it, this happens after lebels are sanitized
Whether there is an existing duplicate label or a newly inserted one, once the duplicate is detected after sanitization, an exception is thrown, right?
So if k-1
isn't sanitised, we get the is an illegal label name
exception. I've tested this and it does sanitize prohibited labels. When the label is sanitized, then it becomes k_1
and if we have this label already we get the : duplicate label name
exception. This terminates the metrics processing right? Does having a warning means the the metrics will still be gathered without crashing the whole thing?
Questions still for me are:
Do we want a hard stop with an exception if there are duplicate lables?
Do we want to issue a warning, don't add the duplicate and carry on?
*Do we want to alter the duplicate so that both metrics can be logged (is this even possible or desired)?
@mimaison Rewriting the infoDataPoint method (below), this catches a duplicate label name after sanitization and logs a warning inststead of throwing an exception. To me, this behaviour makes sense. WDYT? I've also written the test which I can commit and push if we agree this is the desired behaviour?
public static InfoSnapshot.InfoDataPointSnapshot infoDataPoint(Labels labels, Object value, String metricName) {
Map<String, String> labelsMap = new LinkedHashMap<>();
labels.forEach(label -> labelsMap.put(PrometheusNaming.sanitizeLabelName(label.getName()), label.getValue()));
String sanitizedMetricName = PrometheusNaming.sanitizeLabelName(metricName);
String valueStr = String.valueOf(value);
if (labelsMap.containsKey(sanitizedMetricName)) {
LOG.warning("Ignoring metric value duplicate key: " + sanitizedMetricName);
} else {
labelsMap.put(sanitizedMetricName, valueStr);
}
Labels.Builder newLabelsBuilder = Labels.builder();
labelsMap.forEach(newLabelsBuilder::label);
Labels newLabels = newLabelsBuilder.build();
return InfoSnapshot.InfoDataPointSnapshot.builder()
.labels(newLabels)
.build();
}
Do we want a hard stop with an exception if there are duplicate lables? I don't think so. If we throw and don't handle the exception, then no metrics will be collected. As monitoring is crucial this is not an option.
Do we want to issue a warning, don't add the duplicate and carry on? Yes we should log something so operators are informed if this happens. A warning with details about the issue seems right.
Do we want to alter the duplicate so that both metrics can be logged (is this even possible or desired)? I'm not sure I understand exactly what you mean. Can you provide an example?
@OwenCorrigan76 If you want feedback on a piece of code it's best if you open a PR or at least commit to a branch in your fork. This allows other participants to comment, provide feedback and easily run the code. It's a bit clunky if for example I have to tell you have one that the 10th lines there's something not quite right.
The Labels
class has a contains()
method. Why are you not using that to check if a label name already exist? Creating a temporary Map
and then creating another Labels
instance is unnecessary. Also at this point the label names have already been sanitized. Finally this code only addresses the 2nd point from my comment above.
We should have a test to exercise what happens if we add a metric that would cause duplicated labels? This is in the
KafkaMetricsCollector
in theconvert
method and in theYammerMetricsCollector
in theconvert
method.LOG.warn("Ignoring metric value duplicate key {}", v1);