spotify / semantic-metrics

Capturing meaningful metrics in your Java application
Apache License 2.0
66 stars 36 forks source link

FastForwardReporter appends the 'prefix' twice to the `count` metric of Meters #101

Closed mattnworb closed 3 years ago

mattnworb commented 3 years ago

I am using FastForwardReporter v1.1.1 via Apollo's metrics module (https://github.com/spotify/apollo/blob/1.x/modules/metrics/src/main/java/com/spotify/apollo/metrics/MetricsModule.java), which builds a reporter with prefix of MetricId.build("apollo") (plus some tags).

If I run ffwd locally to see what metrics are sent by a locally-running service to ffwd, I see that some metrics are sent with key=apollo and some are sent with key=apollo.apollo, and I think the latter is caused by logic in FastForwardReporter itself.

# launch ffwd
$ docker run --name=local-ffwd --rm -d -p 19091:19091/udp -p 19000:19000 -p 8080:8080 spotify/ffwd:latest
# check the logs
$ docker logs local-ffwd 2>&1 | grep "what=rpc-started"
16:36:57.930 [ffwd-scheduler-0] INFO  com.spotify.ffwd.debug.DebugPluginSink - M#141: Metric(key=apollo, value=Value.DoubleValue(value=0.1433062621147579), timestamp=1612802212648, tags={grpc-type=unary, target-service=remote-config-resolver, stat=1m, grpc-service=grpc.health.v1.Health, service-framework-version=1.17.4, grpc-method=Check, service-framework=apollo, unit=rpc/s, what=rpc-started, application=salem-api, grpc-component=client, metric_type=meter, host=82820c54a43e}, resource={})
16:36:57.930 [ffwd-scheduler-0] INFO  com.spotify.ffwd.debug.DebugPluginSink - M#142: Metric(key=apollo, value=Value.DoubleValue(value=0.18710139700632353), timestamp=1612802212649, tags={grpc-type=unary, target-service=remote-config-resolver, stat=5m, grpc-service=grpc.health.v1.Health, service-framework-version=1.17.4, grpc-method=Check, service-framework=apollo, unit=rpc/s, what=rpc-started, application=salem-api, grpc-component=client, metric_type=meter, host=82820c54a43e}, resource={})
16:36:57.930 [ffwd-scheduler-0] INFO  com.spotify.ffwd.debug.DebugPluginSink - M#143: Metric(key=apollo.apollo, value=Value.DoubleValue(value=1.0), timestamp=1612802212652, tags={grpc-type=unary, target-service=remote-config-resolver, unit=rpc, what=rpc-started, application=salem-api, grpc-service=grpc.health.v1.Health, service-framework-version=1.17.4, grpc-component=client, metric_type=counter, host=82820c54a43e, grpc-method=Check, service-framework=apollo}, resource={})

Note that the same what is reported under key=apollo and key=apollo.apollo, while the application code does not set any key at all for this MetricId.

I think I know what causes this:

On each tick when report(..) is called, it will iterate over all gauges, counters, histograms, meters, timers, etc:

https://github.com/spotify/semantic-metrics/blob/2bee9e7f8e80779e7a7cfaa463a56a91dfffe2c1/ffwd-reporter/src/main/java/com/spotify/metrics/ffwd/FastForwardReporter.java#L239

reportMetered(MetricId, Meter) will overwrite the MetricId argument in order to prepend this.prefix to the supplied MetricId, and then translate the MetricId into a com.spotify.ffwd.Metric instance: https://github.com/spotify/semantic-metrics/blob/2bee9e7f8e80779e7a7cfaa463a56a91dfffe2c1/ffwd-reporter/src/main/java/com/spotify/metrics/ffwd/FastForwardReporter.java#L299-L309

since a Meter is also a Counter, the last line of reportMetered(MetricId, Meter) calls reportCounter(MetricId, Meter), but with the overwritten MetricId (which had this.prefix prepended onto it), which repeats the same prepend operation before translating the MetricId into a com.spotify.ffwd.Metric instance:

https://github.com/spotify/semantic-metrics/blob/2bee9e7f8e80779e7a7cfaa463a56a91dfffe2c1/ffwd-reporter/src/main/java/com/spotify/metrics/ffwd/FastForwardReporter.java#L277-L283

This will result in the FastForwardReporter's prefix field being prepended twice into the metric's ID. ~reportTimer has the same behavior~ (update: I was wrong on this).

An easy fix here would be for reportMeter and reportTimer to pass the original MetricId, not the overwritten one, into reportCounter.

mattnworb commented 3 years ago

I think this behavior has existed since c9cc31273ff62dd3f126d02adb0cb490f4fe2097

lmuhlha commented 3 years ago

https://github.com/spotify/semantic-metrics/pull/102 Thanks @mattnworb :)

lmuhlha commented 3 years ago

Fix in semantic metrics 1.1.7.
Will roll this out to apollo with some warnings shortly.