Alternative approach to build metric names from tags in Dropwizard and other path-based backends

pbetkier commented 6 years ago

Hello, I'd like to propose an alternative way to create metric names in backends like Dropwizard, which don't support tags natively. It could either be implemented in Micrometer itself or serve as a suggestion for other developers on how to solve this problem in their systems.

We've been using Dropwizard metrics reported to Graphite in our company for a couple of years. Now we're moving to Micrometer as part of our Spring Boot 2 migration, and also adding Prometheus as an alternative storage. Below I describe the problems we faced and our solution.

Issues when migrating to Micrometer

One of our concerns was to keep Graphite metric paths unchanged when migrating from Spring Boot 1 on Dropwizard to Spring Boot 2 on Micrometer. Some example metric paths were:

# at Graphite before Micrometer
process.jvm.memory.heap.used
process.jvm.memory.non-heap.used
api-requests.SomeController.someHandler.GET.200
api-requests.SomeController.someHandler.GET.500

Introducing tags in vanilla Micrometer results in the following:

# at Prometheus on Micrometer, great
process_jvm_memory_used_bytes {area=heap|non-heap}
api_requests_seconds {controller=SomeController, handler=someHandler, method=GET, code=200|500}

# at Graphite on Micrometer, totally different tree
process.jvm.memory.area.heap.used
process.jvm.memory.area.non-heap.used
api-requests.code.200.controller.SomeController.handler.someHandler.method.GET
api-requests.code.500.controller.SomeController.handler.someHandler.method.GET

Which brings a couple of problems:

The Graphite metric tree is different, preventing a smooth migration to Spring Boot 2.
The metric tree structure emerges from the alphabetical order of the tags and no longer follows the logical order of controller -> handler -> method -> code (captured in #595).
Metric paths are much longer: api-requests.SomeController.someHandler.GET.200 is clear enough, so we don't need the controller, handler, method and code segments.
Whenever a new tag is added, the metric tree can change depending on where the tag falls in alphabetical order.
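
To make the ordering problem concrete, here is a minimal plain-Java sketch of the encoding described above: tags are sorted by key, and each one is appended as a key segment followed by a value segment. The class and method names (AlphabeticTagPath, encode) are hypothetical, and this only approximates the behaviour shown in the listing; it is not Micrometer's actual code.

```java
import java.util.Map;
import java.util.TreeMap;

// Hypothetical illustration only; not part of Micrometer.
final class AlphabeticTagPath {

    // Appends ".key.value" for every tag, ordered alphabetically by key.
    static String encode(String name, Map<String, String> tags) {
        StringBuilder path = new StringBuilder(name);
        // TreeMap iterates in natural (alphabetical) key order.
        for (Map.Entry<String, String> tag : new TreeMap<>(tags).entrySet()) {
            path.append('.').append(tag.getKey()).append('.').append(tag.getValue());
        }
        return path.toString();
    }
}
```

Encoding "api-requests" with the controller, handler, method and code tags from the example gives api-requests.code.200.controller.SomeController.handler.someHandler.method.GET. Adding a tag whose key sorts before "code" (say, "app") would push new segments in front of everything else, which is the instability described in the last point.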

Controlling path encoding with placeholders

Our approach is to give control over creating the metric path to the developer instead of relying on encoding logic in Micrometer. We provide our own HierarchicalNameMapper and PrometheusNamingConvention implementations which support placeholders in metric names:

meterRegistry.gauge("process.jvm.memory.{area}.used", Tags.of("area", "heap"), ...);
meterRegistry.gauge("process.jvm.memory.{area}.used", Tags.of("area", "non-heap"), ...);

meterRegistry.timer(
    "api-requests.{controller}.{handler}.{method}.{code}",
    "controller", "SomeController", "handler", "someHandler", "method", "GET", "code", "200"
).record(...);

Our HierarchicalNameMapper implementation replaces all placeholders with their matching tags, mapping to our previous Graphite structure:

# at Graphite on Micrometer, with placeholders resolved
process.jvm.memory.heap.used
process.jvm.memory.non-heap.used
api-requests.SomeController.someHandler.GET.200

Our PrometheusNamingConvention removes all placeholders from the metric name, producing the same name as vanilla Micrometer does:

# at Prometheus on Micrometer, with placeholders stripped
process_jvm_memory_used_bytes {area=heap|non-heap}
api_requests_seconds {controller=SomeController, handler=someHandler, method=GET, code=200}

Where to use placeholders

We use placeholders in all our code that is expected to support both Graphite and Prometheus. Either:

Library code, e.g. memory metrics.
Application code in process of migration from Graphite to Prometheus.

If an application doesn't need to report to Graphite and Prometheus simultaneously (either it's not using Prometheus yet or it has already completely migrated from Graphite), then placeholders are not required. When using Graphite only, the application can define its metric names explicitly and ignore the tags argument. When using Prometheus only, the application can use the Micrometer API as it was designed.

Adoption

We started adopting this solution in our ~400 microservices stack. We register all our metrics from internal libraries using the placeholders mechanism and have a few services in the process of migration from Graphite to Prometheus also registering their metrics this way.

jkschneider commented 6 years ago

This is a really clever idea @pbetkier. I'm curious to see what your naming convention looks like. How do you implement name to fold in tags when the name signature only provides you the metric name, type, and base unit?

My initial reaction is that I don't think we'd want to go back and add placeholders to built-in metrics for a couple reasons:

I think it's reasonable to optimize for dimensional systems first, since hierarchical systems should continue to see less and less adoption over time.
I'm not sure everyone could agree on the same order of placeholders. The "controller", "handler", "method" ordering is sensible, but other tags aren't as related to one another. For example, I could see arguments for wanting "status code" to be before "HTTP method" or after; neither is more obviously correct.

I think what we could do is provide a Placeholders utility in Micrometer core that does a couple things. It allows you to define the mapping from OOTB names to a placeholder name. For example, in constructing a Placeholders instance, you could define that you want jvm.memory.used to be replaced by jvm.memory.{area}.used. When that Placeholders instance is bound to a particular registry, it adds a MeterFilter that maps the names and adds the placeholder-aware naming convention to the registry.

It may be reasonable and in fact beneficial for us to provide a default GraphitePlaceholders, because we could then publish a Grafana dash that demonstrates how to most effectively chart OOTB metrics. But folks are still free to use their own Placeholders if they too are trying to meet an existing internal standard.

Thoughts?

pbetkier commented 6 years ago

Glad you like the idea :)

Our naming convention simply removes the placeholders:

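import io.micrometer.core.instrument.Meter;
import io.micrometer.core.lang.Nullable;
import io.micrometer.prometheus.PrometheusNamingConvention;

public class OurPrometheusNamingConvention extends PrometheusNamingConvention {

    // Matches a single {placeholder} segment
    private static final String PLACEHOLDER = "\\{[A-Za-z0-9_\\-\\.]+\\}";

    @Override
    public String name(String name, Meter.Type type, @Nullable String baseUnit) {
        // Drop each placeholder together with the dot that precedes it
        String sanitizedName = name.replaceAll("\\." + PLACEHOLDER, "");
        return super.name(sanitizedName, type, baseUnit);
    }
}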
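
As a quick stand-alone check of the stripping step, the following plain-Java sketch applies the same regular expression without any Micrometer dependency. The class name PlaceholderStripper is hypothetical, and the sketch covers only placeholder removal, not the rest of the Prometheus naming rules.

```java
import java.util.regex.Pattern;

// Hypothetical stand-alone check of the placeholder-stripping step.
final class PlaceholderStripper {

    // Same expression as in the naming convention: a dot followed by {placeholder}.
    private static final Pattern DOT_PLACEHOLDER =
            Pattern.compile("\\.\\{[A-Za-z0-9_\\-\\.]+\\}");

    // Removes every ".{placeholder}" segment from the metric name.
    static String strip(String name) {
        return DOT_PLACEHOLDER.matcher(name).replaceAll("");
    }
}
```

With this expression, process.jvm.memory.{area}.used becomes process.jvm.memory.used and api-requests.{controller}.{handler}.{method}.{code} becomes api-requests. A placeholder at the very start of a name (for example {area}.used) has no preceding dot and is left untouched, which may be one of the corner cases worth handling.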

Resolving placeholders with tags happens in the HierarchicalNameMapper implementation that we pass to DropwizardMeterRegistry:

import io.micrometer.core.instrument.Meter;
import io.micrometer.core.instrument.Tag;
import io.micrometer.core.instrument.config.NamingConvention;
import io.micrometer.core.instrument.util.HierarchicalNameMapper;

public class OurHierarchicalNameMapper implements HierarchicalNameMapper {

    @Override
    public String toHierarchicalName(Meter.Id id, NamingConvention convention) {
        String name = id.getName();

        // probably could be optimized for performance, works fine for us now
        for (Tag tag : id.getTags()) {
            name = name.replace("{" + tag.getKey() + "}", tag.getValue());
        }

        // any brace left over means a placeholder had no matching tag
        if (name.contains("{") || name.contains("}")) {
            throw new IllegalArgumentException("Some placeholders in the metric name do not have a matching tag! " +
                    "Metric name: " + id.getName() + ", after resolving with tags provided: " + name);
        }

        return name;
    }
}

Note that our implementations may not catch all the corner-cases yet.

I agree that, from the Micrometer project's point of view, it's better to offer the placeholder mechanism as an opt-in feature rather than build it into the project core. I like the idea of a Placeholders class that is the entry point for this feature and properly configures the provided registries if requested explicitly. How would we set up a placeholder-aware HierarchicalNameMapper though? It's currently configured in the DropwizardMeterRegistry constructor.

Also, note that our implementation of building a hierarchical metric name doesn't encode tags that have no matching placeholder. So you have to make sure all the metrics with tags in your application are mapped to names with placeholders, or else you risk failures when registering e.g. a gauge due to duplicates. Whenever you decide to drop in out-of-the-box metrics for some tool, you should know which metrics are reported and map them accordingly.
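
The duplicate risk can be illustrated with a small plain-Java sketch. It mimics the resolution logic above using plain maps instead of Micrometer types; the class and method names (UnmappedTagCollision, resolve, hasCollision) are hypothetical.

```java
import java.util.HashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;

// Hypothetical illustration; these names are not part of Micrometer.
final class UnmappedTagCollision {

    // Substitutes {key} placeholders; tags without a placeholder are silently dropped.
    static String resolve(String name, Map<String, String> tags) {
        String resolved = name;
        for (Map.Entry<String, String> tag : tags.entrySet()) {
            resolved = resolved.replace("{" + tag.getKey() + "}", tag.getValue());
        }
        return resolved;
    }

    // Returns true if at least two tag sets resolve to the same hierarchical name.
    static boolean hasCollision(String name, List<Map<String, String>> tagSets) {
        Set<String> seen = new HashSet<>();
        for (Map<String, String> tags : tagSets) {
            if (!seen.add(resolve(name, tags))) {
                return true;
            }
        }
        return false;
    }
}
```

For the tag sets area=heap and area=non-heap, the name jvm.memory.used collides because both resolve to the same path, while jvm.memory.{area}.used yields two distinct paths.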

I could prepare a PR once we agree on the design.

jkschneider commented 6 years ago

We already have a builder type for StatsdMeterRegistry to cover the more complex configurations like custom line builders. This isn't so different. I can imagine such a builder for hierarchical registries containing an input for Placeholders.

"So you have to make sure all the metrics with tags in your application are mapped to names with placeholders or else you risk getting failures to register"

Good point. The risk is limited to gauges, function counters, and function timers, which can only be described by one function. Counters, timers, and summaries would be fine.

pbetkier commented 6 years ago

OK. Shall I prepare a PR to discuss?

jkschneider commented 6 years ago

Sure, some form of this should make it into 1.1 I think.

shakuzen commented 5 years ago

Sorry for the delay in reviewing the pull request and getting this in a release. I've optimistically marked this for 1.2 so we can review it when merging changes for that.

pbetkier commented 4 years ago

How about moving forward with this? We're still using the described mechanism in our microservices and it works for us. The PR I created for discussion needs refreshing. I can do that, but I need to know if you're still interested in making this change.

jkschneider commented 4 years ago

@pbetkier Thanks for the reminder. My current sense is that the whole thing can be accomplished with just a PlaceholderHierarchicalNameMapper, the construction of which defines the mappings. Something like:

PlaceholderHierarchicalNameMapper.builder()
        .placeholder(MeterFilter.rename("jvm.memory.used", "process.jvm.memory.heap.used"))
        .placeholder(MeterFilter.rename("http.server.requests", "api-requests.{controller}.{handler}.{method}.{code}"))
        .build();

It's probably useful to define the placeholder mapping in terms of the whole Meter.Id, as the MeterFilter#map method does (so you can respond to base unit text, tags, etc.). There might be some new convenience methods to add to MeterFilter, such as the rename(from, to) one hypothesized above.

What about MeterFilter? Do you think we should assume that any name and tag mappings have already been applied before the PlaceholderHierarchicalNameMapper kicks in?

pbetkier commented 4 years ago

I like your idea. I think it makes sense to contain this feature in a single optional class, as opposed to making placeholders a global feature that impacts more of Micrometer's codebase, especially given how few metric backends are hierarchical in nature. I think it's less convenient than my original idea for the "application code in the process of migrating from Graphite to Prometheus" case, but we can consider that a rare use case.

I'm not sure about configuring PlaceholderHierarchicalNameMapper with MeterFilter objects though:

I don't see a use case for any operation other than renaming. Why would I respond to base unit?
If I define a MeterFilter.rename() and a MeterFilter.map() that both change metric names, then configuration order matters, which may be confusing.
It takes quite a lot of code to configure if each mapping requires defining its own filter. That's fine for 1-2 mappings, but renaming all built-in metrics from Micrometer requires dozens of mappings.
In practice, users may want to define these mappings as e.g. Spring beans in their applications, but as MeterFilter objects they would mix with the global MeterFilters, which serve a different role. Perhaps a mapping should have a domain class of its own.

What are your thoughts? And what do you think about this instead?

PlaceholderHierarchicalNameMapper.Mapping mappingObject = ...;
// potentially defined as a Spring bean and constructed using
// PlaceholderHierarchicalNameMapper.Mapping.of(String,String)
// PlaceholderHierarchicalNameMapper.Mapping.from(Map<String,String>)

PlaceholderHierarchicalNameMapper.builder()
    .mapping("jvm.memory.used", "jvm.memory.{area}.used")  // for convenience
    .mapping(mappingObject)
    .build();
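
A rough plain-Java sketch of how such a builder could behave is shown below. No such class exists in Micrometer; PlaceholderMapperSketch and its methods are hypothetical, and the sketch takes a meter name plus a tag map rather than a real Meter.Id so that it stays self-contained.

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical sketch of the proposed API; no such class exists in Micrometer.
final class PlaceholderMapperSketch {

    private final Map<String, String> mappings;

    private PlaceholderMapperSketch(Map<String, String> mappings) {
        this.mappings = mappings;
    }

    static Builder builder() {
        return new Builder();
    }

    // Looks up the template for the meter name, then fills placeholders from tags.
    String toHierarchicalName(String meterName, Map<String, String> tags) {
        String resolved = mappings.getOrDefault(meterName, meterName);
        for (Map.Entry<String, String> tag : tags.entrySet()) {
            resolved = resolved.replace("{" + tag.getKey() + "}", tag.getValue());
        }
        // Any brace left over means a placeholder had no matching tag.
        if (resolved.contains("{") || resolved.contains("}")) {
            throw new IllegalArgumentException("Unresolved placeholder in: " + resolved);
        }
        return resolved;
    }

    static final class Builder {
        private final Map<String, String> mappings = new LinkedHashMap<>();

        // Registers a single name-to-template mapping.
        Builder mapping(String from, String to) {
            mappings.put(from, to);
            return this;
        }

        PlaceholderMapperSketch build() {
            // Copy so later builder changes do not affect the built mapper.
            return new PlaceholderMapperSketch(new LinkedHashMap<>(mappings));
        }
    }
}
```

With the jvm.memory.used mapping from the example and the tag area=heap, toHierarchicalName returns jvm.memory.heap.used. Names without a mapping pass through unchanged, and a template whose placeholder has no matching tag is rejected, mirroring the check in the mapper posted earlier in the thread.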

As a side note, I would argue that the mapping of jvm.memory.used into process.jvm.memory.used should be implemented by a user with a global MeterFilter, not in a HierarchicalNameMapper configuration.
