Closed achille-roussel closed 7 years ago
First off, the idea of using structs with struct tags to define metrics just makes sense to me.
I don't think we should lock ourselves to the current API for a change this big (granted I haven't looked at the code in depth yet, so can't say for sure).
For batching, are we doing synchronous writes to the stats package? Or does it write to a buffered channel and return immediately? If we're not doing async writes already, that would drastically help in the short term without an API change. We could then batch inside the stats package itself (unless I'm misunderstanding some bit of the current implementation).
I do have worries about using a struct for metrics though, such as the points quoted below. Otherwise, it looks like the logical next step for metrics in Go. Let's do it!
> 0 values (obviously, can possibly be solved with tags)
This is a good point; something like the `omitempty` option of `encoding/json` seems like it would be the easiest way to address it.
> Setting the same metric twice
Do you mean tracking multiple values of a histogram within a single struct, for example? There are different ways we can support this, such as using separate fields with distinct metric names (`header.bytes` and `body.bytes`, for example).

> Dynamic tags (sometimes you just can't avoid them)
Yes, here are a couple of ideas I have: setting tags at the engine level (`stats.Engine.WithTags`), and variadic `[]Tag` arguments that are used to pass dynamic lists of tags.

> Do you mean tracking multiple values of a histogram within a single struct for example?
Effectively, yeah. How do you measure the same value (metric name + tags all equal) twice before submitting it to the client library? Slices seem plausible, but perhaps cumbersome.
Similarly, for the dynamic tags problem, we might have an issue submitting metrics using this pattern:

```go
func (mc *MetricsClient) Submit(metrics interface{}, tags ...Tag) {
	// which fields to apply the tags to?
}
```
On your suggestion to use a channel to do async writes: the stats package used to be designed this way, but publishing to a channel is actually more expensive than serializing a metric (~1-10us to publish to a channel vs ~300ns for serialization), so we moved away from it to the current design.
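The gap can be reproduced with a rough micro-benchmark along these lines. This is only a sketch, not the package's actual benchmark suite, and the absolute numbers depend heavily on the machine:

```go
package main

import (
	"fmt"
	"strconv"
	"testing"
)

// benchChannel measures the cost of publishing a value to a buffered channel
// with a goroutine draining it, roughly modeling the old async design.
func benchChannel(b *testing.B) {
	ch := make(chan int64, 1024)
	go func() {
		for range ch {
		}
	}()
	b.ResetTimer()
	for i := 0; i < b.N; i++ {
		ch <- int64(i)
	}
	close(ch)
}

// benchSerialize measures the cost of serializing a value directly into a
// reused buffer, roughly modeling the current synchronous design.
func benchSerialize(b *testing.B) {
	buf := make([]byte, 0, 64)
	for i := 0; i < b.N; i++ {
		buf = strconv.AppendInt(buf[:0], int64(i), 10)
	}
	_ = buf
}

func main() {
	// testing.Benchmark lets us run benchmark functions outside `go test`.
	fmt.Println("channel:  ", testing.Benchmark(benchChannel))
	fmt.Println("serialize:", testing.Benchmark(benchSerialize))
}
```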
> Effectively, yeah. How do you measure the same value (metric name + tags all equal) twice before submitting it to the client library? Slices seem plausible, but perhaps cumbersome.
The API has to accept an `interface{}` to support all metric struct types, so it's pretty simple to accept an array or a slice of structs. Embedding the array/slice within the metric struct is also possible.
> Similarly, for the dynamic tags problem, we might have an issue submitting metrics using this pattern:
>
> ```go
> func (mc *MetricsClient) Submit(metrics interface{}, tags ...Tag) { /* which fields to apply the tags to? */ }
> ```
In this case the tags would apply to everything. You make a good point that we'll probably need finer-grained control, so embedding tag slices in the struct seems like the way to go, at the expense of having to sort the tags. Sorting would be costly, but it should be fine as long as it doesn't happen for every metric (in my experience you know the tag names you want to set most of the time).
Adding to the list of issues that I noticed: the cost of serializing floating point values (`strconv.AppendFloat`). Having an API that supports both integer and floating point types will help address this issue.

I like the idea of a subpackage, so we can experiment with this first before making the shift.
Alright here are some numbers about the experiment embedding batches of metrics in struct types:
- GOMAXPROCS=1

```
BenchmarkEngine/discard/Engine.ReportAt(struct)         10000000    122 ns/op
BenchmarkEngine/discard/Engine.ReportAt(struct:large)   10000000    217 ns/op
BenchmarkEngine/datadog/Engine.ReportAt(struct)         10000000    226 ns/op
BenchmarkEngine/datadog/Engine.ReportAt(struct:large)    2000000    599 ns/op
```

- GOMAXPROCS=10

```
BenchmarkEngine/discard/Engine.ReportAt(struct)-10         20000000    66.0 ns/op
BenchmarkEngine/discard/Engine.ReportAt(struct:large)-10   20000000    113 ns/op
BenchmarkEngine/datadog/Engine.ReportAt(struct)-10         10000000    144 ns/op
BenchmarkEngine/datadog/Engine.ReportAt(struct:large)-10    5000000    313 ns/op
```
The _struct_ benchmark reports a struct with one metric; the _struct:large_ benchmark reports a struct with 10 fields. The cost of handling extra metrics increases by ~10ns per metric and the cost of serializing by ~20ns per metric, so in total we can account for ~30ns of CPU time spent on each metric that gets serialized.
This is pretty satisfying to me when we look at the numbers from the current stats package:
- GOMAXPROCS=1
```
BenchmarkEngine/discard/Engine.Add.1x    20000000     64.3 ns/op    0 B/op    0 allocs/op
BenchmarkEngine/discard/Engine.Add.10x    2000000    640 ns/op      0 B/op    0 allocs/op
BenchmarkEngine/datadog/Engine.Add.1x    10000000    195 ns/op      0 B/op    0 allocs/op
BenchmarkEngine/datadog/Engine.Add.10x    1000000   1888 ns/op      0 B/op    0 allocs/op
```
- GOMAXPROCS=10
```
BenchmarkEngine/discard/Engine.Add.1x-10    20000000     68.5 ns/op    0 B/op    0 allocs/op
BenchmarkEngine/discard/Engine.Add.10x-10    2000000    682 ns/op      0 B/op    0 allocs/op
BenchmarkEngine/datadog/Engine.Add.1x-10    10000000    200 ns/op      0 B/op    0 allocs/op
BenchmarkEngine/datadog/Engine.Add.10x-10    1000000   1843 ns/op      0 B/op    0 allocs/op
```
We can clearly see that the cost increases linearly here.
I'm pretty excited about getting this code into production. I'd like your feedback on how to proceed; here are a couple of options I thought about:
1. Publish it as a different repository. The upside is that there's no problem with backward compatibility, but I'm worried about splitting the effort and the increased maintenance cost of having two repositories that do pretty much the same thing.
2. Break the stats package API to support the new design. Honestly, there isn't much that needs to change for the commonly used types and methods; it would mostly be the various handler types and a cleanup of some abstractions that don't make sense anymore.
3. Extend the current stats package with the new APIs. In this case I'm mostly worried about making the package really messy, hard to use, document, and test (there would be 3+ ways of doing one thing).
I tend to like option (2) better for a few reasons:
- `stats` is a short package name which communicates well the purpose of the code
- it keeps the API to its minimal form (one design, one repo)
- with proper guidelines on how to do the migration it'll benefit all services we have that depend on it
Please let me know what you think!
We use govendor in most places, so I don't mind you going with (2) :)
It'd be nice to see a comparison of the new proposed API vs the old one!
Here are more details about what needs to be changed:
The `stats.Handler` interface needs to be changed to support the batched API, which means adapting the datadog/influxdb/prometheus clients as well (replacing or removing things that don't make sense anymore). I don't expect this to be a problem, because all the code I know about uses the stats package and doesn't deal directly with the clients for each system. The new handler interface looks like this:
```go
type Handler interface {
	HandleMeasures(time time.Time, measures ...Measure)
}
```
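For illustration, a trivial handler implementing this interface could look like the following; the `Measure` type here is a simplified stand-in for whatever the real measure type ends up carrying (it would also include tags and field metadata):

```go
package main

import (
	"fmt"
	"time"
)

// Measure is a simplified stand-in for the measure type the new interface
// would carry.
type Measure struct {
	Name  string
	Value float64
}

// Handler is the proposed batched interface: one call per batch of measures
// instead of one call per metric.
type Handler interface {
	HandleMeasures(time time.Time, measures ...Measure)
}

// formatMeasure renders a single measure as a name=value string.
func formatMeasure(m Measure) string {
	return fmt.Sprintf("%s=%v", m.Name, m.Value)
}

// printHandler is a trivial Handler that writes each measure to stdout,
// illustrating how a backend client would consume batches.
type printHandler struct{}

func (printHandler) HandleMeasures(t time.Time, measures ...Measure) {
	for _, m := range measures {
		fmt.Println(formatMeasure(m))
	}
}

func main() {
	var h Handler = printHandler{}
	h.HandleMeasures(time.Now(),
		Measure{Name: "requests.count", Value: 1},
		Measure{Name: "requests.rtt", Value: 0.25},
	)
}
```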
Getting rid of the `stats.Counter`, `stats.Gauge`, `stats.Histogram`, `stats.Timer`, and `stats.Clock` types. I introduced them hoping that caching some information in those structs would enable performance improvements, but the gains turned out to be small or nonexistent, so I'd rather clean up the API to keep the focus on the preferred way of reporting metrics.
I plan on keeping the `Incr`/`Add`/`Set`/`Observe`/`ObserveDuration` functions; all the code I'm aware of depends on them. They are less efficient than going through the batch API, but they are very convenient, and there are cases where performance isn't a concern. Adoption of the new version would be pretty painful if we removed them.
The Engine will get new methods:

```go
func (eng *Engine) Report(metrics interface{}, tags ...Tag) { ... }
func (eng *Engine) ReportAt(time time.Time, metrics interface{}, tags ...Tag) { ... }
```
The metrics argument can be a struct carrying a batch of metrics, or a slice or array of those.
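A sketch of what a call could look like; `Engine`, `Tag`, and `queryMetrics` are minimal stand-ins invented here, just enough to show the shape of the calls:

```go
package main

import (
	"fmt"
	"time"
)

// Tag and Engine are minimal stand-ins for the real stats types.
type Tag struct{ Name, Value string }

type Engine struct{}

// describe records how ReportAt was called; the real implementation would
// walk the metrics value with reflection and serialize each field.
func describe(metrics interface{}, tags []Tag) string {
	return fmt.Sprintf("reporting %T with %d tag(s)", metrics, len(tags))
}

func (eng *Engine) ReportAt(t time.Time, metrics interface{}, tags ...Tag) {
	fmt.Println(describe(metrics, tags))
}

func (eng *Engine) Report(metrics interface{}, tags ...Tag) {
	eng.ReportAt(time.Now(), metrics, tags...)
}

// queryMetrics is a hypothetical batch: one counter and one latency
// histogram reported in a single call.
type queryMetrics struct {
	Count int64         `metric:"query.count" type:"counter"`
	Time  time.Duration `metric:"query.time" type:"histogram"`
}

func main() {
	m := queryMetrics{Count: 1, Time: 100 * time.Millisecond}
	(&Engine{}).Report(m, Tag{"table", "users"})
}
```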
- The sub-packages providing convenience wrappers for collecting metrics in various situations (httpstats/netstats/procstats) should be left unchanged as well. The implementations will be revisited to support the new batching API though, that's kind of the point of all of this ;)
I'm open to other suggestions, now is a good time to think of any other improvements we want to make.
I'm finding it harder and harder to have metric collection not be the bottleneck in some high-performance applications that I'm working on. I want to open the discussion on a new design to make metric collection 10 to 100x faster while keeping high-level abstractions to easily instrument applications.
Here are the most common issues:
Here are a couple of ideas I have on what we can do:
Batches of metrics can be efficiently represented by struct types like:
There are a couple of reasons why I think this approach can be more powerful:
(`encoding/json` and `github.com/segmentio/objconv`, for example). This would make it possible to generate highly efficient ways to render the sorted list of tags. It also means that we can do more optimizations, like generating efficient hash tables to store and aggregate the metrics before flushing them to the network.

Now this is a pretty big shift from the way we've been doing instrumentation, so I have a couple of questions:
Let me know!