Implement tagged metrics

lucasdicioccio commented 7 years ago

This is an experiment to address #6 and could help for #12 . This change uses a non-empty list of tags instead of single tag name for metrics.

The main design choice to figure out is whether tags should be key/values or values only. In this experiment I use values only.

Pros:

most simple API change to augment metric names, exposing only a low-overhead API
one can add different flavors of typeful APIs on top of this change
one can add semantics for key-values by adding some parsing rule in the tags (cf. example in upcoming change requests)
one is not bound to key/value semantics if set-of-labels is what a user wants

Cons:

does not store tag names, so more bookeeping is needed
may lead to diverging rules for formatting key-values in tags (I guess most people will keep it simple)
does not address the need to create new tags dynamically (with indsight, I guess it's good to expose an API which makes it harder to create unbounded amount of tags)

tibbe commented 7 years ago

Thanks for putting this together. I've added some high-level thoughts below. When you read through these it might help having this example use case in mind: the user wants to count the number of HTTP status codes for each URL requested in some (Haskell) web server. The counter would be used as follows (in pseudo-ish code):

main = do
    counter <- createCounter "myapp.request-status" ["url", "status"]
    setStatusCounter counter
    ...

requestEntryPoint :: Url -> IO ()
requestEntryPoint url = do
    counter <- getStatusCounter
    statusCode <- handle url
    MultiCounter.inc counter [url, status]

I think we want dimension names from the start, as it will 1) avoid a painful migration and 2) I really think we need them in the UI, when sending data to statsd, etc.
The user-facing API should make it convenient to increment a counter (or add a value to a Distribution) given a value for each dimension (as in the example above).

The latter suggests that we add new counter, etc. types that have some internal structure (e.g. they might internally be implemented as maps of regular counters).

lucasdicioccio commented 7 years ago

Sorry for the delay, pretty busy :-). So far I went for the simplest diff I could imagine for lack of time and a goal to keep the overhead minimal.

Even if we iterate, I still want to keep the ability to make a ref <- MultiCounter.getCounter counter [url, status] and use inc ref later in the code. Also for performance, I'd like to have a single hashmap for all dimensional breakdowns of all counters when we sampleAll rather than having nested hashmaps (maybe we can have both at a small RAM cost? -- I'll have to dive in the code again to tell).

Then, regarding the lookupOrCreate >=> increment functions I agree they belong to EKG but I'll focus on them in a 2nd step as I feel they are sugar above primitives. I agree that keeping the dimension names separate from the tags is useful as well.

tibbe commented 7 years ago

I think supporting ref <- MultiCounter.getCounter counter [url, status] makes sense. If we implement multi-dimensional metrics with a map from the key to a regular Counter (or Distribution or whatever) you should be able to do just that.

sampleAll needs to be reasonably cheap (e.g. cheap enough to be called once per second or so), but I don't want the returned data to be ambiguous (i.e. does "a.b.c" refer to a metric "a.b" with a dimension with the value "c" or a metric "a.b.c" that has no dimension?). One reason is that we might want to push it to other backends than statsd.

lucasdicioccio commented 6 years ago

Hi, it's me again. I've created https://github.com/tibbe/ekg-core/pull/22 which should supersede this MR, taking a pretty different approach and providing an API closer to what you described.

haskell-github-trust / ekg-core

Implement tagged metrics #17