(TK-316) Improved client metrics support

rlinehan commented 8 years ago

This PR improves on #51 and hopefully addresses all feedback. In particular, it improves the metrics API by:

Moving the get-client-metrics(-data) functions off of the client
Adding a with-url-and-method namespace
Adding a ClientTimer class that wraps Timer and holds onto the metric name, and url, method, or metric id used to create the timer
Changing the getClientMetrics(Data) methods to return a map of metric category to an array of timers or timer data
Adding getClientMetricsBy* methods to filter by a url, url and method, or metric id

rlinehan commented 8 years ago

I think this addresses everything @cprice404 and I were discussing on #51. I decided to just return arrays of timers/metric data, rather than having it indexed by metric tuple/url. It seems like most of the time users will want to iterate over all of these. Furthermore, I ran into difficulties trying to do this for the case where all the different categories are being returned, because then some of the keys would be arrays and some would be strings, which doesn't work. In addition, it didn't seem like indexing by metric id would be that useful even when only metric id timers were being returned, since much of the time users will probably want to dump this data into json for a status endpoint, and then they will need to serialize the metric ids into a string, since json doesn't allow arrays as keys. Thus, if we were to do this, we would be doing extra work that might be for nothing and cause users to also do more work.

There are a few things that I think still need to be done, the main thing being renaming. I talked with @nfagerlund about how things are named in this PR, and he felt that the one thing that should be changed are the names of the metric type, which are currently "bytes-read" and "response-init." He recommended going with names that show how they are semantically related - for example "full-response" and "initial-response" or "response-complete" and "response-init." I like either of these options, with maybe a slight preference to "full-response" and "initial-response."

I have two other questions that I think affect the API: 1) should we return nil if a nil metric registry is passed in to the get-client-metrics-* functions (and java versions), or throw an error. I have this currently implemented to return nil, but I think that it actually makes more sense (now that these functions aren't on the client) to just throw an error. 2) right now all of the get-client-metrics-* functions have two arities to allow for taking an optional metric type. Right now there's only one metric type, but this is done for forwards compatibility (the functions with no metric type return only the bytes read timers/data, and they can continue doing that in the future). However, I think we could eliminate the arities that take the metric type and just have it hardcoded everywhere, and then later on add in another arity or an option map or something when/if we add additional metric types.

In addition, I would like to do some refactoring of some of the tests, and there are a couple places where the implementation could probably do with some refactoring, but I wanted to get a PR up and make sure the API was agreed on, and then these changes could even be made after this PR is been merged.

rlinehan commented 8 years ago

ping @camlow325 for review

camlow325 commented 8 years ago

nfagerlund ... felt that the one thing that should be changed are the names of the metric type, which are currently "bytes-read" and "response-init." He recommended going with names that show how they are semantically related - for example "full-response" and "initial-response" or "response-complete" and "response-init." I like either of these options, with maybe a slight preference to "full-response" and "initial-response."

Either of the two alternatives sound better to me as well than what we currently have. If you like "full-response" and "initial-response" best, I'm good with that.

camlow325 commented 8 years ago

1) should we return nil if a nil metric registry is passed in to the get-client-metrics-* functions (and java versions), or throw an error. I have this currently implemented to return nil, but I think that it actually makes more sense (now that these functions aren't on the client) to just throw an error.

If I were implementing this from scratch, I probably would have gone with throwing an exception over returning nil. If we had cases where nil could be returned for cases where no metrics were found vs. just the metrics registry passed in being nil, I'd feel even more strongly that throwing an error to clearly distinguish between the two cases. I think we're always returning non-nil data structures if a valid registry is passed in, though, so probably not relevant here. Suppose I'd defer to your judgement on this one.

camlow325 commented 8 years ago

2) right now all of the get-client-metrics-* functions have two arities to allow for taking an optional metric type. Right now there's only one metric type, but this is done for forwards compatibility (the functions with no metric type return only the bytes read timers/data, and they can continue doing that in the future). However, I think we could eliminate the arities that take the metric type and just have it hardcoded everywhere, and then later on add in another arity or an option map or something when/if we add additional metric types.

Even though we only support 1 type at this point, I like the idea of having it possible up front to filter for only the id. Might be annoying for clients that just are interested in the data for the 1 type to be using a function that suddenly starts returning a bunch of extra data that they don't want. For those clients, using the form of the API scoped to a specific type upfront would seem like a good safeguard.

As to whether to have the type be a named parameter vs. a parameter in an options map, good question. Seems like it's going to be common enough for clients to want to filter for a specific metric that having it be a named parameter initially might be the best way to go.

So I suppose overall, I'd lean toward going with what you've already done on the PR for this.

rlinehan commented 8 years ago

Even though we only support 1 type at this point, I like the idea of having it possible up front to filter for only the id. Might be annoying for clients that just are interested in the data for the 1 type to be using a function that suddenly starts returning a bunch of extra data that they don't want. For those clients, using the form of the API scoped to a specific type upfront would seem like a good safeguard.

As to whether to have the type be a named parameter vs. a parameter in an options map, good question. Seems like it's going to be common enough for clients to want to filter for a specific metric that having it be a named parameter initially might be the best way to go.

I was thinking that even when we add another type, we would continue having the methods that don't include the specific type as a parameter return only the :bytes-read/:full-response, for backward compatibility, and then add not only the new type but also an ALL option for filtering. I worry that since we aren't completely sure how the API will look as we add more things (such as an option for returning everything under a metric-id, rather than just the explicit metric-id), keeping the methods that take the metric type will make expanding the API more difficult.

camlow325 commented 8 years ago

I can't remember if we discussed this previously or not but did we decide that we wanted to have the timer objects returned back through the Clojure get-client-metrics API to be Java objects - like as opposed to Clojure maps? I had been thinking we would want a layer that translates those for the benefit of those who would want to consume the metrics in a format that looks more native to Clojure? Looks like the get-client-metrics-data API result consists of primitive Clojure objects all the way down.

rlinehan commented 8 years ago

The get-client-metrics methods return Java ClientTimer objects. The get-client-metrics-data methods return Clojure maps of data (and the Java version of that - getClientMetricsData returns instances of the Java class ClientMetricsData) . Does that answer your question/provide what you were asking about?

camlow325 commented 8 years ago

I was thinking that even when we add another type, we would continue having the methods that don't include the specific type as a parameter return only the :bytes-read/:full-response, for backward compatibility, and then add not only the new type but also an ALL option for filtering.

I guess it would be more intuitive to me as an API consumer to have the version that you call with no argument for filtering just return everything it knows about - rather than designating a separate ALL option later for that purpose.

I worry that since we aren't completely sure how the API will look as we add more things (such as an option for returning everything under a metric-id, rather than just the explicit metric-id), keeping the methods that take the metric type will make expanding the API more difficult.

Yeah, good point. A bit hard to anticipate that at this point. I suppose I'd be okay with moving it into an options map in anticipation of that if it makes sense to you.

camlow325 commented 8 years ago

The get-client-metrics methods return Java ClientTimer objects.

Yeah, that's what I'm seeing. I was just questioning if that's what we want.

I guess there's a lot of stuff inherited from a com.codahale.metrics.Timer object that would be tedious to coerce into a Clojure map. It wasn't obvious to me that clients of this API would want/need to get the full Java form of the object back in order to be able to do anything with it other than read data - i.e., not to call .stop or .close on it. I'm good with it as it is. It just surprised me a bit to see that form of the object come back when calling from the Clojure REPL.

rlinehan commented 8 years ago

I guess there's a lot of stuff inherited from a com.codahale.metrics.Timer object that would be tedious to coerce into a Clojure map. It wasn't obvious to me that clients of this API would want/need to get the full Java form of the object back in order to be able to do anything with it other than read data - i.e., not to call .stop or .close on it. I'm good with it as it is. It just surprised me a bit to see that form of the object come back when calling from the Clojure REPL.

When I was first discussing this API with @cprice404 we settled on having two methods/APIs - one that returned a map of some of the most useful data (same as what the tk-comidi-metrics returns https://github.com/puppetlabs/trapperkeeper-comidi-metrics/blob/master/src/puppetlabs/metrics/http.clj#L31) and one that returned the Timer instance, which users could then use as they needed.

camlow325 commented 8 years ago

When I was first discussing this API with @cprice404 we settled on having two methods/APIs - one that returned a map of some of the most useful data (same as what the tk-comidi-metrics returns https://github.com/puppetlabs/trapperkeeper-comidi-metrics/blob/master/src/puppetlabs/metrics/http.clj#L31) and one that returned the Timer instance, which users could then use as they needed.

Fair enough. Thanks for the explanation.

camlow325 commented 8 years ago

Done reviewing this for now. Overall, I'm still :+1: on the direction.

I suppose the biggest question about the shape of the API is whether returning lists of flat metric maps for the get-client-metrics-data-* calls is sufficient or if we need to index them further, e.g., by url. We'll probably want to have @cprice404 make the final call on that.

I'm curious to hear what you think about the .getOrAddTimer synchronization questions.

As for remaining naming questions and how to pass in the metric type parameter on the get-client-metrics-* function, I'll defer to your judgment.

I really like the general shape of the get-client-metrics-* APIs as being separate from the clients themselves and pretty lightweight in terms of what you have to do to construct a query. @cprice404 will presumably have much stronger opinions on the exact nature of how things are aggregated in the query responses than I do.

Looking good to me! Can't wait to get this stuff up into Puppet Server!

rlinehan commented 8 years ago

@camlow325 I've pushed up several commits that update the getOrAddTimer implementation, rename bytes-read to full-response, and respond to a couple other bits of PR feedback.

I'll leave the other questions about nil, versions of the get- functions that explicitly take metric type, and anything else there still might be discussion about til Chris gets back.

camlow325 commented 8 years ago

:+1: to the latest changes. Sounds like we're waiting for feedback from @cprice404 at this point on some of the remaining API design questions.

cprice404 commented 8 years ago

Attempt to roll up some responses to all of the non-threaded comments:

I'm good with the new name suggestions
Definitely good with getting rid of that synchronized block
Agree with @camlow325 that I'd prefer we throw rather than return a nil if someone asks for metrics when there was no registry
I am in favor of removing the arities that take the metric type and then surfacing that in a backward-compatible way sometime in the future. When we get to that point, I'd vote for an options map so that we don't end up in a situation where we have to juggle additional arities in the future.
Slight preference for switching the strippedURL thing to call the URI constructor rather than concatenating strings.
I'm good with the different sigs of get-client-metrics-* so that you can get either the data or the actual Timer objects.
I'm good with just returning vectors/seqs of the metrics rather than trying to put another layer of nested maps in. The concerns that @rlinehan raised seem compelling to me, and I imagine that most consumers would just be calling val on those in most cases anyway.

Unrelated to any of the previous comments:

We are definitely going to need a way to ask for only the "leaf node" metrics in the metrics-id code paths. For displaying data in a status endpoint, we almost certainly only want the leaves. That doesn't have to come in this PR as long as we have an idea of how the API will need to change to support that before we do a release. But I think it has to happen before we can expect to be able to consume this easily for our status endpoints.
I think it'd probably be best to use more specific schemas for the metrics-data for the different types; one for metric-id, one for url, and one for url-and-method or whatever. Having the nil values in there reminds me of running into similar stuff like this when trying to deal with polymorphic objects in OO land in the past, and my gut is that having nil-able fields is probably not the best solution. We can discuss this further when we pair tomorrow, hopefully if we decide to change it it won't be too big of a change.
We are returning ClientMetricData from public APIs, but that class is defined in the impl namespace, which somewhat implies that it is private and that we can change it in the future. We should probably change that in some way, and the decision may end up involving the same questions about polymorphism from my previous comment. Hopefully that won't be a big deal either, though, and we can poke at it tomorrow as well.

I think that's all from my end!

cprice404 commented 8 years ago

For posterity: @rlinehan and I discussed all of the bullet points from my previous comment IRL yesterday and I think we are on the same page on all of them now.

cprice404 commented 8 years ago

Reviewed the last few commits that @rlinehan added; I'm officially +1 now. @camlow325 if you get a chance to take a peek at this last slew of changes, I think we're ready to push the button!

finally

camlow325 commented 8 years ago

:+1: I really like the recent changes which add in polymorphism / reduce conditional logic for the Filter and Timer classes.

@rlinehan - I can merge this as soon as you fix up the one nit with importing "metrics.*" from CategoryClientTimerMetricFilter.java.

rlinehan commented 8 years ago

@camlow325 I've fixed up the imports

camlow325 commented 8 years ago

:+1: :100: :accept: :cake: :dancer: :dancers: :package: :rabbit: :rabbit2: :racehorse: :sake: :tada: :v:

puppetlabs / clj-http-client

(TK-316) Improved client metrics support #52