lightblue-platform / lightblue-rest

Data access layer as service REST API
GNU General Public License v3.0

Expose JVM and REST endpoints metrics to JMX using Command objects #297

Closed svkm102 closed 7 years ago

svkm102 commented 7 years ago

Added metric timer and exception tracking code to all command objects. This allows more fine-grained tracking than the filter-based approach included in the other PR. The metrics reported to JMX follow this namespace pattern:

- `api.operationName.entityName.entityVersion.requests.active` — count of active requests, to flag streaming requests or requests taking a long time to complete
- `api.operationName.entityName.entityVersion.requests.completed` — timer metrics for completed requests
- `api.operationName.entityName.entityVersion.requests.exception.exceptionClass` — exception information (exception class)

- operationName = {find, update, bulk, delete, ...}
- entityName = {user, country, ...}
- entityVersion = {1.0.0, ...}
- exceptionClass = {NullPointerException, ArrayIndexOutOfBoundsException, ...}
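The namespace pattern above can be sketched as a small naming helper. This is an illustration only, not code from the PR; the class and method names (`MetricNamespace`, `requestName`, `exceptionName`) are assumptions, and plain `String.join` stands in for whatever the real Dropwizard registry naming looks like.

```java
// Hypothetical helper illustrating the JMX namespace pattern described above.
class MetricNamespace {

    // api.<operation>.<entity>.<version>.requests.<suffix>
    static String requestName(String operation, String entity, String version, String suffix) {
        return String.join(".", "api", operation, entity, version, "requests", suffix);
    }

    // api.<operation>.<entity>.<version>.requests.exception.<ExceptionClass>
    static String exceptionName(String operation, String entity, String version, Class<?> exceptionClass) {
        return requestName(operation, entity, version, "exception." + exceptionClass.getSimpleName());
    }

    public static void main(String[] args) {
        System.out.println(requestName("find", "user", "1.0.0", "active"));
        // → api.find.user.1.0.0.requests.active
        System.out.println(exceptionName("update", "country", "1.0.0", NullPointerException.class));
        // → api.update.country.1.0.0.requests.exception.NullPointerException
    }
}
```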

@ahenning, @dhaynes, @sameer31, @macgregor please review.

alechenninger commented 7 years ago

Whoops, in my example I missed the operation in the entity request. It's not to be taken 100% literally, so if you find other adjustments you need to make, feel free. For example, it would probably really be `Context startEntityRequest(String operation, String entity, String version);`.
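The interface Alec is describing might look like the sketch below. Only `startEntityRequest(String operation, String entity, String version)` comes from his comment; the `Context` method names and the `AtomicInteger`-backed implementation are assumptions standing in for Dropwizard counters/timers, purely so the example is self-contained.

```java
import java.util.concurrent.atomic.AtomicInteger;

// Hedged sketch of the suggested metrics facade.
interface RequestMetrics {
    Context startEntityRequest(String operation, String entity, String version);

    interface Context {
        void endRequestMonitoring();
        void markRequestException(Throwable t);
    }
}

// Minimal in-memory implementation; the real one would update a
// Dropwizard MetricRegistry (counter, timer, exception meter) instead.
class SimpleRequestMetrics implements RequestMetrics {
    final AtomicInteger active = new AtomicInteger();

    public Context startEntityRequest(String operation, String entity, String version) {
        active.incrementAndGet();
        return new Context() {
            public void endRequestMonitoring() { active.decrementAndGet(); }
            public void markRequestException(Throwable t) { /* record t.getClass() */ }
        };
    }
}
```

The point of returning a `Context` is that the caller can end monitoring (or mark an exception) without the metrics object having to track which request it was.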

svkm102 commented 7 years ago

Thanks Alec, encapsulating all dropwizard dependencies in a single unit does make managing and testing all things metrics a lot cleaner.

I tried the interface-based approach, but wasn't able to get it to work the way we wanted. The main problem is that Dropwizard counters, timers, and meters can't be managed effectively from an interface (interface fields are implicitly static and final, so only constants are allowed), so they needed some sort of declaration/updating in the individual command objects or the base class. That defeated the whole "encapsulate all things Dropwizard in a single unit" paradigm we are shooting for and made things more complicated than they should be.

Instead I tried having a base class (or abstract base class) that encapsulates all metrics-related logic and can be extended by AbstractRestCommand to provide metrics functionality to each command object transitively. This makes things a lot cleaner and makes managing the Dropwizard encapsulation easier. Including metrics in existing/future command objects is also easy and seamless this way, as we don't have to add/inject any dependency.
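The inheritance shape being described could look roughly like this. All names here are assumptions (only `AbstractRestCommand` is from the thread), and a plain `int` stands in for a Dropwizard `Counter` to keep the sketch self-contained.

```java
// Hypothetical base class owning the metrics logic; every command inherits it.
abstract class MetricsCommand {
    private int activeRequests; // stand-in for a Dropwizard Counter

    protected void startRequestMonitoring() { activeRequests++; }
    protected void endRequestMonitoring()   { activeRequests--; }
    int activeRequests() { return activeRequests; }
}

// Commands get metrics transitively, with no injected dependency.
abstract class AbstractRestCommand extends MetricsCommand { }

class FindCommand extends AbstractRestCommand {
    String run() {
        startRequestMonitoring();
        try {
            return "result"; // real command would execute the find here
        } finally {
            endRequestMonitoring();
        }
    }
}
```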

I don't think there will be any specific requirement for the start, end, and mark-exception methods to depend on the command object type, so I'm not sure separate implementations like startSavedEntity(...) or startFindEntity(...) would help much. Using common methods to manage metrics across command objects makes more sense to me; the only difference is the namespace in which each command object reports metrics/exceptions.

Let me know your thoughts on this approach.

alechenninger commented 7 years ago

I think this is definitely better because the logic is mostly encapsulated in one class, but I still don't like the base-class approach. I'm not a huge fan of class hierarchies, and I think this is a clear case where we can favor composition over inheritance. I'll open a PR against your branch to demonstrate some small refactorings that use what you've done, just compositionally.
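For contrast, the composition version of the same sketch: instead of extending a metrics base class, the command holds a metrics object handed to it at construction. All names here are assumptions for illustration, not Alec's actual refactoring.

```java
// Hypothetical metrics collaborator; an int stands in for a Dropwizard Counter.
class CommandMetrics {
    private int active;

    void start() { active++; }
    void end()   { active--; }
    int active() { return active; }
}

// The command *has* metrics rather than *is* a metrics class.
class UpdateCommand {
    private final CommandMetrics metrics;

    UpdateCommand(CommandMetrics metrics) { this.metrics = metrics; }

    String run() {
        metrics.start();
        try {
            return "result"; // real command would execute the update here
        } finally {
            metrics.end();
        }
    }
}
```

One practical upside of this shape is testability: a test can pass in its own `CommandMetrics` and inspect it, with no class hierarchy involved.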

> So not sure if having separate implementation of startSavedEntity(...), startFindEntity(...) would be of help much

I agree, actually. I think I didn't communicate clearly what I was suggesting. For find/insert/update/save, those can all use the same method like you're doing now, which is perfect. But there are some APIs that are fundamentally different, like locking, which doesn't work with entities and versions, but instead domains and resource ids. Similarly, we may want to treat streaming commands differently to see if their request latency differs from the non-streaming versions, for science! :-) Saved requests we could also consider monitoring by the saved request name, which would give us specific results for those saved queries. Of course, I'm not sure anyone is actually using saved requests.
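The locking point above is that lock requests are keyed by domain and resource id rather than entity and version, so they can't share the entity-request namespace. A possible contrast, with the lock namespace shape being purely an assumption:

```java
// Hypothetical name builders showing why locking needs its own entry point:
// the key space is (domain, resourceId), not (entity, version).
class LockMetricNames {

    static String entityRequest(String operation, String entity, String version) {
        return String.join(".", "api", operation, entity, version, "requests", "completed");
    }

    // Assumed shape for lock metrics; not specified anywhere in the thread.
    static String lockRequest(String domain, String resourceId) {
        return String.join(".", "api", "lock", domain, resourceId, "requests", "completed");
    }
}
```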

Thanks so much for the updates, I'll open a PR because I think that will be easier to communicate next round of comments. If you want to do some peer programming, I'd be happy to do that too.

svkm102 commented 7 years ago

Thanks for the PR, Alec. It cleared up some things I had initially misinterpreted from your comments. I have expanded upon your code, and this PR is updated with a tested version. I think it would be a good idea to move the metrics-related code into core, to get more fine-grained control over bulk requests. I am working on that refactoring, but if it turns out to be more complex than envisaged, I guess we can proceed with this PR for now.

svkm102 commented 7 years ago

Split into separate PRs for lightblue-core, lightblue-rest, and lightblue-audit-hook.

alechenninger commented 7 years ago

Since the other PRs look like the way to go, I'll close this one now.