New FAQ section in wiki

felixbarny commented 8 years ago

It currently contains the questions 'Will my application go down if Elasticsearch is unreachable or slow?' and 'How can I get started with stagemonitor with minimal risk and overhead, and which settings should I choose for production, QA and development?'. I hope the answers contain something new and interesting for you.

rlau89 commented 7 years ago

Can you tell me if it's possible, in a multi-tenant environment, to monitor performance stats per tenant?

This is a Java Struts 2 application running on Tomcat, where the user and tenant organisation are known for each action performed.

felixbarny commented 7 years ago

Which features of stagemonitor are you most interested in? It would be pretty simple to add an additional tenant tag to each request trace. It would, however, be considerably harder to add the tenant to the timers, which are the basis for the Grafana metrics dashboards. Depending on the number of tenants, this would also imply an explosion of dimensions for the metrics, which results in very high storage requirements. A bit of background: there is a timer for each unique "Business Transaction" like "Do Search" etc. Let's say you have 100 business transactions and 1000 tenants. That would result in 100*1000=100,000 timers being created. Each timer also holds ~15 individual metrics (see "The Outcome" chapter in the blog post I wrote for Elastic: https://www.elastic.co/blog/elasticsearch-as-a-time-series-data-store).
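To make the arithmetic above concrete, here is a minimal sketch of the estimate. The class and method names are made up for illustration and are not part of stagemonitor; the inputs are the example figures from this comment, and "~15" metrics per timer is only approximate.

```java
// Back-of-the-envelope estimate of metric cardinality when every
// business-transaction timer is split per tenant. Names are hypothetical.
final class TimerCardinality {

    // One timer per (business transaction, tenant) pair.
    static long timers(long businessTransactions, long tenants) {
        return Math.multiplyExact(businessTransactions, tenants);
    }

    // Each timer reports several individual metrics.
    static long metricSeries(long timers, long metricsPerTimer) {
        return Math.multiplyExact(timers, metricsPerTimer);
    }

    public static void main(String[] args) {
        long timers = timers(100, 1000);        // example figures from the comment
        long series = metricSeries(timers, 15); // "~15" is approximate
        System.out.println(timers + " timers, about " + series + " metric series");
    }
}
```

With these example numbers, that is on the order of 1.5 million individual metric series, each of which is reported at every interval.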

The good news is that you can also derive performance metrics from traces/spans. On a high-traffic site, however, you will most likely have to sample, i.e. only store traces for 10% or even 1% of incoming requests.
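Sampling here simply means deciding per request whether its trace is stored. The following is a rough sketch of a probabilistic sampler, not stagemonitor's own sampling mechanism; the class name `TraceSampler` and its methods are assumptions made for illustration.

```java
import java.util.concurrent.ThreadLocalRandom;

// Hypothetical probabilistic sampler: keeps roughly `sampleRate` of all traces.
final class TraceSampler {
    private final double sampleRate; // fraction of traces to keep, 0.0 to 1.0

    TraceSampler(double sampleRate) {
        // The negated form also rejects NaN.
        if (!(sampleRate >= 0.0 && sampleRate <= 1.0)) {
            throw new IllegalArgumentException("sampleRate must be between 0 and 1");
        }
        this.sampleRate = sampleRate;
    }

    // nextDouble() is uniform in [0, 1), so this is true with probability sampleRate.
    boolean shouldStore() {
        return ThreadLocalRandom.current().nextDouble() < sampleRate;
    }

    public static void main(String[] args) {
        TraceSampler sampler = new TraceSampler(0.10); // keep about 10% of traces
        int stored = 0;
        for (int i = 0; i < 100_000; i++) {
            if (sampler.shouldStore()) {
                stored++;
            }
        }
        System.out.println("stored " + stored + " of 100000 traces");
    }
}
```

Any statistics computed from sampled traces are estimates, so rarely used tenants or transactions may have few or no stored traces at low sample rates.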

Let me know if I can clarify some stuff or if I should elaborate more.

rlau89 commented 7 years ago

Thanks for getting back to me, it's much appreciated. The main thing we would like to get out of this is performance stats for things such as Java heap space & CPU utilisation, Ajax requests, HTTP requests, OS memory, OS CPU and disk space.

I'm a bit confused as to why there would need to be the extrapolated timers for the transactions. Would it not be possible to have the minimum number of timers possible and when the timer executes, at this point it stores the tenant and necessary data? Would the transactions always be ones which hit the DB? In this case there would be a single class which eventually handles the call.

Apologies if I'm misunderstanding; I'm just having difficulty grasping how this would fit for us. Any further assistance you can provide would be much appreciated. Thanks

felixbarny commented 7 years ago

The important thing to understand is that timers don't store individual events; traces do. No matter how many requests hit your server, the timers report at a regular interval, say once every 10 seconds. Each timer aggregates the execution time of all requests in memory, computes statistics (avg, std, and percentiles) and flushes them periodically. That's why there is one timer per business transaction.
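A simplified sketch of that idea, assuming nothing about stagemonitor's actual timer implementation: the hypothetical `AggregatingTimer` below keeps only running totals, so its memory use does not grow with the number of requests. Percentiles are left out because they need an extra data structure such as a reservoir or histogram.

```java
// Hypothetical aggregating timer: stores running totals, never individual events.
final class AggregatingTimer {
    private long count;
    private double sum;
    private double sumOfSquares;

    // Called once per request with its execution time in milliseconds.
    synchronized void record(double millis) {
        count++;
        sum += millis;
        sumOfSquares += millis * millis;
    }

    // Called by a reporter at a fixed interval (e.g. every 10 seconds).
    // Returns {count, mean, standard deviation} for the interval, then resets.
    synchronized double[] flush() {
        double mean = count == 0 ? 0.0 : sum / count;
        double variance = count == 0
                ? 0.0
                : Math.max(0.0, sumOfSquares / count - mean * mean);
        double[] stats = {count, mean, Math.sqrt(variance)};
        count = 0;
        sum = 0.0;
        sumOfSquares = 0.0;
        return stats;
    }

    public static void main(String[] args) {
        AggregatingTimer doSearch = new AggregatingTimer(); // one per business transaction
        doSearch.record(10);
        doSearch.record(20);
        doSearch.record(30);
        double[] stats = doSearch.flush();
        System.out.println("count=" + (long) stats[0] + " mean=" + stats[1] + " std=" + stats[2]);
    }
}
```

Because only the totals survive a flush, there is no way to ask afterwards which tenant a given request belonged to. Splitting the numbers by tenant would require a separate timer instance per tenant, which is where the dimension explosion comes from.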

> Java heap space & CPU utilisation, Ajax requests, HTTP requests, OS memory, OS CPU and disk space.

Most of the metrics like heap space, CPU utilisation, memory and disk space are independent of the tenant.

To get tenant-specific performance data, I'd suggest adding a custom tag to the traces. The API for that is currently changing, so here is the old and the new way of doing that:

stagemonitor 0.31.0

RequestMonitor.get().getRequestTrace().addCustomProperty("tenant", tenant);

master

TracingPlugin.getCurrentSpan().setTag("tenant", tenant);

Then you can modify the request analysis dashboard so that you can filter by the tenant.
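Conceptually, filtering or grouping by the tenant tag amounts to the aggregation sketched below. This is only an in-memory illustration: in practice the dashboard would run the equivalent query against the stored traces, and the `Trace` class here is a hypothetical stand-in with just a tenant tag and a duration, not stagemonitor's trace model.

```java
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

// Hypothetical per-tenant aggregation over stored traces.
final class TenantStats {

    // Simplified stand-in for a stored trace: a tenant tag and a duration.
    static final class Trace {
        final String tenant;
        final double millis;

        Trace(String tenant, double millis) {
            this.tenant = tenant;
            this.millis = millis;
        }
    }

    // Groups traces by their tenant tag and returns the mean duration per tenant.
    static Map<String, Double> meanMillisByTenant(List<Trace> traces) {
        Map<String, double[]> totals = new TreeMap<>(); // tenant -> {sum, count}
        for (Trace trace : traces) {
            double[] total = totals.computeIfAbsent(trace.tenant, k -> new double[2]);
            total[0] += trace.millis;
            total[1] += 1;
        }
        Map<String, Double> means = new TreeMap<>();
        for (Map.Entry<String, double[]> entry : totals.entrySet()) {
            means.put(entry.getKey(), entry.getValue()[0] / entry.getValue()[1]);
        }
        return means;
    }
}
```

Since each trace carries its own tag, the number of tenants only affects query time and does not multiply the number of metrics that have to be kept in memory and reported.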

rlau89 commented 7 years ago

OK, thank you for your help. I will hopefully be able to incorporate this into the application.

stagemonitor / stagemonitor-mailinglist

New FAQ section in wiki #21