MikeTheCanuck opened this issue 6 years ago
Both kinds of monitoring (capacity planning and marketing) are subjects near and dear to my heart. ;-)
I suspect the capacity planning is adequately covered by the AWS tools - people wouldn't deploy there if they didn't have the tools to manage and plan capacity. I know a lot about capacity planning, but nearly all of it is useful only in a bare metal environment, not in containers running inside virtual machines. ;-)
On the marketing side, pretty much everyone I know starts with the free tier of Google Analytics. There are other ways to do it but Google Analytics is one that everyone understands.
I think the first step, and the way to avoid the black hole, is to gather measurements that characterize our performance. Too often performance tuning or optimization is requested without truly understanding the load on each facet of the system. If we can set up some kind of benchmark or measurement for each of the items you listed, I think we could better understand where our bottlenecks are, why they are there, and which pieces actually need (or don't need) additional resources. With empirical data we can make informed decisions about where to spend time or money to improve the performance of our system, and justify doing so.
The website usage is a perfect example: if we don't know the characteristics of its access patterns, we have no idea whether we even have a scaling problem.
I would advocate that the first step be researching the various ways we could gather usage and performance data for each of the components you listed, and then implementing those to measure the performance of each piece. That way we don't have to invest time optimizing or scaling any one piece until we know which pieces are not performing adequately.
It sounds like Ed may have some insight or experience in this area that we could leverage, if we can figure out how to apply it to dockerized containers. Or at least the services we run inside them.
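As a concrete (and entirely hypothetical) sketch of the kind of measurement-gathering described above - the endpoint URL and sample count below are placeholders, not real Hack Oregon services - something like this could collect latency percentiles for one API endpoint:

```python
import statistics
import time
from urllib.request import urlopen  # stdlib only; 'requests' would also work


def summarize(timings):
    """Boil a list of per-request durations (seconds) down to a few numbers."""
    return {
        "median_s": statistics.median(timings),
        "p95_s": statistics.quantiles(timings, n=20)[18],  # last of 19 cut points
        "max_s": max(timings),
    }


def measure_latency(url, samples=20):
    """Time repeated GETs against one endpoint and summarize the results."""
    timings = []
    for _ in range(samples):
        start = time.perf_counter()
        urlopen(url).read()
        timings.append(time.perf_counter() - start)
    return summarize(timings)


# Hypothetical usage (placeholder URL):
# print(measure_latency("https://example.com/api/health/"))
```

Even a crude script like this, run periodically against each service, would tell us which pieces are slow before we spend effort scaling any of them.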
I would start with the usage monitoring - Google Analytics for the front end and whatever logging Django does by default on API usage. But this requires a conversation inside the organization, not just an issue discussion on GitHub.
You mean because it would cost $, or why?
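For reference, a minimal sketch of what turning on Django's request logging could look like - this is an assumed `settings.py` fragment, not what our repos currently contain. Note that Django's built-in `django.request` logger only fires on 4xx/5xx responses, so full per-request usage data would have to come from the web server or load balancer access logs instead:

```python
# Hypothetical addition to a Django project's settings.py (not current config).
LOGGING = {
    "version": 1,
    "disable_existing_loggers": False,
    "handlers": {
        "console": {"class": "logging.StreamHandler"},
    },
    "loggers": {
        # django.request only emits warnings/errors (4xx/5xx responses);
        # complete "hits per day" data needs access logs at the proxy/ALB layer.
        "django.request": {
            "handlers": ["console"],
            "level": "WARNING",
            "propagate": False,
        },
    },
}
```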
There are a number of questions we need to answer to be able to make adjustments to our infrastructure and ensure that - come the day we start receiving non-trivial traffic - the systems can provide an adequate experience (reasonable response times, not a lot of queued requests).
Asking questions like "how much traffic are we seeing?" and "what kind of performance are we getting?" can often lead to a bloated effort to swallow, digest and synthesize All The Data.
I am not interested in falling into that black hole.
What I'm interested in is identifying pieces of our architecture that need an upgrade.
So here's what I'm most immediately worried about:
Exhausting the /data or root volumes as developers upload more data - we've experienced multiple occasions when the database disk has filled so completely that incoming requests have no place to offload in-memory data when servicing particularly complex requests.

There's also curiosity about how many "hits" each site (and each API service) receives per day, so that we have some idea whether these sites & services could require additional enhancements, and whether there's a reason for us to look into performance issues for possible scale-up/scale-out. This "curiosity"-driven work is a lower priority from an engineering/DoS-protection point of view, but may be of more interest from a marketing point of view.
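On the full-disk worry specifically, a minimal watchdog sketch (the mount point and alert threshold below are placeholders, and any real alerting would presumably go through CloudWatch or similar rather than stdout):

```python
import shutil


def volume_usage(path="/"):
    """Return the fraction (0.0-1.0) of the volume at `path` that is in use."""
    usage = shutil.disk_usage(path)
    return usage.used / usage.total


def needs_attention(path="/", threshold=0.90):
    """True when the volume is above the (arbitrary) alert threshold."""
    return volume_usage(path) >= threshold


# Hypothetical usage, e.g. from a cron job (placeholder mount point):
# if needs_attention("/data"):
#     print("WARNING: /data volume above 90% - incoming uploads may fail")
```

Something this simple, run on a schedule, would at least turn "the database disk filled up again" from a surprise into an alert.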