iant01 closed this issue 6 years ago
Created PR16 with the changes needed in master.yaml to add the transportation-systems service, and a service.yaml file to define the task definition and load balancer listener rule for the service.
Right now, the following items have arbitrarily set values:

Host: staging-2018.civicpdx.org
Path: /transportation-systems
Port: 3000
Priority: 40 (needs to be before the civic-2018 service and the civic-lab service)
Memory: 2048 (2 GB; last year's service was a memory pig, hopefully this year's will use less, so I'm setting it high to start)
@iant01 How much memory did we use last year? And how do you measure it? Is there some way we can test this locally before deploying?
We can possibly use docker stats on a running host to get memory info on the running containers. Either a container developer would need to run the command on their local system, or we'd run it on another ECS instance. Since we can't ssh into the hacko's container instance to run the command, we might be able to run the transportation-systems container on another ECS instance, but I have not had any success running the 2017 container in my AWS account, so I may not have success with the 2018 container either. I will give it a try.
There may be a Docker API that would work against the hacko ECS instance, but again we might need an access key to get in.
This is the API containers, right? If those look like this year's API images from the backend-examplar, either there's an AWS way to monitor their usage or we'd need console access to the Docker host. :-(
@znmeb, is there any chance of running the container locally, performing a few operations through the API (to load up some in-memory data), and running the docker stats command as Ian suggested above?
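For reference, that check could be scripted rather than read off a streaming display. Below is a minimal Python sketch: the container name comes from later in this thread, docker stats --no-stream and --format "{{.MemUsage}}" are standard Docker CLI options, and the helper names are mine.

```python
import re
import shutil
import subprocess

def parse_mib(mem_usage: str) -> float:
    """Convert a docker-stats memory string like '152MiB' or '2GiB' to MiB."""
    match = re.fullmatch(r"([\d.]+)\s*([KMG])i?B", mem_usage.strip())
    if not match:
        raise ValueError(f"unrecognized memory string: {mem_usage!r}")
    value, unit = float(match.group(1)), match.group(2)
    factor = {"K": 1 / 1024, "M": 1, "G": 1024}[unit]
    return value * factor

def container_mem_mib(container: str) -> float:
    """One-shot (non-streaming) memory reading for one container."""
    out = subprocess.check_output(
        ["docker", "stats", "--no-stream", "--format", "{{.MemUsage}}", container],
        text=True,
    )
    usage, _limit = out.split("/")  # output looks like "152MiB / 2GiB"
    return parse_mib(usage)

# Only query Docker if it is actually installed on this machine.
if shutil.which("docker"):
    print(container_mem_mib("transportation-system-backend_api_production_1"))
```

Running this periodically while exercising the API would give a rough peak-memory number without needing console access to the ECS host.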
There is no way we're going to throw 1/4 of our available memory at a new container "just in case" - this was only done last year as a last-minute, last-resort fix, and no one's had time to go back and characterize that pig since then.
Yeah, I can spin it up locally but this isn't the full API. Should I just use the Docker host default settings for container resource usage?
It would be really nice if we could build resource limiting into the images - interpreted languages like Python tend to take up all the RAM they can find even if they're sharing it with a dozen other containers / VMs they don't know about.
I'm confused - why is the Docker image you'd spin up locally not "the full API"? Isn't that one of the benefits of Docker, that the app you run locally and the one you deploy into production are identical?
It's the full API for the one database we had when we built the image. We have more data now, which will mean more models and more API endpoints and probably more RAM used.
So we have some options for profiling Python and Django behavior, including running with DEBUG true under the gunicorn server (-p) connected to the AWS DB, some use of the Django Debug Toolbar (not currently installed), or maybe New Relic if we need more advanced info than docker stats provides.
That said, there were some complexities to the transportation project last year that didn't exist when I left a bit ago; I will catch up this weekend, but I'm not sure if this will be an issue.
Good data on usage would be great, though.
Let's not go overboard here - the most significant thing we need to know is roughly how much RAM the Django app(s) in the container will consume, so that we can allocate a sufficient amount in the AWS CloudFormation template for this container. We generally start out with 100MB and bump it in increments of 100 from there; we spent a lot of time last year debugging containers that wouldn't stay running because we had no idea what kind of memory load they would have.
However, we're not just going to throw RAM at these - this isn't an unlimited resource - so if there's some risk that they'll need more than 100MB, let's get a rough number based on some rough characterization. Thanks!
By the way, wouldn't DEBUG=True use more RAM?
Silly question... was the transportation container last year running a database, rather than connecting to a database server, or was it a hybrid of both (keeping large amounts of data local after grabbing it from a remote DB server)?
Yes, DEBUG would use more RAM.
But here is where I've got to: docker stats gives a streaming output, a point-in-time view of memory usage and a few other stats:

CPU % | MEM USAGE / LIMIT | MEM % | NET I/O | BLOCK I/O | PIDS

I ran this against the Docker container transportation-system-backend_api_production_1 on my host machine, using the current API. During startup of the container with the prod flag (./bin/start.sh -p), connecting to the AWS-hosted DB, we see CPU % maxing out around 85%, with memory usage going to around 152 MiB.
The thing I was seeing, though, is that the MEM USAGE did not seem to drop by more than a few MiB; after a few queries using filters on the crash data, I got it to ~225 MiB. So I started looking into what this figure actually includes.
First, I found Google's cAdvisor (https://github.com/google/cadvisor). This provides a GUI and 60 seconds of historical data, so it's a bit more useful than docker stats.
Looking into the MEM usage, I came across this issue, which documents the different types of memory being recorded: https://github.com/google/cadvisor/issues/638
tl;dr is:

Hot is the working set - pages that have been recently touched, as calculated by the kernel.

Total includes hot + cold memory - where cold pages are those that have not been touched in a while and can be reclaimed if there is global memory pressure.

Or, put another way:

Total (memory.usage_in_bytes) = rss + cache
Working set = Total - inactive (not recently accessed memory = inactive_anon + inactive_file)
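Those two formulas can be expressed directly. A small sketch, using illustrative counter values only (the field names mirror cgroup v1's memory.stat; the numbers are made up, not real measurements):

```python
def total_bytes(rss: int, cache: int) -> int:
    # Total (memory.usage_in_bytes) = rss + cache
    return rss + cache

def working_set_bytes(rss: int, cache: int,
                      inactive_anon: int, inactive_file: int) -> int:
    # Working set = Total - inactive (inactive_anon + inactive_file)
    return total_bytes(rss, cache) - (inactive_anon + inactive_file)

MIB = 1024 ** 2

# Hypothetical counters for a container (in bytes):
rss, cache = 150 * MIB, 75 * MIB
inactive_anon, inactive_file = 5 * MIB, 60 * MIB

print(total_bytes(rss, cache) // MIB)        # 225 (MiB) - what docker stats reports
print(working_set_bytes(rss, cache,
                        inactive_anon, inactive_file) // MIB)  # 160 (MiB)
```

The point of the example: the Total figure shown by docker stats can sit well above the working set, because it includes cold, reclaimable pages.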
So the question becomes: which is the most important number?
@iant01 I feel like there was some type of hybrid data store going on, but I was not directly on the project last year and am not completely sure of the full magic that was happening.
Awesome data Brian, thank you.
When we allocate memory to each container, there’s no memory management to worry about - as in, the “cold” memory that could be reclaimed probably wouldn’t be, because there’s nothing else in the container that would appreciably request contended memory (it’d all be consumed by one process - gunicorn, Python, whatever the runtime host is).
So given we’re doing hard allocations per container, I’m going to conservatively assume that we should use the Total - and then round up to the nearest 100 (just to give us a little breathing room for edge cases and future API enhancements).
Based on this data, I’m inclined to allocate 300 MB to this transportation-systems container.
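The "use the Total, then round up to the nearest 100" rule can be written down explicitly. A sketch (the function name is mine) applied to the ~225 MiB peak observed above:

```python
import math

def allocation_mb(observed_peak_mb: float, step: int = 100) -> int:
    """Round an observed memory peak up to the next multiple of `step` MB."""
    return int(math.ceil(observed_peak_mb / step)) * step

print(allocation_mb(225))  # 300 - matches the proposed allocation
print(allocation_mb(152))  # 200 - the startup-only figure would round to this
```

Note this rounds up even from just over a boundary (e.g. 201 -> 300), which is the "breathing room" behavior described above.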
I've got the merged database ready for testing - I'm planning to build a local development environment from it at the May 20 build session so we can see what we have.
All of the discussion on memory use should be moved to its own new issue; this issue was intended for creating the service task to get things going in ECS.
This issue can be closed once all the memory discussion is in its own issue and PR 16 has been merged.
On the question of which memory figure is relevant: it would be the Total memory size.
Create the service sub directory and service.yaml file for use in getting the service task definition into ECS.