Paging @iant01 - could use some advice on most of the above decisions:
Amazon Linux 2 LTS Candidate AMI 2017.12.0 (HVM), SSD Volume Type - ami-7f43f307
because it's "tuned for EC2 performance and security" and because selecting an Amazon AMI seems to give us more chance of supportability longer-term than some Community AMIWhat is AMI? CentOS / RHEL? Ubuntu 16.04LTS? It really matters because everything we've done in Transportation so far has been on Debian in containers. It's not a tough job to port to Ubuntu. Going to CentOS / RHEL is going to be more challenging.
If this machine is just going to be running PostgreSQL, what's the cost difference for this versus a dedicated PostgreSQL server like RDS?
Ultimately we want to do what's easiest for operations - is one big PostgreSQL server the best case for that?
As to sizing, for those teams that have databases built, I can help plan that. If we host the full TriMet congestion database, that's 30 GB right there, and that doesn't leave room for working space. We can always add virtual disks and tablespaces, however. That's a tricky manual process but I've done it and it's well-documented.
OK, first we need to answer some questions to determine an instance size (at least to start with):
1) During development, does it need to be the same size as what may be needed in production? (I would say no if we're concerned with cost.)
2) Unlike GCP, where memory and CPU can be scaled independently, do you want to pay for unused CPU and network performance just to have enough memory to hold the entire table space in RAM (either in development or production)?
3) Does Postgres require all table space to be in memory? (Most relational databases have tuning parameters that indicate how much memory to allocate to table and query space - see the sketch below.)
4) If, for dev and even production, instantaneous query response is not a requirement (and we're not looking at thousands or even hundreds of queries a second), go for something smaller that just exceeds the current requirements and scale the instance up later.
5) Don't confuse the on-disk size of the database with the size of the root disk needed for the instance; the database itself should be on a separate EBS volume, not on the root disk.
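On question 3: no - PostgreSQL keeps hot pages in its shared buffers and relies on the OS page cache for the rest, so the whole table space does not need to fit in RAM. The knobs below are standard PostgreSQL settings; the values are illustrative guesses for a mid-size instance, not recommendations:

# postgresql.conf - illustrative memory settings only
shared_buffers = 2GB            # dedicated page cache; ~25% of RAM is a common rule of thumb
work_mem = 32MB                 # per sort/hash operation, so multiply by concurrent queries
maintenance_work_mem = 256MB    # used by CREATE INDEX, VACUUM, and restores
effective_cache_size = 6GB      # planner hint about OS cache size; affects plans, not allocation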
Also, the AMI decision is somewhat separate from the instance size (type) decision, though there are some crossover items to consider, such as the root disk size of the AMI (a smaller instance may not have a large enough root disk to host it) and the root disk type (EBS vs. instance store).
Based on what Ed has indicated, Debian is currently being used, and there may be work involved to get things moved to and running on a different flavor of Linux (Ubuntu, RHEL/SUSE, Amazon Linux 2). Ed also indicated PostgreSQL 10 and PostGIS 2.4; I need to look to see if there are any AWS AMIs that have this. Community AMIs can carry risks, and ones from the Marketplace cost money. AWS RDS Postgres is 9.6.x and PostGIS is 2.3 as of this week; people are reporting they can spin up RDS with PostgreSQL 10.1 successfully, but AWS has not officially indicated that anything above 9.6.x is available/supported, and there's no indication whether PostGIS 2.4 is there yet.
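Rather than relying on forum reports, one way to check what RDS will actually let us launch is the AWS CLI (assumes credentials are configured; the command and flags are standard aws-cli):

# list the PostgreSQL engine versions RDS currently offers
aws rds describe-db-engine-versions --engine postgres \
    --query 'DBEngineVersions[].EngineVersion' --output text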
Given this, I suspect that short term we will need to roll our own AMI with a version of PostgreSQL/PostGIS we can get by on (i.e. not the bleeding-edge 10.2 / 2.4.x, but 10.1 / 2.4.0, or 2.3.x) or lower, currently AWS-supported versions, and build it off a current base AWS AMI with a supported Linux version/brand we can work with.
As we get closer to demo, all this may, and probably will, change as AWS moves forward with their supported AMIs.
So I'm just going to leave this thought out there - it may be too much overhead for the moment if having an initial development database is a blocking issue for many teams.
https://github.com/hackoregon/hackoregon-aws-infrastructure
https://github.com/hackoregon/backend-service-pattern/blob/master/.travis.yml
https://github.com/hackoregon/backend-service-pattern/blob/master/bin/push
https://github.com/hackoregon/backend-service-pattern/blob/master/bin/ecs-deploy.sh
The question I am leading to is:
Once the base AMI is figured out for compatibility (we did run the current flavor of PostGIS on emergency-response, and I think other teams did too, if that helps as a starting point), could we handle further configuration of the actual environment through a model similar to the APIs, i.e. a Dockerized GitHub repo, built and deployed through Travis to a preconfigured service within the CloudFormation template?
We have the benefits of:
oops, forgot references, will edit my last comment in a minute with links
Thanks @BrianHGrant - this seems like a good thing to consider for the future. At the moment, standing up a central database to get all five projects running is more important than standing up a reproducible database.
(The design of the ECS, Docker, Travis infrastructure of the AWS API layer is definitely still in scope this year, but we are not precluding a different design for our AWS data layer, which is more "durable infrastructure, iterative data loading" than "iterative infrastructure and iterative code loading" that the API and front-end layers follow.)
We are not interested in adding the Docker complexity to the AWS data layer at this time - the benefits to our central infrastructure are unclear, and won't help us one bit if we decide to migrate to RDS (to reduce our operations overhead, for example). We have enough moving parts to worry about already, and taking the incremental improvement of one DB server over the proliferation of bespoke DB servers last year is my primary goal at this time (alongside "getting a database up and running for each 2018 project").
We plan to explore writing as many (bash) scripts as needed to support migration of data from S3 to the central AWS data layer, so that this is less "how the fsck did we do that?" each time we touch data. Any help in constructing those would be welcome.
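As a starting point, here's a minimal sketch of the shape one of those scripts could take. The bucket, file, and database names are placeholders, not our real ones; it assumes the AWS CLI and the Postgres client tools are installed on the EC2 host:

#!/bin/bash
# load-from-s3.sh - pull a pg_dump custom-format backup from S3 and restore it
set -euo pipefail

BUCKET="s3://hacko-data-archive"      # placeholder bucket
DUMP="congestion.backup"              # placeholder dump file
DB="congestion"                       # placeholder target database

# fetch the dump from S3 onto the database host
aws s3 cp "$BUCKET/$DUMP" "/tmp/$DUMP"

# create the target database if it doesn't exist, then restore into it
sudo -u postgres createdb "$DB" 2>/dev/null || true
sudo -u postgres pg_restore --no-owner --dbname "$DB" "/tmp/$DUMP"

# clean up the local copy
rm -f "/tmp/$DUMP"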
To be fair, I don't know that we absolutely need PostgreSQL 10 / PostGIS 2.4 - I just build on those because they're what's currently on the desktop. So my fundamental question is whether the costing for an RDS instance is more than if we use the same sized CPU, RAM and magnetic / SSD disks and spin it up ourselves.
If the numbers pencil out for RDS over rolling our own, I'll do the "downgrade" testing to make things work with 9.6 / 2.3.
@iant01 Once we spin up an RDBMS, upgrading is not a viable option. If we spin up 9.6 / 2.3 we are committed to that through demo day!
IMHO this is a classic build vs. buy decision - are the long-term costs with an RDS instance lower than the long-term costs of Hack Oregon custom code and operations for a roll-our-own? My bias is towards buy, but that's my corporate IT background.
Josh Berkus did a paper on this a few years back. RDS turned out to be very expensive but Josh has been a PostgreSQL developer for a long long time. What's easy for him would not necessarily be easy for us in production.
@znmeb yes, RDS is (apples-to-apples) more costly than running PostgreSQL on our own EC2 instance. How much more is a question, as costs keep changing while Amazon balances between maximizing profits and keeping up with the competition.
We are in the unenviable position of having both "minimize budget" as a requirement and the reality that part-time volunteers are a thin line of defense for keeping up with cloud operations. There is no good choice, only slightly-less-bad choices.
That said, we've got this - additional debate on this isn't necessary here. What our development teams need to focus on is their core infrastructure requirements, and how they will be able to deliver an installable data package that we can consume into a PostgreSQL-compatible service layer. Sounds like pg_dump is your weapon of choice, Ed, so we should be able to wrangle that no matter what Postgres implementation we choose.
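For reference, a minimal sketch of producing a dump that should restore cleanly across implementations (the database name is a placeholder; -Fc produces the compressed custom format pg_restore expects, and --no-owner avoids role mismatches between our laptops and the server):

# produce a compressed, custom-format dump of a project database
pg_dump -Fc --no-owner --dbname congestion --file congestion.backup

# optionally push it to S3 for the load script to pick up
aws s3 cp congestion.backup s3://hacko-data-archive/congestion.backup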
And it looks like Amazon's package infrastructure only supports Postgres through 9.6, so that's what we'll be deploying.
@MikeTheCanuck Sounds good
As for pg_restore, having the input dump file on somebody's laptop or workstation is a non-starter - it takes hours to get our 6 GB dump of the congestion database up to "the cloud" on Comcast's puny 6-megabit-per-second uploads. How do we get the .backup files / directories up to AWS S3 faster?

A note on PostgreSQL capacity planning:
For most of our projects, the major limiting factor is going to be disk space, not RAM or cores. PostgreSQL's default settings rarely need to be changed, and you'll usually get a warning that tells you what's happening and how to fix it. And most modern app stacks like Django use the RDBMS as a dumb persistence engine, not as a place where application logic lives.
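Since disk space is the thing to watch, here's a quick way to see where it's going (pg_database_size and pg_size_pretty are standard Postgres catalog functions; run this on the server):

# list databases by on-disk size, biggest first
sudo -u postgres psql -c "
  SELECT datname, pg_size_pretty(pg_database_size(datname)) AS size
  FROM pg_database
  ORDER BY pg_database_size(datname) DESC;"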
Well, folks, as you may have heard, it turns out that binary RPM packages for PostGIS are not available in the repositories that ship on Amazon Linux 2 as of 2018-02-24. I've posted a question in their support forum asking when PostGIS will be there: https://forums.aws.amazon.com/message.jspa?messageID=832523#832523.
I can build PostGIS and its numerous dependencies from source, but it will take some time, and it will be a support and deployment nightmare. So I'm officially putting one of my opposable thumbs down on Amazon Linux 2. :-(
Thanks for the update @znmeb. Yes, it looks like the Amazon repositories (both the default and amazon-linux-extras) don't include PostGIS packages.
However, I've pulled together some fragments of info that look like a potential avenue: others have succeeded in the past by adding an "Amazon Linux AMI" repository link to their machine and then installing the PostGIS packages. See the Amazon Linux AMI 2015.03 - x86_64 link here, which others have configured on an Amazon Linux machine like so:
sudo rpm -Uvh https://download.postgresql.org/pub/repos/yum/9.6/redhat/rhel-7-x86_64/pgdg-redhat96-9.6-3.noarch.rpm
This results in the following output, which I think we can work with:
$ sudo yum list | grep postgis
postgis22_96.x86_64 2.2.6-2.rhel6 pgdg96
postgis22_96-client.x86_64 2.2.6-2.rhel6 pgdg96
postgis22_96-debuginfo.x86_64 2.2.6-1.rhel6 pgdg96
postgis22_96-devel.x86_64 2.2.6-2.rhel6 pgdg96
postgis22_96-docs.x86_64 2.2.6-2.rhel6 pgdg96
postgis22_96-utils.x86_64 2.2.6-2.rhel6 pgdg96
postgis23_96.x86_64 2.3.6-2.rhel6 pgdg96
postgis23_96-client.x86_64 2.3.6-2.rhel6 pgdg96
postgis23_96-debuginfo.x86_64 2.3.4-1.rhel6 pgdg96
postgis23_96-devel.x86_64 2.3.6-2.rhel6 pgdg96
postgis23_96-docs.x86_64 2.3.6-2.rhel6 pgdg96
postgis23_96-utils.x86_64 2.3.6-2.rhel6 pgdg96
postgis24_96.x86_64 2.4.3-1.rhel6 pgdg96
postgis24_96-client.x86_64 2.4.3-1.rhel6 pgdg96
postgis24_96-debuginfo.x86_64 2.4.1-1.rhel6 pgdg96
postgis24_96-devel.x86_64 2.4.3-1.rhel6 pgdg96
postgis24_96-docs.x86_64 2.4.3-1.rhel6 pgdg96
postgis24_96-utils.x86_64 2.4.3-1.rhel6 pgdg96
postgis2_96.x86_64 2.3.2-1.rhel6 pgdg96
postgis2_96-client.x86_64 2.3.2-1.rhel6 pgdg96
postgis2_96-debuginfo.x86_64 2.3.2-1.rhel6 pgdg96
postgis2_96-devel.x86_64 2.3.2-1.rhel6 pgdg96
postgis2_96-docs.x86_64 2.3.2-1.rhel6 pgdg96
postgis2_96-utils.x86_64 2.3.2-1.rhel6 pgdg96
If this works, we'll still follow the Amazon Linux playbook as much as possible, only diverting where absolutely necessary (and when it works).
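If those packages do install cleanly, the remaining steps would presumably look something like the following (package names are taken from the listing above; whether the RHEL6-built RPMs actually resolve their dependencies on Amazon Linux 2 is exactly the open question):

# install PostgreSQL 9.6 server plus PostGIS 2.4 from the PGDG repo
sudo yum install -y postgresql96-server postgis24_96

# initialize the cluster and start the service
sudo /usr/pgsql-9.6/bin/postgresql96-setup initdb
sudo systemctl enable --now postgresql-9.6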
That's the PGDG repo - the standard repo where the PostgreSQL and PostGIS projects put their RHEL-compatible RPMs. They're compiled for RHEL 6 - not RHEL 7 / CentOS 7, and not Amazon Linux 2. Compatible libraries may or may not exist in Amazon Linux 2.
What you'd have to do is set up a build server running Amazon Linux, download the source RPMs, and recompile them. And there are dependencies from EPEL - you'll need those too. This is a risky path; I've been there and I've got the scars. This is the vendor's job - Amazon's - not the customer's (Hack Oregon's).
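For the record, the rebuild path being warned about here would look roughly like this (yumdownloader comes from yum-utils; the exact SRPM names and the EPEL dependency list are the parts that bite):

# on a throwaway Amazon Linux build box
sudo yum install -y yum-utils rpm-build gcc make

# grab the source RPM for the PostGIS package from the PGDG repo
yumdownloader --source postgis24_96

# rebuild it against the local libraries - this is where missing
# EPEL dependencies (geos, proj, gdal, ...) surface as build failures
rpmbuild --rebuild postgis24_96-*.src.rpm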
VM instance type:
Based on the budget guidance I've been hearing so far, we're going to start out with a t2.large and scale up as demand increases. Scaling up requires only momentary downtime - shut down the instance, change the instance type, and start the instance, which generally takes less than a minute.
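The resize itself is scriptable via the AWS CLI; a sketch (the instance ID and target type are placeholders):

# resize a stopped EC2 instance - roughly a minute of downtime
aws ec2 stop-instances --instance-ids i-0123456789abcdef0
aws ec2 wait instance-stopped --instance-ids i-0123456789abcdef0
aws ec2 modify-instance-attribute --instance-id i-0123456789abcdef0 \
    --instance-type '{"Value": "t2.xlarge"}'
aws ec2 start-instances --instance-ids i-0123456789abcdef0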
Network selection:
vpc-b841bedf | public database
and vpc-a38acdc4 | hacko-integration
Elastic IP allocation:
It appears that standing up an EC2 instance automatically allocates it an Internet-accessible IP address, so let's assume this point is moot.
EBS Volume for database storage:
Thus, we have to predict the ultimate size of our data and pre-allocate that as an EBS volume up front. Or risk having to migrate the database server and its data later to another EC2 instance.
(Theoretically we ought to have robust data backup & restore procedures, and be able to migrate to a new server at any time - making this a trivial risk. That said, this is neither something we have extensive experience with, nor the time to plan out ahead of standing up our PostgreSQL server.)
As a wild-ass guess of how much data volume we'll need in development this season, it seems like 500GB should more than cover our development needs - how could we possibly see data grow beyond that? [tongue planted firmly in cheek]
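A sketch of pre-allocating that volume and attaching it to the instance (the availability zone and IDs are placeholders; the volume must live in the same AZ as the instance):

# create a 500 GB general-purpose SSD volume
aws ec2 create-volume --size 500 --volume-type gp2 \
    --availability-zone us-west-2a

# attach it to the database instance as /dev/xvdf
aws ec2 attach-volume --volume-id vol-0123456789abcdef0 \
    --instance-id i-0123456789abcdef0 --device /dev/xvdf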
Check with @BrianHGrant to be sure but I think we are dropping our large (4 GB raw, 30 GB of PostgreSQL tablespace) dataset, if that's included in your 85 GB.
And even if you can't grow a volume, as long as you can add new volumes you can put PostgreSQL databases on them via tablespaces.
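Concretely, that looks something like this (device name, mount point, and database name are illustrative; assumes a freshly attached volume like the one above):

# format and mount the newly attached volume
sudo mkfs -t ext4 /dev/xvdf
sudo mkdir -p /data/pgvol2
sudo mount /dev/xvdf /data/pgvol2
sudo chown postgres:postgres /data/pgvol2

# create a tablespace on it and put a database there
sudo -u postgres psql -c "CREATE TABLESPACE vol2 LOCATION '/data/pgvol2';"
sudo -u postgres psql -c "CREATE DATABASE bigproject TABLESPACE vol2;"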
There are standard operating procedures for migrating PostgreSQL databases to bigger boxes as well. If you want to go that route let me go book hunting.
We've got a strategy in place to work with this:
Once Ian's script drops and is tested, we'll be able to go live with the database server.
As of last night, the EC2 VM hosting the development PostgreSQL instance is live, and one project team has their databases and credentials available.
Issues #60 and #61 address the two outstanding concerns, so this item can now be closed.
ToDo List - EC2 Instance
This issue fulfils part of #18.