Closed: pnorman closed this issue 2 years ago
EBS, if using GP3, is generally great. If you're looking for inexpensive high IOPS, the cheapest approach is to make a bunch of small GP3 volumes, as each has a 3000 IOPS baseline, and combine them with RAID 0, LVM or whatever. ST1 gives consistent hard-drive-like performance, but latency is a bit higher than with locally attached hard drives. If mod_tile is using blocking IO for reads, and I imagine mod_tile is, you may find you need fewer Apache threads/processes to get the same request throughput with GP3.
m6a/c6a/r6a isn't always a savings over m6i/c6i/r6i due to performance differences in memory. I've seen the Xeon chips work out significantly cheaper than Epyc in some situations. I'd benchmark both if the Gravitons don't work out.
Be prepared for your instance to fail. It just happens. Most instances will stay up for years; others will have hardware issues. Sometimes you'll get a warning in the Events of the EC2 console (and sent to email) where you'll have a few weeks to stop and start the instance. In other cases the recovery process will start the instance on new hardware. Any data on ephemeral storage would of course be gone, so that's a big negative for using locally attached storage beyond a cache.
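As a rough illustration of checking for those scheduled events programmatically rather than waiting for the email, here is a minimal sketch. It only filters a dictionary shaped like an EC2 `DescribeInstanceStatus` response. The sample data is made up, and the `[Completed]` description prefix used to skip finished events is an assumption to verify against the EC2 documentation. With boto3 the response would come from `boto3.client("ec2").describe_instance_status(IncludeAllInstances=True)`, which is not called here.

```python
from datetime import datetime, timezone


def pending_events(status_response):
    """Return (instance_id, event_code, not_before) for events still pending."""
    pending = []
    for status in status_response.get("InstanceStatuses", []):
        for event in status.get("Events", []):
            # Assumption: finished events carry a "[Completed]" description prefix.
            if event.get("Description", "").startswith("[Completed]"):
                continue
            pending.append(
                (status["InstanceId"], event["Code"], event.get("NotBefore"))
            )
    return pending


# Hand-written sample in the shape of a DescribeInstanceStatus response.
# The instance ID, dates and descriptions are invented for illustration.
sample_response = {
    "InstanceStatuses": [
        {
            "InstanceId": "i-0123456789abcdef0",
            "Events": [
                {
                    "Code": "instance-stop",
                    "Description": "The instance is running on degraded hardware",
                    "NotBefore": datetime(2023, 3, 1, tzinfo=timezone.utc),
                },
                {
                    "Code": "system-reboot",
                    "Description": "[Completed] Scheduled reboot",
                    "NotBefore": datetime(2023, 1, 10, tzinfo=timezone.utc),
                },
            ],
        }
    ]
}

for instance_id, code, not_before in pending_events(sample_response):
    print(instance_id, code, not_before)
```

Run from cron or a monitoring check, something like this could raise an alert before the stop/start deadline instead of relying on someone reading the notification email.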
Just some thoughts from someone who has been using EC2 for over a decade.
@MarkRose Thank you for the helpful insights.
> ST1 gives consistent hard drive like performance but latency is a bit higher than locally attached hard drives.
Our sustained IOPS is 10k-20k, with peaks of 50k, so st1 isn't an option. My inclination is to start with a single maxed-out gp3 volume and, if necessary, split the tiles onto their own volume.
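For a sense of scale, the "many small gp3 volumes" suggestion can be set against these IOPS figures with a little arithmetic. This is only a sketch: it assumes a RAID 0 stripe scales linearly with the number of volumes and ignores any per-instance EBS limits, neither of which has been checked here. The 3000 IOPS baseline is the figure quoted earlier in the thread.

```python
import math

GP3_BASELINE_IOPS = 3000  # per-volume baseline quoted earlier in the thread


def volumes_for(target_iops, per_volume_iops=GP3_BASELINE_IOPS):
    """Number of baseline volumes needed, assuming linear RAID 0 scaling."""
    return math.ceil(target_iops / per_volume_iops)


# IOPS figures taken from the comment above.
for label, iops in [
    ("sustained (low)", 10_000),
    ("sustained (high)", 20_000),
    ("peak", 50_000),
]:
    print(f"{label}: {iops} IOPS -> {volumes_for(iops)} baseline gp3 volumes")
```

Under those assumptions the sustained load needs 4 to 7 baseline volumes and the peak needs 17.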
The big unknown to me is latency, not IOPS. I don't know how that's going to impact performance.
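One way to reason about the latency question is Little's law: if reads are blocking, each worker handles one request at a time, so the number of busy workers is roughly throughput multiplied by time per request. The sketch below uses invented request rates and latencies purely for illustration. It also treats a request as a single blocking wait, which is a simplification, since a tile request may involve several reads.

```python
import math


def workers_needed(requests_per_second, latency_ms):
    """Little's law: concurrent workers = arrival rate x time in system."""
    return math.ceil(requests_per_second * latency_ms / 1000)


# Hypothetical numbers, not measurements from any OSM server.
for latency_ms in (1, 4, 10):
    busy = workers_needed(500, latency_ms)
    print(f"500 req/s at {latency_ms} ms per request -> about {busy} busy workers")
```

The practical reading is that higher storage latency does not necessarily cap throughput, but it does raise the number of Apache threads/processes that must be kept available to sustain it.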
Solving this will also solve #637
Depends on #660?
> Solving this will also solve https://github.com/openstreetmap/operations/issues/637
We're looking at replacing pyrene independent of this.
> Depends on #660?
No, although they share some common parts around changing our account management.
Account has been created. Accessible via assumed role from master account.
Decisions:
AWS Region: us-west-2
That's in Oregon, probably very near the existing OSUOSL servers. Can I suggest us-east-2 (Ohio) or us-east-1 (Virginia) instead?
Instance choice for initial experiment: m6gd.16xlarge Instance Store (local NVMe)
The reasoning for us-west-2 was carbon neutrality, but https://sustainability.aboutamazon.com/environment/the-cloud?energyType=true says us-east-1 and us-east-2 are 95% powered by renewables too. Initial choice for AWS region: us-east-2.
AWS Region: us-east-2
Elastic IP: 3.144.0.72
Instance Name: palulukon
Instance Type: m6gd.16xlarge
Initial basic AWS billing budget created: $1000/month, with alerts sent to me, ops and @grischard.
DNS records created for palulukon.openstreetmap.org
Base chef is done; we're adding arm64 support for the Prometheus exporters.
Import is now running... Thank you to @pnorman
AWS credits cannot be used to buy Savings Plans or Reserved Instances (Partial Upfront or All Upfront). It looks like No-Upfront Reserved Instances are allowed, but that would leave OSMF exposed for potentially the last 2 months of the 12-month minimum reserved period (12 months is the minimum period offered for this instance type). Reserved Instance pricing is available here.
EC2 + bandwidth costs are currently around $115 per weekday, which is sufficiently covered by the credits (which expire on 30 September 2023) while allowing some headroom for bandwidth increases.
A remaining cost saving option (to allow more capacity) is to move to Spot Instances (~70% lower instance cost), but this would require additional DevOps investment to turn the "pet server" into "cattle", which is best handled by a separate ticket.
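The Reserved Instance exposure mentioned above comes down to date arithmetic: a 12-month commitment that ends after the credits expire leaves the remainder to be paid for directly. A minimal sketch follows, in which the reservation start date of 1 December 2022 is a hypothetical value chosen only to reproduce the "last 2 months" estimate. The credit expiry date is the one stated in this comment.

```python
from datetime import date

CREDIT_EXPIRY = date(2023, 9, 30)  # expiry date stated in the thread


def exposed_days(start, term_years=1, credit_expiry=CREDIT_EXPIRY):
    """Days of a reservation term that fall after the credits expire."""
    # The term ends on the same calendar date term_years later.
    # (Does not handle a 29 February start date.)
    end = start.replace(year=start.year + term_years)
    return max(0, (end - credit_expiry).days)


# Hypothetical start date; the thread does not state when a reservation would begin.
hypothetical_start = date(2022, 12, 1)
print(f"Exposed days: {exposed_days(hypothetical_start)}")
```

Multiplying the exposed days by the reserved daily rate would give the uncovered cost. That rate is not stated in this thread, so it is left out.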
Ref #637
Outstanding questions