openstreetmap / operations

OSMF Operations Working Group issue tracking
https://operations.osmfoundation.org/

Allocate hardware for minutely vector tiles #1073

Closed by pnorman 4 months ago

pnorman commented 5 months ago

We need to allocate hardware for minutely vector tiles generated by tilekiln.

Tilekiln decouples tile generation and serving, but both can easily be done from the same machine. The current estimate for disk space is 89GB for flat nodes, 607GB for the osm2pgsql DB, and 135GB for tile storage. When a new version of shortbread comes out we will want to continue serving the old tiles without updating them for some time, so that's another 135GB. That is 966GB today, or a total of 1943GB in 5 years with 15% annual growth, and assumes we can load DB updates on a new machine. If the OSMF goes ahead with offering a second tileset showing more OSM data, I estimate that it would add 5% to the osm2pgsql DB and 125% to the tile storage, for a total of 2682GB.

This will easily fit on a 3.84TB SSD and in the short term will fit on a 1.92TB SSD. A 3.84TB SSD has more than enough room to reload the database without dropping the old one right now, which adds flexibility until we have redundant machines.
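As a rough check of the arithmetic above, here is a minimal sketch that reproduces the totals. It assumes the 15% growth compounds annually and that the 125% tile storage increase applies to both retained tile copies (2 x 135GB). Those two readings are assumptions, chosen because they reproduce the quoted figures. The variable and function names are illustrative only.

```python
# Sizes quoted in the comment above, in GB.
FLAT_NODES = 89
OSM2PGSQL_DB = 607
TILE_STORAGE = 135   # one tileset version
TILE_VERSIONS = 2    # current tiles plus the previous shortbread version

GROWTH = 0.15        # assumption: compounds once per year
YEARS = 5


def projected(size_gb, growth=GROWTH, years=YEARS):
    """Size after compounding `growth` for `years` years."""
    return size_gb * (1 + growth) ** years


# Today's footprint: flat nodes + database + two copies of the tiles.
base = FLAT_NODES + OSM2PGSQL_DB + TILE_STORAGE * TILE_VERSIONS

# Assumption: the second tileset adds 5% to the DB and 125% to both tile copies.
with_second_tileset = (
    base + 0.05 * OSM2PGSQL_DB + 1.25 * TILE_STORAGE * TILE_VERSIONS
)

# Space needed to reload the database while keeping the old one.
reload_now = base + OSM2PGSQL_DB

print(f"today: {base} GB")
print(f"in {YEARS} years: {projected(base):.0f} GB")
print(f"in {YEARS} years with second tileset: {projected(with_second_tileset):.0f} GB")
print(f"today with a parallel DB reload: {reload_now} GB")
```

Under these assumptions the five-year figure comes to about 1943GB, and the second-tileset figure to just under 2683GB, which matches the 2682GB above if truncated rather than rounded. A parallel DB reload today needs about 1573GB, which fits on either drive size.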

The main resource required is CPU power for pre-generation of tiles. When a new tileset is needed (e.g. a new shortbread version) the system will generate all tiles. This takes a lot of CPU but works very well in parallel. A general-purpose machine should be plenty for this - my estimate is that it would take 1-2 days on one of our Gen10 machines.

Since the OWG prepared a minimal budget and was still asked to cut back, we may have to ask the board for extra budget. We have a general-purpose server for new services in the budget, but I think cases like the Discourse move being blocked by WordPress (#1070) show that we still need contingency.

Long term we want multiple servers for redundancy, but that can wait until we have at least one set up.

tomhughes commented 4 months ago

The plan was to use rhaegal as the test bed which I think should be up to the job. I know @hbogner has retrieved the machine now but I'm not sure where he is with getting it back up...

pnorman commented 4 months ago

I'm not sure if the rhaegal plan was for initial testing, which I've completed, or the first OSMF deployment. In either case, given the ongoing saga with rhaegal since we made that decision, we should reevaluate it.

hbogner commented 4 months ago

rhaegal was on cleanup last week and is on stress test this week. I'll install a clean Debian 12 on it this week, and hope to ship it to where viserion was during the conference next week.

pnorman commented 4 months ago

A full z0-z14 rendering takes about 60 hours using 56 threads on faffy. Basically all of that is spent on z13 and z14. A modern general-purpose machine would take about half the time.
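For a sense of why z13 and z14 dominate, here is a small sketch that counts tiles per zoom in a standard quadtree pyramid (4^z tiles at zoom z). Tile count is only a proxy for render time: it assumes every tile is generated and ignores per-tile cost differences, so treat it as an illustration rather than a model of tilekiln.

```python
MAX_ZOOM = 14

# A full pyramid has 4**z tiles at zoom z (2**z columns by 2**z rows).
tiles = {z: 4 ** z for z in range(MAX_ZOOM + 1)}
total = sum(tiles.values())

# Share of all tiles that sit at the two deepest zooms.
deep_share = (tiles[13] + tiles[14]) / total

# Total work for the run quoted above: 60 hours on 56 threads.
thread_hours = 60 * 56

print(f"tiles z0-z{MAX_ZOOM}: {total:,}")
print(f"share at z13+z14: {deep_share:.1%}")
print(f"thread-hours for the full render: {thread_hours}")
```

By count alone, z13 and z14 hold roughly 15/16 of the tiles in the pyramid, which is consistent with nearly all of the render time going to those two zooms.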

This is on hold until funding decisions are made.

Firefishy commented 3 months ago

rhaegal is ready and available if needed. It could be a long term home.