bhousel opened this issue 3 years ago
Is there any way to liaise with the people who create the weekly planet file and extracts, to see whether they could process the planet data on their server and create the collections files? This would remove the need for the NSI project to download the planet data, saving a lot of bandwidth - but I don't know whether they would have the spare server capacity to process the data for our needs.
I feel like the people generating the planet already have their hands full, and I know the planet generation process takes a long time.
One good thing about containerizing this process with Docker (#4 / #5) is that we can run it anywhere (local machines, servers) and not need to worry about coordination with anyone else, or making sure the environment is set up how we need it.
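Once it's in Docker, the whole weekly run could boil down to a single command. A minimal sketch - the image name `nsi-collector` and the `/data` mount point here are assumptions, not the actual setup from #4 / #5:

```sh
# Mount a host directory so the downloaded planet and the generated
# collections files persist outside the (assumed) nsi-collector container.
docker run --rm -v "$(pwd)/data:/data" nsi-collector
```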
A GitHub action can run for up to 72 hours: https://docs.github.com/en/actions/learn-github-actions/usage-limits-billing-and-administration#usage-limits
Although they've only got 14 GB of free space: https://docs.github.com/en/actions/using-github-hosted-runners/about-github-hosted-runners#supported-runners-and-hardware-resources
I assume the PBF is a structured binary format, so it wouldn't survive being chopped directly into, say, 10 GB blocks at arbitrary byte offsets and still play nicely?
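As far as I know, osmium itself can cut format-aware extracts, so the splitting wouldn't have to be byte-level chopping - though each extract run still needs the full planet as input. A rough sketch (the bounding box and polygon file are just placeholders):

```sh
# Cut a format-aware regional extract from the planet.
# --bbox takes west,south,east,north; these values are placeholders.
osmium extract --bbox -10.0,35.0,30.0,60.0 planet-latest.osm.pbf -o region.osm.pbf

# Or cut along an exact boundary using a polygon file.
osmium extract --polygon europe.poly planet-latest.osm.pbf -o europe.osm.pbf
```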
However, various providers already offer partial extracts: http://download.openstreetmap.fr/extracts/, the smaller anonymised ones at https://download.geofabrik.de/, and the others listed at https://wiki.openstreetmap.org/wiki/Planet.osm#Country_and_area_extracts.
Could we run the different continents/countries individually and just merge the resulting JSON, or is that more complicated? Then it could run at whatever frequency we wanted (and possibly even differently for different countries).
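And if merging the JSON turned out to be the complicated part, the regional extracts could instead be merged back into one PBF before running the collector. A sketch using the Geofabrik per-continent files (as I understand it, osmium merge keeps only one copy of identical objects that appear in more than one sorted input):

```sh
# Grab per-continent extracts instead of the full planet.
curl -fLO https://download.geofabrik.de/europe-latest.osm.pbf
curl -fLO https://download.geofabrik.de/north-america-latest.osm.pbf

# Merge the sorted extracts back into a single PBF for one collector pass.
osmium merge europe-latest.osm.pbf north-america-latest.osm.pbf -o combined.osm.pbf
```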
Unfortunately I can't imagine this working within the constraints of a GitHub action - the 14GB space limit is a nonstarter. I'm ok with spending a few bucks to have it run somewhere, and having it take a few hours a week without intervention.
I think the biggest constraint for me right now is my free time - even spending a few hours making enhancements to nsi-collector or debugging failures is just too much time. We set up this project here because the osmium binary dependency has gotten really hard to work with, and it doesn't sit well with the rest of name-suggestion-index, which is purely JavaScript.
> I'm ok with spending a few bucks to have it run somewhere, and having it take a few hours a week without intervention.
I see GitHub have added large runners under the GitHub Actions tab, with "Sizes up to: 64-cores · 256 GB RAM · 2040 GB SSD Storage", but I'm not sure of the costs. They aren't available for free, but since you said you'd be okay with spending a few bucks to set something up, perhaps a large runner would be cost-effective compared to what you may have already considered?
I'm considering setting up one of my VPSs to download and process the planet weekly.
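Roughly like this - every path, the image name, and the schedule in this sketch are placeholders, not the project's actual setup:

```sh
#!/bin/sh
# update-collections.sh (hypothetical) - run weekly from cron, e.g.:
#   0 3 * * 1 /home/me/nsi/update-collections.sh >> /home/me/nsi/collector.log 2>&1
set -e
cd /home/me/nsi

# Fetch the latest weekly planet (tens of GB).
curl -fL -o planet-latest.osm.pbf \
  https://planet.openstreetmap.org/pbf/planet-latest.osm.pbf

# Run the containerized collector against it (assumed image name, see #4 / #5).
docker run --rm -v "$PWD:/data" nsi-collector
```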