Open cholmes opened 11 months ago
Would also be great to add the one that doesn't split by s2 to see how the performance compares. https://beta.source.coop/vida/google-microsoft-open-buildings/geoparquet/by_country/
I think one challenge here is that our (VIDA) S2 cells are based upon the number of rows within a file, and are therefore not at a fixed level. This makes it difficult to determine the S2 cell id at a single level that can then be queried across all files.
The VIDA dataset on Source combines the Google and Microsoft buildings, and should have the most buildings of the available options. It should be relatively easy to add, but it uses S2 rather than 'quadkey' for spatial partitioning. The one to add is https://beta.source.coop/vida/google-microsoft-open-buildings/geoparquet/by_country_s2 - as it's more finely partitioned and will likely perform much better (though it's worth trying both).
The main task for this is to support a different 'spatial' column - the current setup assumes quadkey, as that's what the first two datasets were done with. Ideally the download_buildings function would take an argument that is either 'quadkey' or 's2', and we could later add h3, geohash, etc. The `get_building` CLI should just have an option to use this dataset, and then it can pass the right arguments into download_buildings. The quadkey is computed client side, and it's likely similarly easy to compute the s2 key and then use that in the query.
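A minimal sketch of how the spatial column could be made pluggable is below. The names `SPATIAL_KEY_FUNCS` and `spatial_filter`, the `zoom` parameter, and the `LIKE` prefix filter are all assumptions for illustration, not the existing download_buildings signature. The quadkey math follows the standard Bing Maps tile scheme; an S2 (or h3, geohash) key function would come from a library and be registered the same way.

```python
import math


def tile_to_quadkey(x: int, y: int, zoom: int) -> str:
    """Interleave the bits of tile x/y into a quadkey string (Bing tile scheme)."""
    digits = []
    for i in range(zoom, 0, -1):
        mask = 1 << (i - 1)
        digit = (1 if x & mask else 0) + (2 if y & mask else 0)
        digits.append(str(digit))
    return "".join(digits)


def lat_lon_to_quadkey(lat: float, lon: float, zoom: int) -> str:
    """Compute the quadkey of the Web Mercator tile containing a point."""
    lat = max(min(lat, 85.05112878), -85.05112878)  # Web Mercator latitude limit
    n = 1 << zoom
    sin_lat = math.sin(math.radians(lat))
    x = int((lon + 180.0) / 360.0 * n)
    y = int((0.5 - math.log((1 + sin_lat) / (1 - sin_lat)) / (4 * math.pi)) * n)
    x = min(max(x, 0), n - 1)
    y = min(max(y, 0), n - 1)
    return tile_to_quadkey(x, y, zoom)


# Hypothetical registry: spatial column name -> function(lat, lon, zoom) -> key.
# An 's2' entry would be added here once a client-side S2 key function is chosen.
SPATIAL_KEY_FUNCS = {"quadkey": lat_lon_to_quadkey}


def spatial_filter(lat: float, lon: float, spatial_index: str = "quadkey", zoom: int = 12) -> str:
    """Build a SQL predicate for the chosen spatial column (column name assumed)."""
    if spatial_index not in SPATIAL_KEY_FUNCS:
        raise ValueError(f"unsupported spatial index: {spatial_index!r}")
    key = SPATIAL_KEY_FUNCS[spatial_index](lat, lon, zoom)
    return f"{spatial_index} LIKE '{key}%'"
```

With something like this, the CLI option for the VIDA dataset would only need to choose the source URL and pass `spatial_index="s2"` through, once an S2 key function is registered.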