opengeos / open-buildings

Tools for working with open building datasets
https://opengeos.github.io/open-buildings

Performance testing of different partition options #17

Open cholmes opened 10 months ago

cholmes commented 10 months ago

While building the get-buildings command I went through a couple of iterations of different file layouts, and it became clear that using more row groups than gpq creates by default is better. The latest scripts let me set both the maximum number of rows per file and the row group size. But I have no idea whether things could be a lot faster with larger or smaller row groups, and/or with more or fewer files. The 'defaults' I used were a maximum of 10 million rows per file and 20,000 rows per row group. It'd be great to try out some variations on that.

Ideally we'd also experiment with the tradeoff between 'legibility for download' (partition by country and then admin level 1, like the Google buildings data does) and 'balance of spatial size' (use the quadkey max-size algorithm for everything, instead of country then quadkey, so we'd have far fewer files overall, but each file would be meaningless to users and they'd need to use the 'tool' to download).
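To make the variations concrete, here is a rough sketch of how the two knobs translate into file and row group counts. The `layout` helper, the 50 million row total, and the candidate values are all hypothetical and only for illustration; the 10 million / 20,000 defaults are the ones mentioned above.

```python
import math
from itertools import product


def layout(total_rows, max_rows_per_file=10_000_000, rows_per_group=20_000):
    """Return (number of files, row groups in a full file) for one setting."""
    files = math.ceil(total_rows / max_rows_per_file)
    # A "full" file holds max_rows_per_file rows, or everything if the
    # dataset is smaller than that.
    rows_in_full_file = min(total_rows, max_rows_per_file)
    groups_per_full_file = math.ceil(rows_in_full_file / rows_per_group)
    return files, groups_per_full_file


# Hypothetical dataset size and candidate settings to compare.
TOTAL_ROWS = 50_000_000
FILE_SIZES = [5_000_000, 10_000_000, 20_000_000]
GROUP_SIZES = [10_000, 20_000, 50_000]

if __name__ == "__main__":
    for max_rows, group_rows in product(FILE_SIZES, GROUP_SIZES):
        files, groups = layout(TOTAL_ROWS, max_rows, group_rows)
        print(f"{max_rows:>11,} rows/file, {group_rows:>6,} rows/group "
              f"-> {files} files, {groups} row groups per full file")
```

With the current defaults a full file has 500 row groups, so each combination in the grid is a different balance between how many files a query has to look at and how many row groups it has to consider within each file.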

I was getting roughly 20-30 seconds to download a small number of buildings, but that was based on just a handful of tests.

Ideally we'd have a command that runs a 'benchmark' over 20-30 locations around the globe, measures the performance for each of them, and reports the results, so we can easily compare how tweaks to the data layout affect performance.
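A minimal sketch of what such a benchmark could look like, assuming the download step can be wrapped in a function that takes a bounding box. Everything here is hypothetical: the `fetch` callable stands in for whatever actually downloads the buildings, the three locations and their bounding boxes are illustrative placeholders for the 20-30 real ones, and `run_benchmark` / `report` are not existing commands in this repo.

```python
import statistics
import time

# Hypothetical test locations: (name, (min_lon, min_lat, max_lon, max_lat)).
# A real benchmark would list 20-30 of these spread around the globe.
LOCATIONS = [
    ("nairobi", (36.80, -1.30, 36.82, -1.28)),
    ("sao_paulo", (-46.64, -23.56, -46.62, -23.54)),
    ("jakarta", (106.84, -6.21, 106.86, -6.19)),
]


def run_benchmark(fetch, locations, repeats=1):
    """Time fetch(bbox) for each location; keep the best of `repeats` runs."""
    results = {}
    for name, bbox in locations:
        timings = []
        for _ in range(repeats):
            start = time.perf_counter()
            fetch(bbox)
            timings.append(time.perf_counter() - start)
        results[name] = min(timings)
    return results


def report(results):
    """Format per-location timings plus a median and worst case."""
    lines = [f"{name:<12} {secs:8.2f} s" for name, secs in sorted(results.items())]
    values = list(results.values())
    lines.append(f"{'median':<12} {statistics.median(values):8.2f} s")
    lines.append(f"{'max':<12} {max(values):8.2f} s")
    return "\n".join(lines)


def fake_fetch(bbox):
    """Stand-in for the real download; replace with the actual call."""
    time.sleep(0.01)


if __name__ == "__main__":
    print(report(run_benchmark(fake_fetch, LOCATIONS)))
```

Running the same set of locations against each candidate partitioning and comparing the reports would give a like-for-like view of how a change in file or row group size affects download time.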