microsoft / GlobalMLBuildingFootprints

Worldwide building footprints derived from satellite imagery

quadkey join index to link issue #91

Open justinelliotmeyers opened 5 months ago

justinelliotmeyers commented 5 months ago

[Two screenshots attached: Screenshot 2024-02-07 092328, Screenshot 2024-02-07 092604]

Hello, I am trying to join the latest building footprints to the building coverage quadkey index. The ids don't align. Is there a way to join them? Any insight is greatly appreciated! Thanks, Justin

tomalrussell commented 5 months ago

Hi Justin - not an official answer, but it looks like the tables have quadkeys at different resolutions (see quadkey docs for explanation).

As I read it, 301001331 (nine digits) is the "grandparent" quadkey of e.g. 30100133130 (eleven digits), so I would expect you can truncate the longer keys to match against the shorter (coarser resolution / larger tile) ones.
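A minimal sketch of that truncation, using only string slicing. The helper name `parent_quadkey` is made up for illustration. It assumes the keys are held as strings; if they have been read in as integers, any leading zeros are lost and the digit count no longer reflects the zoom level.

```python
def parent_quadkey(quadkey: str, level: int) -> str:
    """Return the ancestor of `quadkey` at the given zoom level.

    A quadkey has one digit per zoom level, so the ancestor at a
    coarser level is simply the first `level` digits.
    """
    if not 1 <= level <= len(quadkey):
        raise ValueError(f"level must be between 1 and {len(quadkey)}")
    return quadkey[:level]


# The eleven-digit key from the example above, cut back to nine digits.
print(parent_quadkey("30100133130", 9))
```

Applying this to every key in the finer table gives a column that can be matched directly against the nine-digit keys.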

justinelliotmeyers commented 5 months ago

thanks @tomalrussell appreciate your idea!!!

This loosely worked; however, dataset-links.csv contains multiple rows with the same quadkey ID. I didn't go down the rabbit hole of why, but from a quick and dirty spot check it may be overlapping or disputed admin areas: [image]

dataset-links.csv has a row count of 28,540. Dissolving buildings-coverage.geojson on the leftmost 9 digits of its quadkeys gives 36,770 tiles, down from the original 248,877.

So a few back-and-forth joins will yield the correct 1-1 relationships and flush out the duplicates and the keys missing from each dataset.
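The back-and-forth joins can be sketched with plain set operations. The function name `compare_keys` and the sample keys below are hypothetical, not taken from either file. The sketch assumes the coverage keys are longer than the link keys and that both are strings.

```python
def compare_keys(links_keys, coverage_keys, level=9):
    """Compare two quadkey lists after truncating the finer one.

    Returns the keys found on both sides and the keys found on
    only one side, which are the candidates for a closer look.
    """
    links = set(links_keys)
    coverage_parents = {key[:level] for key in coverage_keys}
    return {
        "matched": links & coverage_parents,
        "only_in_links": links - coverage_parents,
        "only_in_coverage": coverage_parents - links,
    }


# Hypothetical sample keys, for illustration only.
links_keys = ["301001331", "301001331", "301001332"]
coverage_keys = ["30100133130", "30100133131", "30100133300"]

for name, keys in compare_keys(links_keys, coverage_keys).items():
    print(name, sorted(keys))
```

Using sets drops repeated keys on purpose, so the three groups describe which tiles exist on each side rather than how many rows mention them.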

While I don't need a full quadkey polygon fabric across the globe, it would be helpful. Identifying which tiles have more than one entry would also be helpful.
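One way to find the tiles with more than one entry is to group the rows by quadkey. This is only a sketch: it assumes dataset-links.csv has columns named `Location` and `QuadKey`, and the sample rows and URLs below are invented placeholders rather than real data.

```python
import csv
import io
from collections import defaultdict

# Invented sample rows; the column names are an assumption about the file.
SAMPLE_CSV = """Location,QuadKey,Url,Size
CountryA,301001331,https://example.invalid/a.csv.gz,1KB
CountryB,301001331,https://example.invalid/b.csv.gz,2KB
CountryA,301001332,https://example.invalid/c.csv.gz,3KB
"""


def repeated_quadkeys(rows):
    """Map each quadkey that appears more than once to its locations."""
    grouped = defaultdict(list)
    for row in rows:
        grouped[row["QuadKey"]].append(row["Location"])
    return {key: locs for key, locs in grouped.items() if len(locs) > 1}


# DictReader yields every field as a string, so leading zeros survive.
rows = list(csv.DictReader(io.StringIO(SAMPLE_CSV)))
for quadkey, locations in repeated_quadkeys(rows).items():
    print(quadkey, locations)
```

If the repeated keys turn out to carry different locations, that would fit the guess about tiles shared between neighbouring or overlapping admin areas.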

So a few weird gaps are making what was an easy build (in my head) a bit more complex.

Hopefully someone from Bing can add more insight and help me keep this simple if possible.

andwoi commented 5 months ago

@justinelliotmeyers we don't update the building coverage with each update, which is why you see the discrepancy. You can generate your own coverage by doing something like:

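```python
from shapely import geometry, wkt
import mercantile

tile_id = "1202112212"

# Convert the quadkey to a tile, then take its (west, south, east, north) bounds.
tile = mercantile.quadkey_to_tile(tile_id)
quadkey_box = geometry.box(*mercantile.bounds(tile))

# Serialize the tile polygon as WKT.
print(wkt.dumps(quadkey_box))
```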

That will give you a WKT string you can plot. In this way, you can create whatever aggregate levels you'd like. Hope that helps.
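For building coverage at an aggregate level without extra dependencies, here is a standard-library sketch. It is not the maintainers' method. It decodes quadkeys with the usual Bing Maps tile system bit interleaving and the Web Mercator tile formulas, and all function names are made up for illustration. The vertex order of the polygon may differ from what shapely produces.

```python
import math


def quadkey_to_tile(quadkey: str):
    """Decode a quadkey into (x, y, zoom) tile coordinates."""
    x = y = 0
    zoom = len(quadkey)
    for i, digit in enumerate(quadkey):
        if digit not in "0123":
            raise ValueError(f"invalid quadkey digit: {digit!r}")
        mask = 1 << (zoom - 1 - i)
        value = int(digit)
        if value & 1:  # low bit of the digit belongs to x
            x |= mask
        if value & 2:  # high bit of the digit belongs to y
            y |= mask
    return x, y, zoom


def tile_bounds(x: int, y: int, zoom: int):
    """Return (west, south, east, north) in degrees for a Web Mercator tile."""
    n = 2 ** zoom

    def lon(col):
        return col / n * 360.0 - 180.0

    def lat(row):
        return math.degrees(math.atan(math.sinh(math.pi * (1 - 2 * row / n))))

    return lon(x), lat(y + 1), lon(x + 1), lat(y)


def quadkey_to_wkt(quadkey: str) -> str:
    """Build a WKT polygon for the tile named by `quadkey`."""
    west, south, east, north = tile_bounds(*quadkey_to_tile(quadkey))
    ring = [(west, south), (east, south), (east, north), (west, north), (west, south)]
    coords = ", ".join(f"{lng:.6f} {lat:.6f}" for lng, lat in ring)
    return f"POLYGON (({coords}))"


def coverage_at_level(quadkeys, level: int):
    """Map each distinct ancestor quadkey at `level` to its WKT polygon."""
    parents = sorted({key[:level] for key in quadkeys})
    return {key: quadkey_to_wkt(key) for key in parents}


# Two eleven-digit keys that share the same nine-digit ancestor.
for key, polygon in coverage_at_level(["30100133130", "30100133131"], 9).items():
    print(key, polygon)
```

Because the aggregate level is just a prefix length, the same list of fine keys can be rolled up to any coarser level by changing the `level` argument.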