microsoft / GlobalMLBuildingFootprints

Worldwide building footprints derived from satellite imagery
Other
1.42k stars 206 forks source link

Geometries are not valid as per SQL #32

Open rbrundritt opened 2 years ago

rbrundritt commented 2 years ago

I imported the Ukraine building data into SQL as Geography objects and found that they were not valid. I simply had to run an update on all geometry and call MakeValid to correct this (took about 4 hours). It would be great if the geometries could be made valid in this repo so that those who are consuming the data don't need to do this extra step and do this heavy compute (think of it as a sustainability thing).

andwoi commented 2 years ago

What feature changes are required for SQL to recognize geometries as valid? What SQL database are you using? Will the change work for all?

rbrundritt commented 2 years ago

I used this data with SQL 2019, 2022, and SQL Azure. All of them found the data to include invalid geometries. I'm certain this would be an issue with the geography type of all versions of SQL 2008+, and a fix for one would apply to all. I suspect PostGIS would find the same. As per the OGC Simple Feature specification and SQL specification used by MS SQL and PostGIS, a valid polygon must meet the following requirements:

GeoJSON specification has similar validation requirements: https://www.rfc-editor.org/rfc/rfc7946#section-3.1.6

With the exception of the ring orientation requirement, the specification for valid polygons between spatial databases (MS SQL/PostGIS) and GeoJSON are identical.

The OGC doesn't dictate a requirement for ring orientation, however MS SQL and PostGIS align with the following for geography objects:

Here is a good example of the above

In older versions of SQL, ring orientation must be such that a polygon represents a half a hemisphere (smaller area) and not the rest of the globe. If this isn't the case, an error is thrown.

Going through the data I didn't notice any issue with ring orientation (I didn't look that close into this part), but that would be something worth validating as part of your data pipeline. In the case of building footprints, all the polygons should be small (definitely less than half the globes surface area), so an easy check is to calculate area using a geography object accurate library, and then reorient the coordinates if needed.

Note that SQL and PostGIS both have built in methods to fix invalid polygons, calculate geographic area, and reorient cooridnates. The SQL spatial functionality is available as a .NET library you can use without a database. I've often used it to make end user created data valid before passing it to the database for an intersection test).

Here are some additional related resources:

https://learn.microsoft.com/en-us/sql/relational-databases/spatial/polygon

https://www.postgis.net/workshops/postgis-intro/validity.html

rbrundritt commented 2 years ago

Just to call it out in case it got lost in my previous post, some of the geometries in this dataset are invalid by GeoJSON specification as well. I had even seen noticed this once when I took a subset of one of the datasets and noticed several polygon rings where not closed (start/end of a ring should have the same coordinate).