nsidc / qgreenland

Source code for generating the QGreenland package hosted at https://qgreenland.org/
https://qgreenland.readthedocs.io
Other
36 stars 9 forks source link

Document/report subsetting removing features unexpectedly from shapefiles #604

Open trey-stafford opened 1 year ago

trey-stafford commented 1 year ago

Migrated from Jira:

While working the hydrologic sub-basins layer:

Using earthpy.clip, we saw the NE Ocean layer clip away North America when subsetting, but not Russia. We switched to ogr2ogr as the subset program to solve this issue, and now we see some hydrologic sub-basins disappearing during subset, even though none of them should have been affected by the subset parameters.

We added a makevalid step before the normal ogr2ogr operation, and this fixed up problems with missing polygons for PROMICE, but did not fix the tectonic plates layer.

We think this needs to be looked at before v3.

trey-stafford commented 1 year ago

We have a tectonic plate boundaries layer (polylines) in "Geology". I think the problem we had with "tectonic plates" was related to the polygon layer, probably available in the same dataset.

MattF-NSIDC commented 1 year ago

NOTE: Hydrologic sub-basins were removed in v0.31.0

trey-stafford commented 1 year ago

PR for adding the tectonic plates polygon layer without any unexpectedly missing features: https://github.com/nsidc/qgreenland/pull/675

MattF-NSIDC commented 1 year ago

@trey-stafford we need to summarize what we learned exploring that dataset in this ticket. My brain isn't in the right place for this, just hoping one of us remembers next week :)

trey-stafford commented 1 year ago

In the case of #675, we found that Antarctica was consistently included in the output despite clipping the dataset to QGreenland's background boundary (40 degrees North) unless we clipped with geographic coordinates or (in the case of the PR) excluded Antarctica via SQL. We think this has something to do with reprojection causing verticies of Antarctica to be outside of the project projection's valid coordinate space (the edge where the polygon meets the dateline extends into infinity).

In experimenting with various incantations of ogr2ogr, we did manage to replicate the 'missing features' phenomenon but we did not record what those were, and I remember us being pretty confused as to why it was occuring. I think one case was when we expanded our clip boundary outward.

trey-stafford commented 1 year ago

I looked at the Hydrologic Sub-basins of Greenland, Version 1 data. It's already in a polar stereo projection and only for Greenland, so I'm not sure why we would have been subsetting this dataset.

In any case, I downloaded the Greenland_Hydrologic_Sub_Basins.rar, extracted it's contents, and then transformed the dataset to EPSG:3413 while subsetting using our data boundary:

ogr2ogr -t_srs EPSG:3413 -clipdst qgreenland/assets/greenland_rectangle.geojson -nlt MULTIPOLYGON test.gpkg Greenland_Hydrologic_Sub_Basins.shp

And - no missing features.

MattF-NSIDC commented 1 year ago

I think we should consider resolving this issue with an entry in a contributor "troubleshooting" document that describes some specific idiosyncrasies we've observed and warn about sometimes unexpected behavior.

We may also want to open an issue against GDAL/OGR with the specific case in which increasing the subset bounds to a superset of the previous bounds results in fewer features being selected. I can't think of any explanation for that behavior other than a bug. Before doing this, we could also try the latest ogr2ogr release.