Context: When spatially joining two GeoDataFrames of polygons, a single polygon can match several polygons in the other gdf, which produces duplicate matches. One way this has been addressed in the past is to convert one of the gdfs to points by taking each polygon's centroid. That can work, but some polygons end up unmatched even though they overlap a polygon in the other gdf, because the result depends heavily on where the centroid falls.
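To make the centroid limitation concrete, here is a minimal sketch on toy data. The layer names, ID columns (`left_id`, `right_id`), rectangles and CRS are all made up for illustration, and it assumes a geopandas version where `sjoin` accepts `predicate` (older releases used `op`).

```python
import geopandas as gpd
from shapely.geometry import box

# Hypothetical toy layers in a projected CRS (coordinates in metres).
left = gpd.GeoDataFrame(
    {"left_id": ["a", "b"]},
    geometry=[box(0, 0, 2, 2), box(3, 0, 7, 1)],
    crs="EPSG:3857",
)
right = gpd.GeoDataFrame(
    {"right_id": ["x"]},
    geometry=[box(0.5, 0, 4, 2)],
    crs="EPSG:3857",
)

# Replace each left polygon with its centroid, then join points to polygons.
left_points = left.copy()
left_points["geometry"] = left.geometry.centroid
centroid_join = gpd.sjoin(left_points, right, how="left", predicate="within")

# Polygon "b" overlaps "x", but its centroid (5, 0.5) lies outside "x",
# so the centroid-based join leaves it without a match.
print(centroid_join[["left_id", "right_id"]])
```

In this toy case "a" is matched to "x", while "b" gets no match despite overlapping "x" between x = 3 and x = 4.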
A more comprehensive (but also more computationally expensive) approach is to join polygons based on the highest intersection, i.e. to match each polygon to the polygon in the other gdf that it overlaps the most.
Inputs: two gdfs, the name of the unique ID column in each gdf, and proj_crs (not sure if we can leave this out entirely).
Output: a gdf of matched IDs.
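As a rough illustration of the inputs and output described above, here is a minimal sketch, not the reference implementation. The function name `join_by_highest_intersection`, its parameter names and the toy data are hypothetical. It assumes "highest intersection" means largest intersection area, that the two ID columns have different names, and that both gdfs have a CRS set. It returns a plain table of matched IDs rather than a gdf.

```python
import geopandas as gpd
from shapely.geometry import box


def join_by_highest_intersection(gdf1, gdf2, id_col1, id_col2, proj_crs):
    """Match each gdf1 polygon to the gdf2 polygon it overlaps the most."""
    # Reproject to one projected CRS so intersection areas are comparable.
    left = gdf1[[id_col1, gdf1.geometry.name]].to_crs(proj_crs)
    right = gdf2[[id_col2, gdf2.geometry.name]].to_crs(proj_crs)

    # One row per overlapping pair; the geometry is the shared area.
    pieces = gpd.overlay(left, right, how="intersection", keep_geom_type=True)
    pieces["intersection_area"] = pieces.geometry.area

    # For each gdf1 ID, keep only the pair with the largest shared area.
    best = (
        pieces.sort_values("intersection_area", ascending=False)
        .drop_duplicates(subset=id_col1)
        .sort_values(id_col1)
    )
    return best[[id_col1, id_col2, "intersection_area"]].reset_index(drop=True)


# Hypothetical toy data: "a" overlaps both "x" and "y", "c" overlaps nothing.
left = gpd.GeoDataFrame(
    {"left_id": ["a", "b", "c"]},
    geometry=[box(0, 0, 4, 2), box(-3, 0, -0.5, 2), box(20, 20, 21, 21)],
    crs="EPSG:3857",
)
right = gpd.GeoDataFrame(
    {"right_id": ["x", "y"]},
    geometry=[box(-1, 0, 1, 2), box(1, 0, 5, 2)],
    crs="EPSG:3857",
)

matches = join_by_highest_intersection(
    left, right, "left_id", "right_id", proj_crs="EPSG:3857"
)
print(matches)
```

In this sketch, polygons with no overlap at all (such as "c") simply do not appear in the result. Whether they should be kept with an empty match is a design choice left open here. The `proj_crs` argument is only used to make areas meaningful. If both inputs were already guaranteed to share a projected CRS, the reprojection step could presumably be skipped.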
Reference implementation (this still has to be refactored and generalized):
This proposal is from @joshuacortez.