alronlam closed this issue 2 years ago
@joshuacortez Just a clarification with the intended use case in mind, to get a clearer picture of how it'll be used.
Sample scenario: I have a gdf containing: grid_x, grid_y, geometry, urban_class
If I want to get urban clusters, I would do something like this (roughly based on the reference implementation):
urban_gdf = gdf[gdf["urban_class"] == "urban"]
urban_gdf_with_cluster_ids = cluster_tiles(urban_gdf)
# TODO: join logic here to place the cluster IDs back onto the original gdf
Is this right?
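For reference, the join step in the snippet above could be sketched like this. It uses plain pandas and a hard-coded stand-in for the output of cluster_tiles, and it assumes cluster_tiles keeps the index of the rows it is given (that is an assumption, not confirmed behavior):

```python
import pandas as pd

# Toy stand-in for the gdf (geometry column omitted for brevity)
gdf = pd.DataFrame(
    {
        "grid_x": [0, 1, 2],
        "grid_y": [0, 0, 0],
        "urban_class": ["urban", "urban", "rural"],
    }
)

urban_gdf = gdf[gdf["urban_class"] == "urban"].copy()
# Hypothetical placeholder for: urban_gdf_with_cluster_ids = cluster_tiles(urban_gdf)
urban_gdf["cluster_id"] = "c0"

# Column assignment aligns on the index, so rows that were filtered out
# (here the rural tile) end up with NaN in cluster_id.
gdf["cluster_id"] = urban_gdf["cluster_id"]
```

If the index is not preserved, a left merge on grid_x and grid_y would be the safer way to put the IDs back.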
For consistency with the other GeoWrangler functions, I was thinking of something like this instead:
connectable_tiles = gdf[gdf["urban_class"] == "urban"]
gdf = cluster_tiles(gdf, connectable_tiles)
# gdf would then have a new column called "cluster_id" where each urban row has its corresponding cluster ID, while the others (e.g. rural, suburban, etc.) have NaNs.
The connectable_tiles param still keeps it flexible for whatever logic you have on which tiles can be connected if adjacent (say, maybe you want urban and suburban). What do you think?
Yep sounds good!
The function inputs can look like this:
from typing import List, Optional

import geopandas as gpd

def cluster_tiles(
    gdf: gpd.GeoDataFrame,
    grid_x_col: str = "x",
    grid_y_col: str = "y",
    category_col: Optional[str] = None,
    categories_used: Optional[List[str]] = None,
    connectivity_type: str = "four-way",
) -> gpd.GeoDataFrame:
    ...  # body omitted, signature only
The output is the same as the original gdf but with an appended cluster_id column (a string, but can be NULL):
tile_id, x, y, < other cols >, cluster_id
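To make the clustering step concrete, here is a minimal pure-Python sketch of labeling contiguous tiles with a breadth-first flood fill. It is not the GeoWrangler implementation: label_clusters and OFFSETS are hypothetical names, "eight-way" is an assumed second value for connectivity_type, and the IDs are integers rather than strings to keep it short.

```python
from collections import deque
from typing import Dict, Iterable, Tuple

Tile = Tuple[int, int]

# Neighbor offsets per connectivity type ("eight-way" is an assumed option)
OFFSETS = {
    "four-way": [(1, 0), (-1, 0), (0, 1), (0, -1)],
    "eight-way": [
        (dx, dy) for dx in (-1, 0, 1) for dy in (-1, 0, 1) if (dx, dy) != (0, 0)
    ],
}


def label_clusters(
    tiles: Iterable[Tile], connectivity_type: str = "four-way"
) -> Dict[Tile, int]:
    """Map each (x, y) tile to a cluster ID; connected tiles share an ID."""
    offsets = OFFSETS[connectivity_type]
    tile_set = set(tiles)
    labels: Dict[Tile, int] = {}
    next_id = 0
    for start in sorted(tile_set):  # sorted so the IDs are deterministic
        if start in labels:
            continue
        # Flood fill outward from an unlabeled tile
        labels[start] = next_id
        queue = deque([start])
        while queue:
            x, y = queue.popleft()
            for dx, dy in offsets:
                neighbor = (x + dx, y + dy)
                if neighbor in tile_set and neighbor not in labels:
                    labels[neighbor] = next_id
                    queue.append(neighbor)
        next_id += 1
    return labels


# Only the tiles whose category is in categories_used would be passed in
urban_tiles = [(0, 0), (1, 0), (2, 1), (5, 5)]
four_way = label_clusters(urban_tiles)
eight_way = label_clusters(urban_tiles, "eight-way")
```

With four-way connectivity the diagonal tile (2, 1) forms its own cluster, while with eight-way it joins the cluster containing (0, 0) and (1, 0). Tiles that are filtered out before the call would simply get a NULL cluster_id when the labels are mapped back onto the gdf.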
Addressed in PR #178.
This is from @joshuacortez:
In geospatial projects that involve scoring grid tiles, sometimes apart from the individual tiles, we’re also interested in the clusters of contiguous tiles that share the same attribute.
For example, we have tiles classified as urban or rural and want to find the urban clusters.
Input: GeoDataFrame (grid_x, grid_y)
Output: GeoDataFrame (grid_x, grid_y, cluster_id)
Important Considerations: