weecology / DeepForest

Python Package for Airborne RGB machine learning
https://deepforest.readthedocs.io/
MIT License

Polygon preprocess is broken with an existing shapefile #811

Closed by bw4sz 1 month ago

bw4sz commented 1 month ago

Too much code was added in https://github.com/weecology/DeepForest/commit/4b70b23502aefceddc5c9a871847b6fadc99b185. It makes no sense to remake the geometry column after it has already been made, and doing so leads to strange interactions with the geometry column for polygons. To reproduce:

# Imports assumed from the calls used below.
import geopandas as gpd
from deepforest.utilities import read_file
from deepforest.preprocess import split_raster

# Load the existing shapefile and keep only polygon geometries.
gdf = gpd.read_file("/Users/benweinstein/Downloads/crown_delineation_shapefile.shp")
gdf = gdf[gdf.geometry.type == "Polygon"]
gdf["image_path"] = "Orthomosaic_WGS84_UTM20S.tif"
gdf["label"] = "Tree"
gdf["source"] = "Araujo et al. 2020"
df = read_file(gdf, root_dir="/Users/benweinstein/Downloads/")
df.root_dir = "/Users/benweinstein/Downloads/"

# Splitting the raster into crops is where the polygon preprocessing breaks.
df = df[["geometry", "image_path", "label", "source"]]
split_files = split_raster(df, path_to_raster="/Users/benweinstein/Downloads/Orthomosaic_WGS84_UTM20S.tif", root_dir="/Users/benweinstein/Downloads/",
                           base_dir="/Users/benweinstein/Downloads/crops/", patch_size=2000, patch_overlap=0)

In #766, we added https://github.com/weecology/DeepForest/commit/4b70b23502aefceddc5c9a871847b6fadc99b185#diff-44ced633bf62b484ff15fc8f381dd9981ff00e19d3facfc9c1bbecce9963e320R298, but this is redundant: the geometry column is already created at https://github.com/weecology/DeepForest/blob/7e0aa4394b50beecc0584fcc62511a8aa7d75e21/deepforest/preprocess.py#L92.

Even at a glance, the code drops the column just to remake it. It's not obvious what I was thinking. A weird side effect is that a geodataframe behaves differently depending on whether its geometry column is named "geometry" or "polygon". The user will never be able to figure this out.
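
To make the "geometry" versus "polygon" distinction concrete, here is a minimal sketch using plain geopandas. It is not the DeepForest code path. The toy frames `a` and `b` and the helper `normalize_geometry_name` are hypothetical names made up for illustration, and renaming the active geometry column once is only one possible way to avoid name-dependent behavior, not necessarily the fix applied here.

```python
import geopandas as gpd
from shapely.geometry import box

poly = box(0, 0, 1, 1)

# Same data, but the active geometry column has a different name in each frame.
a = gpd.GeoDataFrame({"label": ["Tree"]}, geometry=[poly])
b = gpd.GeoDataFrame({"label": ["Tree"], "polygon": [poly]}, geometry="polygon")

# geopandas tracks the active geometry column by name, so code that
# hard-codes the string "geometry" only works for frame `a`.
print(a.geometry.name)
print(b.geometry.name)
print("geometry" in b.columns)


def normalize_geometry_name(gdf):
    """Hypothetical helper: rename the active geometry column to "geometry"."""
    if gdf.geometry.name == "geometry":
        return gdf
    return gdf.rename_geometry("geometry")


# After normalizing once, the column subset from the repro works for both frames.
fixed = normalize_geometry_name(b)
subset = fixed[["geometry", "label"]]
```

With a single name settled up front, downstream steps have no reason to drop and rebuild the column, so both inputs would follow the same path.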