open-forest-observatory / tree-detection-framework

BSD 3-Clause "New" or "Revised" License
3 stars 0 forks source link

Predictions output file format and naming #43

Open youngdjn opened 2 weeks ago

youngdjn commented 2 weeks ago

Currently, RegionDetections.save saves an ESRI shapefile of name name.shp (plus ancillary files named name.xyz) into a folder named name, where name is provided by the user via the argument save_path, or the entrypoint arg --predictions-save-path.

It would be better to default to .gpkg output file format (see here). Also, I think it would be best for the output filename to be the same as the input raster used for predictions (probably with a timestamp suffix, so multiple preds can be made without overwriting). This way, if they user specifies to perform predictions for multiple rasters (i.e. the input folder contains multiple rasters), predictions from each one can be saved and associated with their source.

It looks like this could be accomplished simply by changing save_path to a file path instead of a folder path, and specifically of the format path/to/preds/folder/{input_rastername}_preds_{timestamp}.gpkg. I'm open to input on any of this, especially the inclusion of the timestamp.

russelldj commented 2 weeks ago

I agree that shp is a bad format to use and it wasn't my goal to use it initially. It looks like if you the path you provide to geopandas.save_file doesn't have an extension, it interprets it as shp. I think there's a couple documentation changes that could make this a little better. The first is already done, since @amrithasp02 clarified the output file should be a geofile here. I can also update the documentation of RegionDetections.save (here) to be a little more explicit that a file path with a geospatial extension is requested.

I see the reasoning for per-raster outputs, but that was not the way I initially thought of this working. I was thinking we'd provide a folder of rasters only if they were overlapping or otherwise needed to be processed together. In this case, I don't think we'd necessarily want per-raster outputs, and for the overlapping case, this might not even be clearly defined. But I can see an alternative workflow where you just want to conveniently run on a folder of independent rasters.

If we do want to do some sort of default naming, I think the generate_predictions function would be a good place for that logic.

youngdjn commented 2 weeks ago

Oh thanks, I think I misunderstood a few things before, but everything you say makes sense and I don't think there's a need for any change, except maybe the documentation change you suggest (so I won't close this issue yet).