weecology / DeepForest

Python Package for Airborne RGB machine learning
https://deepforest.readthedocs.io/
MIT License

Multiple patch size ensembling followed by non-max suppression to make predict_tile more resilient. #395

Open bw4sz opened 1 year ago

bw4sz commented 1 year ago

DeepForest is very sensitive to the patch size argument, especially when predicting on data whose resolution is above or below the 10 cm data used to train the baseline model.

https://deepforest.readthedocs.io/en/latest/better.html#check-patch-size

During prediction we can do non-max suppression on multiple scales to try to be more robust to the patch-size sensitivity. I am imagining a flag in predict_image and predict_tile that allows multi-patch prediction. Perhaps if the user inputs a list of patch sizes?

tile = model.predict_tile(
    "/Users/ben/Desktop/test.jpg",
    return_plot=True,
    patch_overlap=0,
    iou_threshold=0.05,
    patch_size=[400, 600, 800])

and then check for a list here. If patch_size is a list, iterate over the patch sizes.

https://github.com/weecology/DeepForest/blob/046bc26e1cae0664b6bdfe1315486c7167d320d5/deepforest/predict.py#L198
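A minimal sketch of that dispatch, assuming a hypothetical per-size prediction callable (predict_fn is a stand-in for the existing prediction path, not a DeepForest API):

```python
import pandas as pd

def predict_multi_patch(predict_fn, image, patch_size):
    """Dispatch on patch_size: an int keeps the current single-size
    behavior, a list runs the prediction once per size and concatenates
    the resulting box DataFrames so one NMS pass can be run on the
    combined set.

    predict_fn is a hypothetical stand-in with signature
    predict_fn(image, patch_size) -> DataFrame of boxes.
    """
    sizes = [patch_size] if isinstance(patch_size, int) else list(patch_size)
    frames = [predict_fn(image, size) for size in sizes]
    return pd.concat(frames, ignore_index=True)
```

The int-or-list check keeps the change backward compatible: existing callers passing a single patch size see no behavior change.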

Or better yet, is there a way to chain together rasterio window iterators, and then just iterate over the combined set of windows? Maybe -> https://stackoverflow.com/questions/3211041/how-to-join-two-generators-or-other-iterables-in-python
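The linked answer boils down to itertools.chain. A dependency-free sketch, using plain tuples as a stand-in for rasterio.windows.Window:

```python
from itertools import chain

def windows_for(patch_size, width, height):
    """Yield (col_off, row_off, w, h) windows tiling an image at one
    patch size; a plain-tuple stand-in for rasterio.windows.Window so
    the sketch stays dependency-free."""
    for row in range(0, height, patch_size):
        for col in range(0, width, patch_size):
            yield (col, row,
                   min(patch_size, width - col),
                   min(patch_size, height - row))

# Chain the window generators for several patch sizes into one iterable,
# so the prediction loop sees a single stream of windows.
all_windows = chain.from_iterable(
    windows_for(s, 1000, 1000) for s in (400, 600, 800))
```

For a 1000 x 1000 tile this yields 3x3 + 2x2 + 2x2 = 17 windows in one stream, which the existing per-window prediction loop could consume unchanged.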

Currently predict_tile reads in a tile, uses split_raster to cut it into pieces, predicts each piece, concatenates the dataframes, and performs non-max suppression. It would be simple to run this over a number of patch sizes in an automated way and then perform non-max suppression on the combined set of predictions. The downside would be increased runtime. It's not clear to me if we should also try this for resampling, or just patch size? Probably start with just patch size.
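The final suppression step over the combined predictions could look like the following greedy NMS, written here in plain NumPy as a sketch (DeepForest's actual implementation lives in predict.py and may differ):

```python
import numpy as np

def nms(boxes, scores, iou_threshold=0.05):
    """Greedy non-max suppression over the combined predictions from
    all patch sizes.

    boxes: (N, 4) array of [xmin, ymin, xmax, ymax]
    scores: (N,) array of confidences
    Returns the indices of the boxes to keep, highest score first.
    """
    order = np.argsort(scores)[::-1]
    keep = []
    while order.size > 0:
        i = order[0]
        keep.append(int(i))
        if order.size == 1:
            break
        rest = order[1:]
        # Intersection of the current best box with all remaining boxes
        xx1 = np.maximum(boxes[i, 0], boxes[rest, 0])
        yy1 = np.maximum(boxes[i, 1], boxes[rest, 1])
        xx2 = np.minimum(boxes[i, 2], boxes[rest, 2])
        yy2 = np.minimum(boxes[i, 3], boxes[rest, 3])
        inter = np.clip(xx2 - xx1, 0, None) * np.clip(yy2 - yy1, 0, None)
        area_i = (boxes[i, 2] - boxes[i, 0]) * (boxes[i, 3] - boxes[i, 1])
        area_r = (boxes[rest, 2] - boxes[rest, 0]) * (boxes[rest, 3] - boxes[rest, 1])
        iou = inter / (area_i + area_r - inter)
        # Drop remaining boxes that overlap the kept box too much
        order = rest[iou <= iou_threshold]
    return keep
```

Because NMS only looks at boxes and scores, it doesn't care which patch size produced each prediction; duplicate detections of the same tree at different scales collapse to the highest-scoring box.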

bw4sz commented 1 year ago

From a colleague at ETH Zurich.

"https://github.com/qubvel/ttach this is what I've used for semantic segmentation - possibly it has some wrapper for torchvision's object detector output format, but yeah I think the solution most people implement is predict on a bunch of augmented images, concat the results and run one big NMS. I think this is what Detectron2 does internally: https://detectron2.readthedocs.io/en/latest/_modules/detectron2/modeling/test_time_augmentation.html see GeneralizedRCNNWithTTA._batch_inference (edited)"