Hey @bw4sz Do you think upscaling/downscaling the input resolution to something that better matches a default patch_size is a good way to deal with such an issue? We can explore all sorts of algorithms for this.
I am open to all ideas. In general I am worried about manipulating input data: it's computationally heavy (these are going to be huge tiles) and we could lose detail when downscaling. But I'm open to testing it. My first idea was to manipulate the patch size argument and leave the input data as is. All options are open.
I understand the concern about manipulation of the input data.
Actually, the idea I originally had was estimating the tree size as a blob. For example, we may need to downscale a tree x times before it becomes a blob 10 pixels wide. From there we can figure out the patch_size as a possible function of x. Higher-resolution data will have a greater x, because it will need more downscales to reach a blob of a given width compared to lower-resolution data.
Blob-width detection can be done through some image preprocessing steps. Once we are able to figure out a function from x to patch_size, we'll apply this patch_size to the original data (not the downscaled version).
In simpler words: we first downscale the input until a typical tree reaches an ideal blob size, then choose the patch size based on the number of downscales we had to do, and then apply this patch_size to the original image.
I'm not sure if I'm overcomplicating this. 😅 So your thoughts on this will be appreciated! I'll go through the docs to get a better idea of the inner workings of the project before trying out any ideas we can think of. 👍🏻
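If it helps the discussion, here is a rough, untested sketch of that idea. The sample annotation box, the 10 px blob target, and the mapping from the number of downscales x to patch_size are all placeholders that would need to be fit empirically.

```python
import cv2

def downscales_to_blob(image_path, sample_box, target_blob_width=10):
    """Count how many times the image must be halved before a typical
    annotated tree (sample_box = xmin, ymin, xmax, ymax in pixels)
    shrinks to a blob roughly target_blob_width pixels wide."""
    image = cv2.imread(image_path)
    tree_width = sample_box[2] - sample_box[0]
    x = 0
    while tree_width > target_blob_width and min(image.shape[:2]) > 2 * target_blob_width:
        image = cv2.pyrDown(image)  # halve the resolution
        tree_width /= 2
        x += 1
    return x

def patch_size_from_downscales(x, base_patch_size=400, reference_x=3):
    """Hypothetical mapping: data that needs more downscales than the
    reference (i.e. finer resolution) gets a proportionally larger patch."""
    return int(base_patch_size * 2 ** (x - reference_x))
```

The patch size computed this way would then be passed to predict_tile on the original (full-resolution) image, as described above.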
Here is my comment, without any concrete solution...
I had to work with different resolutions and found that a simple resolution argument may not be sufficient. I am using map images from a web map server. Even though there is a well-defined resolution for a specific "zoom" level, the image quality can vary a lot because of the different original raw image sources (i.e., the input images have already been down/upsampled). At the moment I use preprocess.split_raster to preprocess training data empirically and use different patch sizes for inference (aggregating the results). The overall performance is acceptable.
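For reference, a minimal sketch of that workflow. The raster paths are placeholders, and keyword names such as base_dir in split_raster have changed between DeepForest versions, so check against your installed release:

```python
import pandas as pd
from deepforest import main, preprocess

# Cut training crops at an empirically chosen patch size
preprocess.split_raster(path_to_raster="training_tile.tif",
                        annotations_file="training_annotations.csv",
                        base_dir="crops/",
                        patch_size=400,
                        patch_overlap=0.05)

# At inference, run several patch sizes and pool the boxes
m = main.deepforest()
m.use_release()

all_boxes = []
for patch_size in [300, 400, 600]:
    boxes = m.predict_tile(raster_path="target_tile.tif",
                           patch_size=patch_size,
                           patch_overlap=0.25)
    boxes["patch_size"] = patch_size
    all_boxes.append(boxes)

combined = pd.concat(all_boxes, ignore_index=True)
# Overlapping detections from different patch sizes still need to be merged,
# e.g. with non-max suppression over (xmin, ymin, xmax, ymax, score).
```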
@dassaniansh let's use this issue. The first thing we need is the evaluation score (https://deepforest.readthedocs.io/en/latest/Evaluation.html) as a function of input resolution.
Python 3.9.6 (default, Jul 30 2021, 09:31:09)
[Clang 10.0.0 ]
Type "help", "copyright", "credits" or "license" for more information.
from deepforest import get_data
import os
from deepforest.main import deepforest
m = deepforest()
Reading config file: deepforest_config.yml
m.use_release()
Model from DeepForest release https://github.com/weecology/DeepForest/releases/tag/1.0.0 was already downloaded. Loading model from file.
Loading pre-built model: https://github.com/weecology/DeepForest/releases/tag/1.0.0
csv_file = get_data("OSBS_029.csv")
root_dir = os.path.dirname(csv_file)
results = m.evaluate(csv_file, root_dir, iou_threshold=0.4, savedir=None)
/Users/benweinstein/.conda/envs/DeepForest/lib/python3.9/site-packages/torch/nn/functional.py:718: UserWarning: Named tensors and all their associated APIs are an experimental feature and subject to change. Please do not use them for anything important until they are released as stable. (Triggered internally at ../c10/core/TensorImpl.h:1156.)
return torch.max_pool2d(input, kernel_size, stride, padding, dilation, ceil_mode)
results
{'results': prediction_id truth_id IoU ... true_label image_path match
0 31 0 0.000000 ... Tree OSBS_029.tif False
1 41 1 0.000000 ... Tree OSBS_029.tif False
2 17 2 0.564232 ... Tree OSBS_029.tif True
3 50 3 0.543922 ... Tree OSBS_029.tif True
4 34 4 0.616394 ... Tree OSBS_029.tif True
.. ... ... ... ... ... ... ...
56 26 56 0.685911 ... Tree OSBS_029.tif True
57 42 57 0.575741 ... Tree OSBS_029.tif True
58 None 58 0.000000 ... Tree OSBS_029.tif False
59 10 59 0.692665 ... Tree OSBS_029.tif True
60 1 60 0.763430 ... Tree OSBS_029.tif True
[61 rows x 12 columns], 'box_precision': 0.8035714285714286, 'box_recall': 0.7377049180327869, 'class_recall': label recall precision size
0 0 1.0 1.0 56, 'predictions': xmin ymin xmax ... label score image_path
0 330.080566 342.662140 373.715454 ... 0 0.802979 OSBS_029.tif
1 216.171249 206.591583 248.594864 ... 0 0.778803 OSBS_029.tif
2 325.359222 44.049034 363.431244 ... 0 0.751573 OSBS_029.tif
3 261.008606 238.633163 296.410034 ... 0 0.748605 OSBS_029.tif
4 173.029999 0.000000 229.023438 ... 0 0.738209 OSBS_029.tif
5 258.342041 198.233337 291.543762 ... 0 0.716250 OSBS_029.tif
6 97.654602 305.077118 152.689178 ... 0 0.711664 OSBS_029.tif
7 52.430378 72.021301 85.009918 ... 0 0.698782 OSBS_029.tif
8 292.347534 368.635132 332.700928 ... 0 0.688486 OSBS_029.tif
9 249.411453 51.019691 277.099640 ... 0 0.688165 OSBS_029.tif
10 317.631165 181.155960 345.687683 ... 0 0.686540 OSBS_029.tif
11 18.473246 346.017670 57.073792 ... 0 0.668805 OSBS_029.tif
12 272.603821 330.232361 305.601562 ... 0 0.666738 OSBS_029.tif
13 277.753357 0.110479 311.371582 ... 0 0.654231 OSBS_029.tif
14 195.076935 339.480408 227.222992 ... 0 0.648725 OSBS_029.tif
15 190.892273 78.937210 244.395508 ... 0 0.629515 OSBS_029.tif
16 0.000000 144.165573 42.506039 ... 0 0.622943 OSBS_029.tif
17 184.994812 256.272949 229.686768 ... 0 0.604757 OSBS_029.tif
18 382.137787 264.451416 400.000000 ... 0 0.600438 OSBS_029.tif
19 234.207275 296.622803 273.882538 ... 0 0.589491 OSBS_029.tif
20 289.422516 83.104446 332.196808 ... 0 0.574947 OSBS_029.tif
21 178.419952 372.066101 206.742188 ... 0 0.552992 OSBS_029.tif
22 327.421021 119.720184 358.950806 ... 0 0.551969 OSBS_029.tif
23 0.000000 48.725574 30.629400 ... 0 0.546445 OSBS_029.tif
24 374.312622 207.728210 398.147827 ... 0 0.538519 OSBS_029.tif
25 159.874435 167.144257 197.542709 ... 0 0.536578 OSBS_029.tif
26 58.809822 294.708130 101.116890 ... 0 0.534906 OSBS_029.tif
27 40.161354 0.914158 94.033958 ... 0 0.529695 OSBS_029.tif
28 3.540020 216.683243 40.567917 ... 0 0.527856 OSBS_029.tif
29 286.517975 137.959244 316.439972 ... 0 0.526694 OSBS_029.tif
30 369.212402 19.478354 398.912109 ... 0 0.524993 OSBS_029.tif
31 228.237549 388.862610 252.511566 ... 0 0.523766 OSBS_029.tif
32 104.303833 181.964188 155.852936 ... 0 0.523676 OSBS_029.tif
33 55.202187 190.014359 89.678932 ... 0 0.502996 OSBS_029.tif
34 312.022125 7.093416 344.556549 ... 0 0.502465 OSBS_029.tif
35 366.731781 119.042236 387.905792 ... 0 0.495438 OSBS_029.tif
36 118.516029 68.083710 147.080994 ... 0 0.492526 OSBS_029.tif
37 75.934868 144.205017 114.146507 ... 0 0.490224 OSBS_029.tif
38 145.504059 386.502502 176.084259 ... 0 0.487398 OSBS_029.tif
39 0.000000 258.224457 21.422482 ... 0 0.462236 OSBS_029.tif
40 102.027664 30.461601 130.979263 ... 0 0.418635 OSBS_029.tif
41 390.224884 375.093231 399.788422 ... 0 0.415022 OSBS_029.tif
42 88.018456 365.364807 107.558815 ... 0 0.403836 OSBS_029.tif
43 231.916046 243.187439 252.987671 ... 0 0.402279 OSBS_029.tif
44 388.845673 175.683319 399.906464 ... 0 0.397244 OSBS_029.tif
45 101.755402 124.721375 121.393097 ... 0 0.396860 OSBS_029.tif
46 384.908203 321.104553 400.000000 ... 0 0.389809 OSBS_029.tif
47 72.870079 251.158737 115.134605 ... 0 0.384928 OSBS_029.tif
48 130.386810 99.750061 183.451813 ... 0 0.369765 OSBS_029.tif
49 381.477173 75.818222 400.000000 ... 0 0.355139 OSBS_029.tif
50 373.786987 0.000000 399.417480 ... 0 0.350143 OSBS_029.tif
51 0.102808 200.861526 8.157425 ... 0 0.334018 OSBS_029.tif
52 312.861847 334.460052 337.424225 ... 0 0.323812 OSBS_029.tif
53 331.021149 88.350540 355.052887 ... 0 0.317281 OSBS_029.tif
54 0.317305 375.103668 8.618253 ... 0 0.315891 OSBS_029.tif
55 162.206650 264.200470 188.070602 ... 0 0.306989 OSBS_029.tif
[56 rows x 7 columns]}
results["box_recall"]
0.7377049180327869
results["box_precision"]
0.8035714285714286
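To turn that into a score-versus-resolution curve, something like the following sketch could work. It is hedged: the resampled files are written to a scratch directory, the annotation columns are assumed to be the standard xmin/ymin/xmax/ymax, and OSBS_029 is assumed to be roughly 0.1 m data.

```python
import os
import cv2
import pandas as pd
from deepforest import get_data
from deepforest.main import deepforest

m = deepforest()
m.use_release()

csv_file = get_data("OSBS_029.csv")
root_dir = os.path.dirname(csv_file)
annotations = pd.read_csv(csv_file)

scores = {}
for scale in [1.0, 0.75, 0.5, 0.25]:  # 1.0 = native resolution (~0.1 m)
    image = cv2.imread(os.path.join(root_dir, "OSBS_029.tif"))
    resized = cv2.resize(image, None, fx=scale, fy=scale)

    out_dir = "/tmp/resolution_sweep"
    os.makedirs(out_dir, exist_ok=True)
    cv2.imwrite(os.path.join(out_dir, "OSBS_029_scaled.png"), resized)

    # Scale the ground truth boxes to match the resampled image
    scaled = annotations.copy()
    scaled[["xmin", "ymin", "xmax", "ymax"]] = (
        scaled[["xmin", "ymin", "xmax", "ymax"]] * scale).round()
    scaled["image_path"] = "OSBS_029_scaled.png"
    scaled_csv = os.path.join(out_dir, "OSBS_029_scaled.csv")
    scaled.to_csv(scaled_csv, index=False)

    results = m.evaluate(scaled_csv, out_dir, iou_threshold=0.4, savedir=None)
    scores[scale] = (results["box_precision"], results["box_recall"])

print(scores)
```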
Long term we will want to try curriculum learning/cross-training across different spatial resolutions.
Worth exploring sliced inference (SAHI): https://colab.research.google.com/github/obss/sahi/blob/main/demo/inference_for_torchvision.ipynb
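A minimal sketch of what SAHI-style slicing looks like, written against deepforest's own predict_image rather than SAHI's wrappers (whose exact API I have not verified). predict_tile already does something similar internally; the explicit loop just makes slice size and overlap easy to experiment with. The raster path and thresholds are placeholders.

```python
import numpy as np
import pandas as pd
import rasterio
import torch
from torchvision.ops import nms

def sliced_predict(m, raster_path, slice_size=400, overlap=0.15, iou_threshold=0.4):
    """Predict on overlapping windows, shift boxes back to tile coordinates,
    then merge duplicates with non-max suppression."""
    with rasterio.open(raster_path) as src:
        image = np.moveaxis(src.read(), 0, 2)  # (bands, H, W) -> (H, W, bands)
    stride = int(slice_size * (1 - overlap))
    window_preds = []
    for y in range(0, image.shape[0], stride):
        for x in range(0, image.shape[1], stride):
            crop = image[y:y + slice_size, x:x + slice_size].astype("float32")
            preds = m.predict_image(image=crop)
            if preds is None or preds.empty:
                continue
            preds[["xmin", "xmax"]] += x
            preds[["ymin", "ymax"]] += y
            window_preds.append(preds)
    if not window_preds:
        return pd.DataFrame()
    combined = pd.concat(window_preds, ignore_index=True)
    keep = nms(torch.tensor(combined[["xmin", "ymin", "xmax", "ymax"]].values, dtype=torch.float32),
               torch.tensor(combined["score"].values, dtype=torch.float32),
               iou_threshold)
    return combined.iloc[keep.numpy()].reset_index(drop=True)
```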
Overall, the models are too sensitive to input resolution. There are two strategies for improvement: 1) the preprocessing, and 2) the model weights.
Preprocessing
As the resolution of the input data diverges from the resolution of the data used to train the model, the patch size needs to be adjusted (larger for finer data, smaller for coarser data) so that each window covers a similar ground footprint.
Can we come up with a function that describes how the patch size argument in predict_tile should respond to data of different resolutions?
I played with this notebook here: https://github.com/weecology/DeepForest_demos/blob/master/street_tree/StreetTrees.ipynb
and briefly mention it here:
https://deepforest.readthedocs.io/en/latest/better.html#check-patch-size
I am imagining that predict_tile should take a resolution argument in place of the patch_size argument.
Alternatively, we could write a tutorial or procedure to automate finding the correct patch size.
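One possible shape for that function, assuming the release model was trained on roughly 0.1 m data with 400 px windows (both numbers are defaults to verify, not established constants), is to keep the ground footprint of each window constant:

```python
def patch_size_for_resolution(resolution_m, train_resolution_m=0.1, train_patch_size=400):
    """Scale the patch so each window covers roughly the same ground area
    as during training: finer data (smaller m/px) gets a larger patch."""
    return int(round(train_patch_size * train_resolution_m / resolution_m))

# e.g. 0.02 m drone imagery -> 2000 px windows, 0.2 m imagery -> 200 px windows
boxes = m.predict_tile(raster_path="tile.tif",
                       patch_size=patch_size_for_resolution(0.02),
                       patch_overlap=0.15)
```

A resolution argument to predict_tile could simply wrap a mapping like this and read the ground sampling distance from the raster metadata.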
Model training
We can retrain the model using different zoom augmentations to try to make it more robust to input patch size (a rough sketch of what the augmentation could look like is at the end of this comment). This is a larger effort and takes more machine learning knowledge. I have played with the bird model, and we have the annotations associated with these data:
https://www.biorxiv.org/content/10.1101/2021.08.05.455311v2
There are bird data to play with here:
https://zenodo.org/record/5033174#.YkM2LBDMKHE
The Hayes dataset is very high resolution (the drone was very close to the birds), and the Everglades dataset is lower resolution.
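As mentioned above, a rough sketch of the zoom augmentation, assuming main.deepforest still accepts a transforms callable shaped like the default deepforest.dataset.get_transform; the crop size and probabilities are guesses to tune:

```python
import albumentations as A
from albumentations.pytorch import ToTensorV2
from deepforest import main

def get_zoom_transform(augment):
    """Mirror the default transform, adding a bbox-safe random crop that is
    resized back to a fixed window, which acts as a random zoom-in."""
    if augment:
        return A.Compose(
            [
                A.RandomSizedBBoxSafeCrop(height=400, width=400, p=0.75),
                A.HorizontalFlip(p=0.5),
                ToTensorV2(),
            ],
            bbox_params=A.BboxParams(format="pascal_voc", label_fields=["category_ids"]),
        )
    return A.Compose(
        [ToTensorV2()],
        bbox_params=A.BboxParams(format="pascal_voc", label_fields=["category_ids"]),
    )

m = main.deepforest(transforms=get_zoom_transform)
m.use_release()  # start from the release weights and fine-tune on mixed-resolution data
```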