weecology / DeepForest

Python Package for Airborne RGB machine learning
https://deepforest.readthedocs.io/
MIT License

Model should be less sensitive to resolution #318

Open bw4sz opened 2 years ago

bw4sz commented 2 years ago

Overall the models are too sensitive to input resolution. There are two strategies to improve this: 1) the preprocessing, and 2) the model weights.

Preprocessing

As the resolution of the input data decreases compared to the data used to train the model, the patch size needs to be larger.

Can we come up with a function that describes how the patch size argument in predict_tile should respond to data of different resolutions?

I played with this notebook here: https://github.com/weecology/DeepForest_demos/blob/master/street_tree/StreetTrees.ipynb

and briefly mention it here:

https://deepforest.readthedocs.io/en/latest/better.html#check-patch-size

I am imagining that predict_tile should have a resolution argument in place of the patch_size argument:

# Current API: window size of 300 px with an overlap of 25% among windows for this small tile.
raster_path = get_data("OSBS_029.tif")
predicted_raster = model.predict_tile(raster_path, return_plot=True, patch_size=300, patch_overlap=0.25)

# Proposed API: describe the data (resolution in meters per pixel) rather than the window.
raster_path = get_data("OSBS_029.tif")
predicted_raster = model.predict_tile(raster_path, return_plot=True, resolution=0.5, patch_overlap=0.25)
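
For example, one candidate for that conversion would hold the ground footprint of each window constant relative to the training data (just a sketch; the 10 cm / 400 px reference values are assumptions for illustration, not documented training settings):

# Hypothetical helper: choose a patch size (in pixels) that covers the same
# ground extent as the windows the release model was trained on.
# train_resolution_m and train_patch_px are assumed reference values.
def patch_size_for_resolution(resolution_m, train_resolution_m=0.1, train_patch_px=400):
    return int(round(train_patch_px * train_resolution_m / resolution_m))

patch_size_for_resolution(0.05)   # finer pixels -> larger window in pixels (800)
patch_size_for_resolution(0.2)    # coarser pixels -> smaller window in pixels (200)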

Alternatively, we could write a tutorial or procedure to automate the finding of the correct patch size.
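
A rough sketch of such a procedure, assuming a small hand-annotated tile (my_tile.tif and my_annotations.csv are hypothetical files; the candidate sizes, IoU cutoff, and the simple recall metric are illustrative choices, not an established DeepForest workflow):

import pandas as pd
from deepforest import main

def recall_at_iou(pred, truth, iou_threshold=0.4):
    """Fraction of ground-truth boxes matched by at least one prediction."""
    matched = 0
    for _, t in truth.iterrows():
        for _, p in pred.iterrows():
            ix1, iy1 = max(t.xmin, p.xmin), max(t.ymin, p.ymin)
            ix2, iy2 = min(t.xmax, p.xmax), min(t.ymax, p.ymax)
            inter = max(0, ix2 - ix1) * max(0, iy2 - iy1)
            union = ((t.xmax - t.xmin) * (t.ymax - t.ymin)
                     + (p.xmax - p.xmin) * (p.ymax - p.ymin) - inter)
            if union > 0 and inter / union >= iou_threshold:
                matched += 1
                break
    return matched / len(truth)

m = main.deepforest()
m.use_release()
truth = pd.read_csv("my_annotations.csv")   # hypothetical annotation file (xmin, ymin, xmax, ymax)

scores = {}
for patch_size in (200, 300, 400, 600, 800):
    pred = m.predict_tile("my_tile.tif", patch_size=patch_size, patch_overlap=0.25)
    scores[patch_size] = recall_at_iou(pred, truth)

best_patch_size = max(scores, key=scores.get)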

Model training

We can retrain the model using different zoom augmentations to try to make it more robust to input patch size. This is a larger effort and takes more machine learning knowledge. I have played with the bird model, and we have the annotations associated with these data:

https://www.biorxiv.org/content/10.1101/2021.08.05.455311v2

There are bird data to play with here:

https://zenodo.org/record/5033174#.YkM2LBDMKHE

The Hayes dataset is very high resolution (the drone was very close to the birds), and the Everglades dataset is lower resolution.
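
A sketch of what that zoom augmentation could look like, assuming the transforms hook on main.deepforest accepts a function like the default deepforest.dataset.get_transform (the scale range and probabilities are arbitrary choices):

import albumentations as A
from albumentations.pytorch import ToTensorV2
from deepforest import main

def zoom_transform(augment):
    # Randomly rescale training crops to simulate coarser/finer pixels.
    if augment:
        return A.Compose(
            [
                A.RandomScale(scale_limit=(-0.5, 0.5), p=0.75),
                A.HorizontalFlip(p=0.5),
                ToTensorV2(),
            ],
            bbox_params=A.BboxParams(format="pascal_voc", label_fields=["category_ids"]),
        )
    return A.Compose(
        [ToTensorV2()],
        bbox_params=A.BboxParams(format="pascal_voc", label_fields=["category_ids"]),
    )

# Start from the release weights and fine-tune with the zoom augmentation.
m = main.deepforest(transforms=zoom_transform)
m.use_release()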

kkothari2001 commented 2 years ago

Hey @bw4sz, do you think upscaling/downscaling the input resolution to something that better matches a default patch_size is a good way to deal with such an issue? We can explore all sorts of algorithms for this.

bw4sz commented 2 years ago

I am open to all ideas. In general I am worried about manipulating the input data: it's computationally heavy (these are going to be huge tiles), and we could lose detail when downscaling. But I'm open to testing it. My first idea was to manipulate the patch size argument and leave the input data as is. All options are open.

kkothari2001 commented 2 years ago

I understand the concern about manipulation of the input data.

Actually, the idea I originally had was to estimate the tree size as a blob. For example, we may need to downscale a tree x times before it becomes a blob of width 10 pixels. From there we can figure out the patch_size as a possible function of x. Higher-resolution data will have a greater x, because it will need more downscaling to reach a blob of a specific width than lower-resolution data will.

Blob-width detection can be done through some image preprocessing steps. Once we are able to figure out a function from x to patch_size, we'll apply this patch_size to the original data (not the downscaled version).

In simpler words: we first downscale the input until trees reach an ideal blob width, then choose the patch size based on the number of downscales from the previous step, and then apply this patch_size to the original image.
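
A rough sketch of that idea (purely illustrative: the greenness mask, the area and width thresholds, and the final mapping from x to patch_size are placeholder choices, not anything tested):

import numpy as np
from skimage.io import imread
from skimage.transform import rescale
from skimage.measure import label, regionprops

def median_blob_width(rgb):
    """Median width in pixels of crude 'green' blobs, used as a stand-in for tree size."""
    greenness = rgb[..., 1].astype(float) - rgb[..., [0, 2]].astype(float).mean(axis=-1)
    mask = greenness > greenness.mean()                 # placeholder vegetation mask
    regions = regionprops(label(mask))
    widths = [r.bbox[3] - r.bbox[1] for r in regions if r.area > 20]
    return float(np.median(widths)) if widths else 0.0

def count_downscales(rgb, target_width=10, max_steps=6):
    """Halve the image until the median blob width drops to ~target_width pixels; return x."""
    x, current = 0, rgb
    while median_blob_width(current) > target_width and x < max_steps:
        current = rescale(current, 0.5, channel_axis=-1, anti_aliasing=True)
        x += 1
    return x

rgb = imread("my_tile.tif")              # any RGB tile (hypothetical path)
x = count_downscales(rgb)
patch_size = 200 * (2 ** x)              # placeholder mapping from x to patch_size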

I'm not sure if I'm overcomplicating this. 😅 So your thoughts on this will be appreciated! I'll go through the docs to get a better idea of the inner workings of the project before trying out any of the ideas we come up with. 👍🏻

easz commented 2 years ago

Here is my comment, without any concrete solution...

I had to work with different resolutions and found that a simple resolution argument may not be sufficient. I am using map images from a web map server. Even though there is a well-defined resolution for a specific "zoom" level, the image quality can vary a lot due to the different original raw image sources (i.e. the input images have already been down/up-sampled). At the moment I use preprocess.split_raster to preprocess training data empirically and use different patch sizes to do inference (the results are then aggregated). The overall performance is acceptable.
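
For reference, a sketch of that multi-patch-size inference with a simple aggregation step (the patch sizes, the example tile path, and the NMS-based merge are illustrative choices, not a DeepForest built-in):

import pandas as pd
import torch
from torchvision.ops import nms
from deepforest import main

m = main.deepforest()
m.use_release()

# Predict the same tile at several patch sizes, then merge the results.
frames = [
    m.predict_tile("tile.tif", patch_size=ps, patch_overlap=0.25)
    for ps in (300, 500, 800)
]
boxes = pd.concat(frames, ignore_index=True)

# Suppress duplicate boxes found at multiple patch sizes.
keep = nms(
    torch.tensor(boxes[["xmin", "ymin", "xmax", "ymax"]].values, dtype=torch.float32),
    torch.tensor(boxes["score"].values, dtype=torch.float32),
    iou_threshold=0.4,
)
aggregated = boxes.iloc[keep.numpy()]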

bw4sz commented 2 years ago

@dassaniansh let's use this issue. The first thing we need is the evaluation score (https://deepforest.readthedocs.io/en/latest/Evaluation.html) as a function of input resolution.

  1. Download a set of evaluation data.
  2. Resample in python/qgis (https://rasterio.readthedocs.io/en/latest/topics/resampling.html). The input res is 10cm, so probably something like 15cm, 20cm, 30cm, 40cm, 50cm, 60cm. (A resampling sketch follows the example session below.)
  3. For each evaluation set, take the baseline deepforest crown model and run the evaluation code:
Python 3.9.6 (default, Jul 30 2021, 09:31:09) 
[Clang 10.0.0 ]
Type "help", "copyright", "credits" or "license" for more information.
from deepforest import get_data
import os
from deepforest.main import deepforest
m = deepforest()
Reading config file: deepforest_config.yml
m.use_release()
Model from DeepForest release https://github.com/weecology/DeepForest/releases/tag/1.0.0 was already downloaded. Loading model from file.
Loading pre-built model: https://github.com/weecology/DeepForest/releases/tag/1.0.0
csv_file = get_data("OSBS_029.csv")
root_dir = os.path.dirname(csv_file)

results = m.evaluate(csv_file, root_dir, iou_threshold = 0.4, savedir=None)

/Users/benweinstein/.conda/envs/DeepForest/lib/python3.9/site-packages/torch/nn/functional.py:718: UserWarning: Named tensors and all their associated APIs are an experimental feature and subject to change. Please do not use them for anything important until they are released as stable. (Triggered internally at  ../c10/core/TensorImpl.h:1156.)
  return torch.max_pool2d(input, kernel_size, stride, padding, dilation, ceil_mode)
results
{'results':    prediction_id  truth_id       IoU  ...  true_label    image_path  match
0             31         0  0.000000  ...        Tree  OSBS_029.tif  False
1             41         1  0.000000  ...        Tree  OSBS_029.tif  False
2             17         2  0.564232  ...        Tree  OSBS_029.tif   True
3             50         3  0.543922  ...        Tree  OSBS_029.tif   True
4             34         4  0.616394  ...        Tree  OSBS_029.tif   True
..           ...       ...       ...  ...         ...           ...    ...
56            26        56  0.685911  ...        Tree  OSBS_029.tif   True
57            42        57  0.575741  ...        Tree  OSBS_029.tif   True
58          None        58  0.000000  ...        Tree  OSBS_029.tif  False
59            10        59  0.692665  ...        Tree  OSBS_029.tif   True
60             1        60  0.763430  ...        Tree  OSBS_029.tif   True

[61 rows x 12 columns], 'box_precision': 0.8035714285714286, 'box_recall': 0.7377049180327869, 'class_recall':    label  recall  precision  size
0      0     1.0        1.0    56, 'predictions':           xmin        ymin        xmax  ...  label     score    image_path
0   330.080566  342.662140  373.715454  ...      0  0.802979  OSBS_029.tif
1   216.171249  206.591583  248.594864  ...      0  0.778803  OSBS_029.tif
2   325.359222   44.049034  363.431244  ...      0  0.751573  OSBS_029.tif
3   261.008606  238.633163  296.410034  ...      0  0.748605  OSBS_029.tif
4   173.029999    0.000000  229.023438  ...      0  0.738209  OSBS_029.tif
5   258.342041  198.233337  291.543762  ...      0  0.716250  OSBS_029.tif
6    97.654602  305.077118  152.689178  ...      0  0.711664  OSBS_029.tif
7    52.430378   72.021301   85.009918  ...      0  0.698782  OSBS_029.tif
8   292.347534  368.635132  332.700928  ...      0  0.688486  OSBS_029.tif
9   249.411453   51.019691  277.099640  ...      0  0.688165  OSBS_029.tif
10  317.631165  181.155960  345.687683  ...      0  0.686540  OSBS_029.tif
11   18.473246  346.017670   57.073792  ...      0  0.668805  OSBS_029.tif
12  272.603821  330.232361  305.601562  ...      0  0.666738  OSBS_029.tif
13  277.753357    0.110479  311.371582  ...      0  0.654231  OSBS_029.tif
14  195.076935  339.480408  227.222992  ...      0  0.648725  OSBS_029.tif
15  190.892273   78.937210  244.395508  ...      0  0.629515  OSBS_029.tif
16    0.000000  144.165573   42.506039  ...      0  0.622943  OSBS_029.tif
17  184.994812  256.272949  229.686768  ...      0  0.604757  OSBS_029.tif
18  382.137787  264.451416  400.000000  ...      0  0.600438  OSBS_029.tif
19  234.207275  296.622803  273.882538  ...      0  0.589491  OSBS_029.tif
20  289.422516   83.104446  332.196808  ...      0  0.574947  OSBS_029.tif
21  178.419952  372.066101  206.742188  ...      0  0.552992  OSBS_029.tif
22  327.421021  119.720184  358.950806  ...      0  0.551969  OSBS_029.tif
23    0.000000   48.725574   30.629400  ...      0  0.546445  OSBS_029.tif
24  374.312622  207.728210  398.147827  ...      0  0.538519  OSBS_029.tif
25  159.874435  167.144257  197.542709  ...      0  0.536578  OSBS_029.tif
26   58.809822  294.708130  101.116890  ...      0  0.534906  OSBS_029.tif
27   40.161354    0.914158   94.033958  ...      0  0.529695  OSBS_029.tif
28    3.540020  216.683243   40.567917  ...      0  0.527856  OSBS_029.tif
29  286.517975  137.959244  316.439972  ...      0  0.526694  OSBS_029.tif
30  369.212402   19.478354  398.912109  ...      0  0.524993  OSBS_029.tif
31  228.237549  388.862610  252.511566  ...      0  0.523766  OSBS_029.tif
32  104.303833  181.964188  155.852936  ...      0  0.523676  OSBS_029.tif
33   55.202187  190.014359   89.678932  ...      0  0.502996  OSBS_029.tif
34  312.022125    7.093416  344.556549  ...      0  0.502465  OSBS_029.tif
35  366.731781  119.042236  387.905792  ...      0  0.495438  OSBS_029.tif
36  118.516029   68.083710  147.080994  ...      0  0.492526  OSBS_029.tif
37   75.934868  144.205017  114.146507  ...      0  0.490224  OSBS_029.tif
38  145.504059  386.502502  176.084259  ...      0  0.487398  OSBS_029.tif
39    0.000000  258.224457   21.422482  ...      0  0.462236  OSBS_029.tif
40  102.027664   30.461601  130.979263  ...      0  0.418635  OSBS_029.tif
41  390.224884  375.093231  399.788422  ...      0  0.415022  OSBS_029.tif
42   88.018456  365.364807  107.558815  ...      0  0.403836  OSBS_029.tif
43  231.916046  243.187439  252.987671  ...      0  0.402279  OSBS_029.tif
44  388.845673  175.683319  399.906464  ...      0  0.397244  OSBS_029.tif
45  101.755402  124.721375  121.393097  ...      0  0.396860  OSBS_029.tif
46  384.908203  321.104553  400.000000  ...      0  0.389809  OSBS_029.tif
47   72.870079  251.158737  115.134605  ...      0  0.384928  OSBS_029.tif
48  130.386810   99.750061  183.451813  ...      0  0.369765  OSBS_029.tif
49  381.477173   75.818222  400.000000  ...      0  0.355139  OSBS_029.tif
50  373.786987    0.000000  399.417480  ...      0  0.350143  OSBS_029.tif
51    0.102808  200.861526    8.157425  ...      0  0.334018  OSBS_029.tif
52  312.861847  334.460052  337.424225  ...      0  0.323812  OSBS_029.tif
53  331.021149   88.350540  355.052887  ...      0  0.317281  OSBS_029.tif
54    0.317305  375.103668    8.618253  ...      0  0.315891  OSBS_029.tif
55  162.206650  264.200470  188.070602  ...      0  0.306989  OSBS_029.tif

[56 rows x 7 columns]}
results["box_recall"]
0.7377049180327869
results["box_precision"]
0.8035714285714286
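
For step 2, the resampling could look roughly like this (a sketch following the rasterio resampling docs linked above; the target resolutions and output naming are just examples):

import rasterio
from rasterio.enums import Resampling

def resample_to(src_path, dst_path, target_res):
    """Resample a raster to a coarser ground sampling distance (meters per pixel)."""
    with rasterio.open(src_path) as src:
        scale = src.res[0] / target_res            # e.g. 0.1 m -> 0.3 m gives 1/3
        height = int(src.height * scale)
        width = int(src.width * scale)
        data = src.read(
            out_shape=(src.count, height, width),
            resampling=Resampling.bilinear,
        )
        transform = src.transform * src.transform.scale(
            src.width / width, src.height / height
        )
        profile = src.profile
        profile.update(height=height, width=width, transform=transform)
        with rasterio.open(dst_path, "w", **profile) as dst:
            dst.write(data)

for res in (0.15, 0.20, 0.30, 0.40, 0.50, 0.60):
    resample_to("OSBS_029.tif", f"OSBS_029_{int(res * 100)}cm.tif", res)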

Long term we will want to try curriculum learning/cross-training across different spatial resolutions.

bw4sz commented 7 months ago

Worth exploring sliced inference (SAHI): https://colab.research.google.com/github/obss/sahi/blob/main/demo/inference_for_torchvision.ipynb
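
Following that notebook, a sketch of what this could look like with the release model (untested; it assumes the underlying torchvision RetinaNet is exposed as m.model and that SAHI's torchvision wrapper accepts it directly):

from deepforest import main
from sahi import AutoDetectionModel
from sahi.predict import get_sliced_prediction

m = main.deepforest()
m.use_release()

# Wrap the underlying torchvision RetinaNet for SAHI.
detection_model = AutoDetectionModel.from_pretrained(
    model_type="torchvision",
    model=m.model,
    confidence_threshold=0.3,
    image_size=400,
    device="cpu",
)

# Run sliced inference over a tile; slice size and overlap are example values.
result = get_sliced_prediction(
    "OSBS_029.tif",
    detection_model,
    slice_height=400,
    slice_width=400,
    overlap_height_ratio=0.25,
    overlap_width_ratio=0.25,
)
print(len(result.object_prediction_list))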