martibosch / detectree

Tree detection from aerial imagery in Python
https://doi.org/10.21105/joss.02172
GNU General Public License v3.0

Using detectree on own data #2

Closed MisterB92 closed 4 years ago

MisterB92 commented 4 years ago

Firstly I would like to thank you for solving my other issue. I can run the example now and am very grateful.

However, I would like to use the detectree classifier on my own data and I can't get it to work. In the example we use lidar data as a mask, but unfortunately I don't have this data. Therefore I made some manual masks with the GNU Image Manipulation Program (GIMP).

I've resized my aerial photos to a shape of (400, 400, 3) and then made a black-and-white mask of the same size for each. I train on 2 images and test on a third image.

When training the classifier something weird already happens:

clf2 = dtr.ClassifierTrainer().train_classifier(
    img_filepaths=["/home/user/scripts/photo0.tif",
                   "/home/user/scripts/photo1.tif"],
    response_img_filepaths=["/home/user/scripts/photo0_MASK.tif",
                            "/home/user/scripts/photo1_MASK.tif"])

[                                        ] | 0% Completed |  0.1s

/home/user/anaconda3/envs/detectree/lib/python3.7/site-packages/rasterio/__init__.py:219: NotGeoreferencedWarning: Dataset has no geotransform set. The identity matrix may be returned.
  s = DatasetReader(path, driver=driver, sharing=sharing, **kwargs)

[########################################] | 100% Completed |  0.7s

Is this 0% complete because the photos aren't georeferenced? Is this a problem?

Then when I want to make predictions for a photo I get the following error:

test_filepath = "/home/user/scripts/photo2.tif"
c = dtr.Classifier()
y = c.classify_img(test_filepath, clf2)

fig, axes = plt.subplots(1, 2, figsize=(2 * figwidth, figheight))
with rio.open(test_filepath) as src:
    plot.show(src.read(), ax=axes[0])
axes[1].imshow(y)
/home/user/anaconda3/envs/detectree/lib/python3.7/site-packages/rasterio/__init__.py:219: NotGeoreferencedWarning: Dataset has no geotransform set. The identity matrix may be returned.
  s = DatasetReader(path, driver=driver, sharing=sharing, **kwargs)
---------------------------------------------------------------------------
TypeError                                 Traceback (most recent call last)
~/anaconda3/envs/detectree/lib/python3.7/site-packages/numpy/lib/shape_base.py in split(ary, indices_or_sections, axis)
    842     try:
--> 843         len(indices_or_sections)
    844     except TypeError:

TypeError: object of type 'int' has no len()

During handling of the above exception, another exception occurred:

ValueError                                Traceback (most recent call last)
<ipython-input-9-19f1aaaca745> in <module>
      1 test_filepath = "/home/user/scripts/photo2.tif"
      2 c = dtr.Classifier()
----> 3 y = c.classify_img(test_filepath, clf2)
      4 
      5 fig, axes = plt.subplots(1, 2, figsize=(2 * figwidth, figheight))

~/anaconda3/envs/detectree/lib/python3.7/site-packages/detectree/classifier.py in classify_img(self, img_filepath, clf, output_filepath)
    275             y_pred = clf.predict(X).reshape(img_shape)
    276         else:
--> 277             p_nontree, p_tree = np.hsplit(clf.predict_proba(X), 2)
    278             g = mf.Graph[int]()
    279             node_ids = g.add_grid_nodes(img_shape)

~/anaconda3/envs/detectree/lib/python3.7/site-packages/numpy/lib/shape_base.py in hsplit(ary, indices_or_sections)
    915         raise ValueError('hsplit only works on arrays of 1 or more dimensions')
    916     if ary.ndim > 1:
--> 917         return split(ary, indices_or_sections, 1)
    918     else:
    919         return split(ary, indices_or_sections, 0)

~/anaconda3/envs/detectree/lib/python3.7/site-packages/numpy/lib/shape_base.py in split(ary, indices_or_sections, axis)
    847         if N % sections:
    848             raise ValueError(
--> 849                 'array split does not result in an equal division')
    850     res = array_split(ary, indices_or_sections, axis)
    851     return res

ValueError: array split does not result in an equal division

martibosch commented 4 years ago

Hello, I am glad that my instructions for #1 worked, I will therefore close it (feel free to re-open it if necessary).

Regarding your current issue:

  1. GIMP does lose the raster metadata when exporting a GeoTIFF image, which is why rasterio emits the NotGeoreferencedWarning. This should not affect the training of the classifier in DetecTree. However, you can restore the metadata by copying it from the original image:
import rasterio as rio

with rio.open("/home/user/scripts/photo0.tif") as src:
    with rio.open("/home/user/scripts/photo0_MASK.tif", 'r+') as dst:
        # see also https://rasterio.readthedocs.io/en/latest/topics/profiles.html
        dst.crs = src.crs
        dst.transform = src.transform
  2. Your ValueError is most likely caused by masks that contain more than 2 values (tree/non-tree). You can use rasterio to check this:
import numpy as np
import rasterio as rio

with rio.open("/home/user/scripts/photo0_MASK.tif") as src:
    print(np.unique(src.read()))

If more than two pixel values appear, you might also consider adapting the code above in order to map all the pixel values to a binary tree/non-tree encoding.
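
Such a mapping could look like the sketch below. It assumes the masks follow the 255 (tree) / 0 (non-tree) encoding and that a mid-gray threshold separates the two classes; the `binarize_mask` helper and the threshold of 127 are illustrative choices, not part of DetecTree.

```python
import numpy as np

# Assumed encoding: 255 for tree pixels, 0 for non-tree pixels
TREE_VAL = 255
NONTREE_VAL = 0


def binarize_mask(mask, threshold=127):
    """Map pixels above `threshold` to TREE_VAL and all others to NONTREE_VAL."""
    mask = np.asarray(mask)
    return np.where(mask > threshold, TREE_VAL, NONTREE_VAL).astype(np.uint8)


# Toy mask with gray edge pixels, as an image editor may produce when anti-aliasing
toy_mask = np.array([[0, 12, 130], [200, 255, 64]], dtype=np.uint8)
binary_mask = binarize_mask(toy_mask)
print(np.unique(binary_mask))  # [  0 255]
```

For a real mask, the array would come from `src.read(1)` on the opened raster, and the result could be written back with `dst.write(binary_mask, 1)` on a dataset opened in 'r+' mode.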

I will consider adding a check in DetecTree that ensures that the provided response masks consist of two pixel values only, and otherwise raises a more informative error.
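
Until such a check exists, something along these lines could be run on each mask before training. This is only a sketch: `check_binary_mask` is a hypothetical helper, not a DetecTree function. The second half illustrates the likely mechanism behind the cryptic error, under the assumption that the classifier returns one probability column per class found in the masks.

```python
import numpy as np


def check_binary_mask(mask, tree_val=255, nontree_val=0):
    """Raise an informative error unless `mask` holds only the two expected values."""
    found = set(np.unique(mask).tolist())
    unexpected = found - {tree_val, nontree_val}
    if unexpected:
        raise ValueError(
            f"Response mask must contain only {nontree_val} (non-tree) and "
            f"{tree_val} (tree), but also found: {sorted(unexpected)}"
        )


# Three classes give three probability columns, which np.hsplit cannot
# divide into two equal parts, hence the error in the traceback
proba_three_classes = np.full((4, 3), 1 / 3)
try:
    np.hsplit(proba_three_classes, 2)
except ValueError as err:
    hsplit_error = str(err)

check_binary_mask(np.array([[0, 255], [255, 0]]))  # a valid mask passes silently
```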

An additional final note: what values have you used to represent tree and non-tree pixels in the masks? By default, DetecTree takes the tree/non-tree pixel values from the settings module (i.e., 255 for tree pixels and 0 for non-tree pixels). You might change these values by passing them to the pixel_response_builder_kws argument when initializing the ClassifierTrainer instance. I recognize that this part is not sufficiently documented and I will try to improve it as soon as I can :)
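
As a rough illustration, masks encoded as 1 (tree) and 0 (non-tree) might be handled as below. The key names `tree_val` and `nontree_val` are an assumption about what the pixel response builder accepts, so please check them against the installed version; `make_trainer` is just a hypothetical wrapper.

```python
# Assumed keys: `tree_val` and `nontree_val` are taken to be the argument
# names of DetecTree's pixel response builder (verify for your version)
pixel_response_builder_kws = {"tree_val": 1, "nontree_val": 0}


def make_trainer(kws):
    """Build a ClassifierTrainer that reads masks with a custom encoding."""
    import detectree as dtr  # imported lazily so the snippet loads without it

    return dtr.ClassifierTrainer(pixel_response_builder_kws=kws)
```

Calling `make_trainer(pixel_response_builder_kws).train_classifier(...)` with the same file path arguments as in the first post would then train on the custom encoding.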

Please let me know if this helped. Best, Martí

MisterB92 commented 4 years ago

Thanks for the detailed explanation! I got the model to train on two aerial photos of my own. However, when I use the trained model on a test photo I get nothing back: the output is an image with only zero values.

I looked at the feature importances and this is what the output was:

class2.feature_importances_

array([0.   , 0.335, 0.   , 0.   , 0.   , 0.   , 0.   , 0.   , 0.33 ,
       0.   , 0.   , 0.   , 0.   , 0.   , 0.   , 0.   , 0.   , 0.335,
       0.   , 0.   , 0.   , 0.   , 0.   , 0.   , 0.   , 0.   , 0.   ])

While these were the importances of the classifier trained on the zurich data:

array([0.045, 0.115, 0.07 , 0.14 , 0.08 , 0.07 , 0.05 , 0.005, 0.01 ,
       0.065, 0.   , 0.015, 0.01 , 0.   , 0.035, 0.065, 0.02 , 0.04 ,
       0.005, 0.   , 0.025, 0.03 , 0.025, 0.01 , 0.035, 0.015, 0.02 ])

I checked my mask as you described and it contains only the values 0 and 255, where 0 represents non-tree pixels and 255 represents tree pixels.

The resolution I train on is 1000 by 1000 pixels where 1 pixel is 25 cm.

Looking forward to your answer.

martibosch commented 4 years ago

Hello!

For information, the features with non-zero importance are:
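
The positions of those features can be read off the importance array posted above. A minimal NumPy sketch follows; which feature each index corresponds to depends on DetecTree's feature ordering and is not shown here.

```python
import numpy as np

# Feature importances of the classifier trained on the two custom tiles,
# copied from the comment above
importances = np.array([
    0.0, 0.335, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.33,
    0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.335,
    0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0,
])

# Zero-based indices of the features the classifier actually relies on
nonzero_idx = np.flatnonzero(importances)
print(nonzero_idx)  # [ 1  8 17]
```

Only 3 of the 27 features carry any weight, against most of them for the Zurich classifier.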

Nevertheless, this is all indeed too abstract and there is little I can do without being able to reproduce your example. Would you be able to provide me with the two training tiles, their ground truth masks and maybe a testing tile as well? If so, you can send them to me at marti.bosch (at) epfl.ch

Thank you, Martí

MisterB92 commented 4 years ago

Thank you for your reply. I've just sent you the data as requested.

Looking forward to your answer.

Bram

martibosch commented 4 years ago

The problem has two causes. The first is the provided ground truth masks: since DetecTree employs a supervised learning approach, it is crucial to provide good ground truth masks that really match the tree pixels in the actual image. The second is the post-classification refinement procedure included in DetecTree. After some exploration with other datasets of different resolutions, in the new 0.3.0 release I have changed the default value of the parameter controlling the refinement (the refine_beta argument passed to the initialization method of the Classifier class) from 100 to 50. I hope that this addresses your issue and that you are able to obtain your tree canopy maps with satisfactory accuracy. Let me know (and feel free to reopen) if you need any further help.
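
On an older release, the same value can be set explicitly. The sketch below assumes the API used earlier in this thread; `make_classifier` is a hypothetical wrapper, and later DetecTree versions may expose a different interface.

```python
def make_classifier(refine_beta=50):
    """Build a Classifier with an explicit refinement parameter."""
    import detectree as dtr  # imported lazily so the snippet loads without it

    # refine_beta is passed at initialization, as described above
    return dtr.Classifier(refine_beta=refine_beta)
```

For example, `make_classifier(50).classify_img(test_filepath, clf2)` would mirror the call from the first post with the new default.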

Best, Martí