weecology / DeepForest

Python Package for Airborne RGB machine learning
https://deepforest.readthedocs.io/
MIT License

Input Data #418

Closed NickHarnau closed 1 year ago

NickHarnau commented 1 year ago

Hey :)

I am about to use DeepForest for a university project, so I have a question about the input data and the resulting performance: do you know whether the model performs better when the input is a tile from an orthomosaic? I have several RGB images from a UAV. I can either label the raw images and use them as input for the model, or first create an orthomosaic, cut it into tiles, label those, and use them as input. I could try both approaches, but that would mean labelling twice, which is time consuming. So the question is whether you have any experience with this and its effect on performance. (Of course, performance depends on several factors.)

Thanks

ethanwhite commented 1 year ago

TL;DR: If I had to guess, performance will be a bit higher on raw imagery for detection and maybe a lot better for classification, but working with raw imagery introduces complexities if you want to count trees or estimate their size.

This is something that we're actively experimenting with for a project on birds using DeepForest. I'm fairly sure we've run DeepForest for people on tree data that isn't in an orthomosaic and it works fine, but we haven't done any analysis on differences for the trees.

Intuitively, here are my guesses, but I'd also be curious what @bw4sz thinks:

  1. For just detecting trees generically, it probably won't make a lot of difference in terms of single-tree accuracy, but I would expect the non-ortho'd images to work a little better because they lack the distortions that orthorectification introduces.
  2. Any improvements in detection are trading off against the fact that non-ortho'd images will have variation in size introduced by position relative to the camera.
  3. Are the images overlapping? If they are and you want to count trees, then working on the raw imagery means you'll need an approach for identifying duplicate trees in overlapping images.
  4. If you're doing species classification then I think working with the raw data may be more valuable.
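
To make point 3 a bit more concrete, here is a minimal sketch of one way duplicates could be removed. It is not something DeepForest does for you, and it assumes you have already mapped the predicted boxes from every raw image into a shared ground coordinate frame (that mapping is the hard part and is not shown). The function names `iou` and `drop_duplicates`, the box layout, and the 0.5 threshold are all illustrative assumptions. The technique is plain greedy non-maximum suppression on intersection over union.

```python
def iou(a, b):
    """Intersection over union of two (xmin, ymin, xmax, ymax) boxes."""
    overlap_w = max(0.0, min(a[2], b[2]) - max(a[0], b[0]))
    overlap_h = max(0.0, min(a[3], b[3]) - max(a[1], b[1]))
    inter = overlap_w * overlap_h
    union = (a[2] - a[0]) * (a[3] - a[1]) + (b[2] - b[0]) * (b[3] - b[1]) - inter
    return inter / union if union > 0 else 0.0


def drop_duplicates(detections, iou_threshold=0.5):
    """Greedy non-maximum suppression.

    detections: list of (xmin, ymin, xmax, ymax, score) tuples, assumed to be
    in one shared ground coordinate frame (hypothetical layout).
    Keeps the highest-scoring box of each overlapping group.
    """
    kept = []
    for det in sorted(detections, key=lambda d: d[4], reverse=True):
        # Keep this box only if it does not overlap strongly with a kept one.
        if all(iou(det[:4], k[:4]) < iou_threshold for k in kept):
            kept.append(det)
    return kept


# Made-up example: tree A is seen in two overlapping images, tree B in one.
detections = [
    (10.0, 10.0, 14.0, 14.0, 0.90),  # tree A, image 1
    (10.5, 10.2, 14.5, 14.2, 0.80),  # tree A, image 2
    (30.0, 30.0, 33.0, 33.0, 0.70),  # tree B
]
unique_trees = drop_duplicates(detections)
print(len(unique_trees))
```

In practice the right threshold would depend on how accurately the boxes can be georeferenced, so treat 0.5 as a starting point rather than a recommendation.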

We're working on solutions that integrate the two approaches (based on https://easyidp.readthedocs.io/en/latest/index.html), but that's early-phase work, so I wouldn't recommend waiting for it before getting started.