sizhuoli / TreeCountSegHeight

Code for paper
Apache License 2.0
59 stars, 11 forks

Training annotations for community model #5

Open bw4sz opened 10 months ago

bw4sz commented 10 months ago

I am the maintainer of an open-source Python package for individual tree segmentation in RGB imagery.

https://github.com/weecology/DeepForest

I am getting ready to retrain the base model with thousands of new annotations we've collected through research projects around the world.

https://github.com/weecology/DeepForest/issues/340

I'd like to include the data used to train this model.

In the paper:

We manually delineated a total of 24,466 individual tree crowns from sampling plots of varying sizes distributed over Denmark, among which 19,771 crowns (49% in dense deciduous forests, 30% in dense coniferous forests, and 21% in nonforest areas) were assigned to the training data set, 2,016 crowns (46% in dense deciduous forests, 38% in dense coniferous forests, and 15% in nonforest areas) to the validation data set, and 2,679 crowns (48% in dense deciduous forests, 32% in dense coniferous forests, and 20% in nonforest areas) to the final test data set. The manual delineation took ∼3 weeks and involved creating annotations covering a large variety of tree species and landscape types.
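As a quick sanity check on the quoted split (a sketch, not code from the paper or from DeepForest): the three subsets add up to the stated 24,466 crowns, and the rounded percentages can be turned into approximate per-stratum counts. The stratum labels below are just shorthand for the paper's three categories.

```python
# Crown counts and (rounded) forest-type percentages quoted from the paper.
splits = {
    "train": (19_771, {"deciduous": 49, "coniferous": 30, "nonforest": 21}),
    "validation": (2_016, {"deciduous": 46, "coniferous": 38, "nonforest": 15}),
    "test": (2_679, {"deciduous": 48, "coniferous": 32, "nonforest": 20}),
}

def approx_counts(splits):
    """Convert each split's percentage shares into approximate crown counts."""
    return {
        name: {kind: round(total * pct / 100) for kind, pct in shares.items()}
        for name, (total, shares) in splits.items()
    }

total_crowns = sum(total for total, _ in splits.values())
print(total_crowns)  # 24466, matching the stated total
print(approx_counts(splits)["train"])  # {'deciduous': 9688, 'coniferous': 5931, 'nonforest': 4152}
```

Because the validation percentages sum to 99%, the derived validation counts add up to 1,995 rather than 2,016, so these per-stratum numbers are estimates only.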

I know that annotating crowns is very difficult work, and I think it's important that the community gets the maximum benefit out of this hard task. Thanks for your consideration.

Ben Weinstein

robodrome commented 8 months ago

I have identified a source that could also be helpful for training and testing models. Some municipalities in the Netherlands publish open data on trees: hundreds of thousands of labeled trees, including age, size and species, though no tree crowns. Here's a link to a website that lists a number of them. These are mostly urbanized areas, not dense forests.

Combined with the open RGB, CIR (which can be converted to NDVI), DSM, DTM and point cloud data, this could potentially be made into a useful training dataset.
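For reference, NDVI is computed as (NIR - Red) / (NIR + Red). A minimal NumPy sketch, assuming the CIR tile has already been loaded as a channels-last array with near-infrared in band 0 and red in band 1 (the usual false-colour ordering, but an assumption here; check the imagery's metadata before relying on it):

```python
import numpy as np

def ndvi_from_cir(cir, nir_band=0, red_band=1):
    """Compute NDVI from a channels-last CIR array of shape (H, W, bands).

    The band indices are assumptions about the CIR band order.
    """
    # Cast to float first so uint8 pixel values cannot overflow.
    cir = np.asarray(cir, dtype=np.float64)
    nir = cir[..., nir_band]
    red = cir[..., red_band]
    denom = nir + red
    # Pixels with NIR + Red == 0 have no defined NDVI; mark them as NaN.
    with np.errstate(divide="ignore", invalid="ignore"):
        return np.where(denom > 0, (nir - red) / denom, np.nan)
```

Values near +1 indicate dense green vegetation, which makes NDVI a convenient extra channel for separating tree crowns from paved or built-up surfaces.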

Hope it helps.

Best, David

bw4sz commented 8 months ago

@robodrome Do you have any experience with this dataset? I see some geolocated trees, but no corresponding overhead imagery. Is that right?

robodrome commented 8 months ago

@bw4sz No, I don't have experience with the Dutch tree maps. Yet. I am just starting to look into it. I found the paper and this code repo, and I would like to test it on Dutch open data.

Bomenkaart.org is not an official source. Official sources are, e.g., the municipalities. Take Utrecht, which has an ArcGIS viewer and an ArcGIS feature service, or the Amsterdam bomenkaart. The magic word to search for in Google is "bomenkaart" (Dutch for "tree map").
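Municipal tree layers published as ArcGIS feature services can usually be pulled in bulk through the layer's REST `query` endpoint rather than through the viewer. A minimal sketch using only the standard library; the layer URL is a placeholder (Utrecht's actual service URL isn't given here), and it assumes the layer supports paging with `resultOffset`/`resultRecordCount` and GeoJSON output (`f=geojson`):

```python
import json
from urllib.parse import urlencode
from urllib.request import urlopen

def query_url(layer_url, offset, page_size):
    """Build a REST query URL for one page of features from a FeatureServer layer."""
    params = {
        "where": "1=1",          # no attribute filter: select every feature
        "outFields": "*",        # return all attributes (species, age, size, ...)
        "outSR": 4326,           # request WGS84 coordinates
        "f": "geojson",
        "resultOffset": offset,
        "resultRecordCount": page_size,
    }
    return f"{layer_url.rstrip('/')}/query?{urlencode(params)}"

def _fetch_json(url):
    with urlopen(url) as resp:
        return json.load(resp)

def fetch_all_features(layer_url, page_size=1000, fetch_json=_fetch_json):
    """Page through the layer until the server returns an empty page."""
    features, offset = [], 0
    while True:
        page = fetch_json(query_url(layer_url, offset, page_size)).get("features", [])
        if not page:
            return features
        features.extend(page)
        # Advance by what was actually returned; the server may cap page size.
        offset += len(page)

# Hypothetical usage (placeholder URL, not a real service):
# trees = fetch_all_features("https://<municipal-server>/arcgis/rest/services/<trees>/FeatureServer/0")
```

Stopping on an empty page costs one extra request, but it keeps working when the server's maximum record count is smaller than the requested `page_size`.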

The link I provided (https://geotiles.nl/) directs you to a site where you can download high-resolution aerial imagery for the Netherlands (RGB and CIR), but also point cloud data, DTMs and DSMs. The latter have 0.5 m resolution, but I have used the AHN4 point cloud to create 0.25 m DTM, DSM and CHM rasters.
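The thread doesn't say how the 0.25 m rasters were produced. One simple approach, sketched here under stated assumptions, grids the points and takes the highest return per cell as the DSM, the lowest ground-classified return as the DTM, and their difference as the CHM. It assumes x, y, z and classification arrays have already been read from the AHN4 LAZ tiles (for example with laspy) in a metric CRS such as RD New, and that ground points carry the ASPRS ground code 2:

```python
import numpy as np

def rasterize_chm(x, y, z, cls, res=0.25, ground_class=2):
    """Grid a classified point cloud into DSM, DTM and CHM rasters (row 0 = north)."""
    x, y, z, cls = (np.asarray(a) for a in (x, y, z, cls))
    xmin, ymax = x.min(), y.max()
    ncols = int(np.floor((x.max() - xmin) / res)) + 1
    nrows = int(np.floor((ymax - y.min()) / res)) + 1
    cols = np.floor((x - xmin) / res).astype(int)
    rows = np.floor((ymax - y) / res).astype(int)

    # DSM: highest return in each cell (fmax ignores the NaN initial value).
    dsm = np.full((nrows, ncols), np.nan)
    np.fmax.at(dsm, (rows, cols), z)

    # DTM: lowest ground-classified return in each cell.
    dtm = np.full((nrows, ncols), np.nan)
    g = cls == ground_class
    np.fmin.at(dtm, (rows[g], cols[g]), z[g])

    # CHM: canopy height above ground, clipped at zero; NaN where either input is missing.
    chm = np.clip(dsm - dtm, 0, None)
    return dsm, dtm, chm
```

Cells under dense canopy often contain no ground points and stay NaN here; a production DTM would interpolate those gaps (e.g. with a TIN) before the subtraction.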

Geotiles is a convenient platform for getting the data. I think it is funded by TU Delft (Delft University of Technology). PDOK, the official source, is funded by the Dutch government.

Does that answer your questions?

Best, David

P.S. At bomenkaart.org you can also select the map type in the bottom-right corner.


bw4sz commented 8 months ago

Thanks for your help on this. There is a lot of data out there, and each dataset requires a bit of experience. Looking at any given dataset, is it possible to figure out which tile from https://geotiles.nl/ needs to be downloaded?

So if I'm in the browser and I download this point data, or really all the point data, I'll need to manually go to geotiles.nl and figure out geographically which tiles they correspond to? Is there any way to wget all of them? I could search using Python. Thanks for your thoughts, I appreciate it.
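One way to avoid matching tiles by hand, assuming you can obtain a tile index listing each tile's name and bounding box in the same CRS as the points (the Dutch RD New grid, EPSG:28992; WGS84 points would need reprojecting first, e.g. with pyproj): look up which tile contains each point, then write one URL per tile to a text file that `wget -i urls.txt` can download. The `Tile` type, tile names and URL template are hypothetical placeholders, not geotiles.nl's actual naming scheme:

```python
from typing import Iterable, NamedTuple

class Tile(NamedTuple):
    """Hypothetical tile-index entry: a tile name plus its bounding box."""
    name: str
    xmin: float
    ymin: float
    xmax: float
    ymax: float

def tiles_for_points(points: Iterable[tuple[float, float]], tiles: list[Tile]) -> list[str]:
    """Return the sorted names of all tiles containing at least one point."""
    needed = set()
    for x, y in points:
        for t in tiles:
            # Half-open bounds so a point on a shared edge maps to exactly one tile.
            if t.xmin <= x < t.xmax and t.ymin <= y < t.ymax:
                needed.add(t.name)
                break
    return sorted(needed)

def write_wget_list(names: list[str], url_template: str, path: str) -> None:
    """Write one download URL per tile name, for use with `wget -i path`."""
    with open(path, "w") as f:
        for name in names:
            f.write(url_template.format(name=name) + "\n")
```

For many thousands of points, a spatial index (or, if the tiles form a regular grid, integer division of the coordinates by the tile size) would be much faster than this nested loop.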

bw4sz commented 8 months ago

Also, @robodrome, we've hijacked this thread; you are welcome to email me at benweinstein2010@gmail.com. Sorry, @sizhuoli! Let me know if there is progress on getting the other data.

sizhuoli commented 8 months ago

Hi Ben @bw4sz, I have written twice to the contact I know, but unfortunately I haven't received a reply yet :)

robodrome commented 8 months ago


@bw4sz

It will be some work, but I think this data could be useful for building training sets. It is likely very accurate and detailed.