nbourne2 / birdsai

Research dataset #1

Open nbourne2 opened 9 months ago

nbourne2 commented 9 months ago

Use the NeonTreeCrowns annotated dataset described in the comment below.

nbourne2 commented 9 months ago

The dataset is provided by the ISPRS International Contest of Individual Tree Crown (ITC) Segmentation. It consists of high-resolution Earth observation (EO) images of forests from various locations across the world, with accompanying annotation files containing the ITC delineations. The images are RGB .png files, and the annotation files are .json files organized according to the MS COCO format. The coordinates of each individual tree crown mask are given in the image coordinate system and are stored in the annotation file.
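As a rough illustration of what reading such an annotation file might look like, here is a minimal sketch that groups crown polygons by image. It assumes the contest files follow the usual MS COCO layout (`images`, `annotations`, `categories`, with polygons stored as flat `[x1, y1, x2, y2, ...]` lists); the file name, category name and coordinates below are made up, since I have not seen the actual files.

```python
import json
from collections import defaultdict

# Tiny in-memory stand-in for an annotation file. All values are invented;
# only the key names follow the standard MS COCO layout.
coco_text = json.dumps({
    "images": [{"id": 1, "file_name": "tile_0001.png", "width": 1024, "height": 1024}],
    "annotations": [
        {"id": 10, "image_id": 1, "category_id": 1,
         "segmentation": [[100, 100, 200, 100, 200, 180, 100, 180]],
         "bbox": [100, 100, 100, 80], "area": 8000, "iscrowd": 0},
        {"id": 11, "image_id": 1, "category_id": 1,
         "segmentation": [[300, 300, 350, 300, 350, 360, 300, 360]],
         "bbox": [300, 300, 50, 60], "area": 3000, "iscrowd": 0},
    ],
    "categories": [{"id": 1, "name": "tree_crown"}],
})


def crowns_by_image(coco):
    """Map each image file name to a list of crown polygons [(x, y), ...]."""
    names = {img["id"]: img["file_name"] for img in coco["images"]}
    crowns = defaultdict(list)
    for ann in coco["annotations"]:
        for flat in ann["segmentation"]:
            # COCO stores a polygon as a flat list: x1, y1, x2, y2, ...
            polygon = list(zip(flat[0::2], flat[1::2]))
            crowns[names[ann["image_id"]]].append(polygon)
    return dict(crowns)


coco = json.loads(coco_text)
result = crowns_by_image(coco)
```

For a real file, `json.load(open(path))` would replace the in-memory string, assuming the layout matches.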

The data is divided into training, validation, and testing sets. The training and validation sets comprise aerial images collected from various areas around the world. The training set will be constantly updated and enriched during the validation phase. The testing set includes images from the same scenarios as in the training set, as well as different scenarios from other areas to evaluate the transferability of applied methods.

The contest uses evaluation metrics based on the standard Intersection over Union (IoU) method for comparing predicted and ground truth delineations. A prediction is counted as a true positive if its overlap with a ground truth delineation meets a given IoU threshold, e.g., 50% or 75%. The evaluation will include metrics such as precision, recall, AP50 (average precision at the 50% threshold), AP75, etc. (see e.g. the MS COCO detection evaluation).
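To make the thresholding concrete, here is a small sketch of IoU-based matching. It is a simplification, not the contest's evaluation code: it uses axis-aligned boxes `(xmin, ymin, xmax, ymax)` rather than crown masks, matches greedily one-to-one, and reports precision and recall at a single threshold instead of a full AP computation. The function names and example boxes are my own.

```python
def box_iou(a, b):
    """IoU of two axis-aligned boxes given as (xmin, ymin, xmax, ymax)."""
    ix = max(0.0, min(a[2], b[2]) - max(a[0], b[0]))
    iy = max(0.0, min(a[3], b[3]) - max(a[1], b[1]))
    inter = ix * iy
    union = (a[2] - a[0]) * (a[3] - a[1]) + (b[2] - b[0]) * (b[3] - b[1]) - inter
    return inter / union if union > 0 else 0.0


def precision_recall(preds, truths, threshold=0.5):
    """Greedy one-to-one matching of predictions to ground truth.

    Assumes preds are already sorted by descending confidence.
    """
    unmatched = list(range(len(truths)))
    tp = 0
    for p in preds:
        best_iou, best_j = 0.0, None
        for j in unmatched:
            iou = box_iou(p, truths[j])
            if iou > best_iou:
                best_iou, best_j = iou, j
        # A prediction is a true positive only if its best match clears the threshold.
        if best_j is not None and best_iou >= threshold:
            tp += 1
            unmatched.remove(best_j)
    precision = tp / len(preds) if preds else 0.0
    recall = tp / len(truths) if truths else 0.0
    return precision, recall


# Invented example: two true crowns, three predictions (one spurious).
truths = [(0, 0, 10, 10), (20, 20, 30, 30)]
preds = [(0, 0, 10, 8), (21, 21, 31, 31), (50, 50, 60, 60)]
p50, r50 = precision_recall(preds, truths, threshold=0.5)
p75, r75 = precision_recall(preds, truths, threshold=0.75)
```

In this example the second prediction has an IoU of about 0.68, so it counts as a true positive at the 50% threshold but not at 75%, which is why the stricter threshold lowers both precision and recall.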

Is it possible to download the ITC dataset without registering to participate in the contest? This could be a blocker.

nbourne2 commented 9 months ago

The ISPRS dataset does not appear to be open for download without registering to participate. Another dataset that definitely is open is the NeonTreeCrowns dataset described by Weinstein et al. 2021a (preprint on bioRxiv). This dataset can be downloaded from Zenodo and is visualised here. The dataset consists of aerial images of 100 million trees at 37 geographic sites across the United States. Delineations consist of rectangular bounding boxes only. There are 11,000 RGB images, each covering 1 square km, each with a shapefile and a csv file containing the ITC delineations. These delineations are not manual annotations: they were obtained as predictions from the DeepForest object detection algorithm.
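Since the delineations are model predictions, we would probably want to filter them before using them as training labels. A minimal sketch of loading boxes from one of the csv files follows. The column names (`xmin`, `ymin`, `xmax`, `ymax`, `score`) and the presence of a confidence score are assumptions on my part and need checking against the actual Zenodo files; the rows are invented.

```python
import csv
import io

# Invented rows; the column names are an assumption, not the confirmed schema.
sample_csv = """xmin,ymin,xmax,ymax,score
10.0,20.0,14.0,25.0,0.91
30.0,40.0,33.0,42.0,0.35
"""


def load_boxes(handle, min_score=0.0):
    """Read bounding boxes from a csv handle, keeping rows with score >= min_score."""
    boxes = []
    for row in csv.DictReader(handle):
        score = float(row["score"])
        if score >= min_score:
            boxes.append((float(row["xmin"]), float(row["ymin"]),
                          float(row["xmax"]), float(row["ymax"]), score))
    return boxes


# Keep only the more confident predictions (the 0.5 cut-off is arbitrary).
boxes = load_boxes(io.StringIO(sample_csv), min_score=0.5)
```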

Another benchmark dataset is published by Weinstein et al. 2021b, combining RGB, lidar and hyperspectral sensor data. The benchmark dataset contains over 6,000 image-annotated crowns, 400 field-annotated crowns, and 3,000 canopy stem points from a wide range of forest types. In addition, the authors include over 10,000 training crowns for optional use. In comparison with Weinstein et al. 2021a, this is a much smaller sample but contains a combination of data sources and a subset of field-annotated crowns.