weecology / NeonTreeEvaluation

Benchmark dataset for tree detection for airborne RGB, Hyperspectral and LIDAR imagery
Creative Commons Zero v1.0 Universal

lidar data with missing labels #33

Open Lenostatos opened 3 years ago

Lenostatos commented 3 years ago

Hi Ben,

I investigated the lidar data with regard to the labels and found some point clouds without any labels and some that are missing just a few annotations. The affected plots are listed at the end of the reprex below:

library(tidyverse)

plot_names <- NeonTreeEvaluation::list_annotations()

# Loop over plot names and investigate the point cloud labels
lidar_label_data <- map_dfr(plot_names, function(plot_name) {

  # cat("Plot [", which(plot_name == plot_names), "/", length(plot_names), "] \"",
  #     plot_name, "\"... ", sep = "")

  # Get file path of possible annotation data
  annotation_file_path <- system.file(
    "extdata", "NeonTreeEvaluation", "annotations",
    package = "NeonTreeEvaluation"
  ) %>%
    file.path(plot_name) %>%
    paste0(".xml")

  # Get file paths of possible lidar data
  lidar_file_paths <- system.file(
    "extdata", "NeonTreeEvaluation", "evaluation", "LiDAR",
    package = "NeonTreeEvaluation"
  ) %>%
    file.path(plot_name) %>%
    paste0(c(".las", ".laz"))

  if (
    !file.exists(annotation_file_path) ||
    !any(file.exists(lidar_file_paths))
  ) {
    # cat("skipped.\n", sep = "")
    return(NULL)  # map_dfr() drops NULL results when binding rows
  }

  # Get the lidar data
  lidar_file_path <- lidar_file_paths[file.exists(lidar_file_paths)][[1]]
  point_cloud <- suppressWarnings(lidR::readLAS(lidar_file_path))

  # create a one-row table with some data
  if ("label" %in% colnames(point_cloud@data)) {
    res <- tibble(
      plot_name,
      has_labels = TRUE,
      num_unique_labels = point_cloud@data %>%
        filter(!is.na(label), label != 0) %>%
        pull(label) %>%
        unique() %>%
        length(),
      num_annotations = NeonTreeEvaluation::get_data(plot_name, "annotations") %>%
        NeonTreeEvaluation::xml_parse() %>%
        nrow()
    )
  } else {
    res <- tibble(
      plot_name,
      has_labels = FALSE,
      num_unique_labels = NA_integer_,  # keep the columns consistent across rows
      num_annotations = NeonTreeEvaluation::get_data(plot_name, "annotations") %>%
        NeonTreeEvaluation::xml_parse() %>%
        nrow()
    )
  }

  # cat("done.\n")

  return(res)
})

# List the plots that don't have a label attribute in the first place
lidar_label_data %>% filter(!has_labels)
#> # A tibble: 12 x 4
#>    plot_name     has_labels num_unique_labels num_annotations
#>    <chr>         <lgl>                  <int>           <int>
#>  1 NIWO_001_2018 FALSE                     NA             176
#>  2 NIWO_002_2018 FALSE                     NA             292
#>  3 NIWO_004_2018 FALSE                     NA             107
#>  4 NIWO_005_2018 FALSE                     NA             146
#>  5 NIWO_010_2018 FALSE                     NA             148
#>  6 NIWO_012_2018 FALSE                     NA             136
#>  7 NIWO_014_2018 FALSE                     NA             179
#>  8 NIWO_015_2018 FALSE                     NA             150
#>  9 NIWO_016_2018 FALSE                     NA             134
#> 10 NIWO_017_2018 FALSE                     NA             151
#> 11 NIWO_042_2018 FALSE                     NA               5
#> 12 SJER_046_2018 FALSE                     NA              14

# List the plots where the number of annotations and the number of unique labels
# in the lidar data don't match
lidar_label_data %>%
  filter(has_labels) %>%
  mutate(num_unique_label_diff = num_unique_labels - num_annotations) %>%
  filter(num_unique_label_diff != 0)
#> # A tibble: 4 x 5
#>   plot_name     has_labels num_unique_labe… num_annotations num_unique_label_di…
#>   <chr>         <lgl>                 <int>           <int>                <int>
#> 1 BLAN_005_2019 TRUE                     33              34                   -1
#> 2 TEAK_051_2018 TRUE                     51              52                   -1
#> 3 TEAK_055_2018 TRUE                     18              19                   -1
#> 4 TEAK_059_2018 TRUE                     69              72                   -3

Created on 2021-05-21 by the reprex package (v2.0.0)

Cheers, Leon

bw4sz commented 3 years ago

I'll need to think about this more, but I believe this is intended. I'll add a note to the top of the README: not all data in evaluation/lidar (or evaluation/rgb) have annotations. Many plots are left unannotated in case people want to do unsupervised learning or annotate more themselves. There are about 1200 plots; guessing an average of 60 trees per plot, that's ~70,000 trees to annotate.

The real question is whether there are plots that are annotated in the RGB but whose annotations we haven't draped onto the LiDAR (meaning this script needs to be rerun: https://github.com/weecology/NeonTreeEvaluation/blob/master/utilities/create_lidar_annotations.py). That is possible, and worth checking to make sure it isn't what causes the mismatch in the number of annotations.

More likely, the 2nd point is inevitable: given the sparse density of the cloud, many trees that can be seen in the RGB have no points in the LiDAR, so nothing gets draped. I will rerun create_lidar_annotations.py tomorrow and check the 2nd script, but I expect it not to change.
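To illustrate why sparse point clouds can yield fewer unique labels than annotations, here is a minimal sketch of draping boxes onto points. It is not the logic of create_lidar_annotations.py. The toy coordinates, the `drape_labels()` helper, and the assumption that boxes and points share one coordinate system are all hypothetical.

```r
# Hypothetical toy data: points and annotation boxes are assumed to share
# one coordinate system (real boxes may first need georeferencing).
points <- data.frame(
  x = c(1.0, 1.5, 2.0, 6.0, 6.5, 9.5),
  y = c(1.0, 1.2, 1.8, 6.0, 6.4, 0.5)
)
boxes <- data.frame(
  box_id = 1:3,
  xmin = c(0, 5, 3), xmax = c(3, 8, 4),
  ymin = c(0, 5, 8), ymax = c(3, 8, 9)
)

# Hypothetical helper: give each point the ID of the first box that
# contains it, or 0 if it falls inside no box.
drape_labels <- function(points, boxes) {
  label <- integer(nrow(points))
  for (i in seq_len(nrow(boxes))) {
    inside <- points$x >= boxes$xmin[i] & points$x <= boxes$xmax[i] &
      points$y >= boxes$ymin[i] & points$y <= boxes$ymax[i]
    label[inside & label == 0L] <- boxes$box_id[i]
  }
  label
}

points$label <- drape_labels(points, boxes)

# Same count as in the reprex: unique non-zero labels in the point cloud
num_unique_labels <- length(unique(points$label[points$label != 0]))

# Boxes that received no points and therefore left no label behind
empty_boxes <- setdiff(boxes$box_id, points$label)
```

In this toy case the third box contains no points, so `num_unique_labels` is one less than `nrow(boxes)`, which is the same kind of small negative difference reported for the BLAN and TEAK plots.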

Lenostatos commented 3 years ago

I understand that there are unannotated images and point clouds, but I thought that once annotations are created for an RGB image, they are draped onto the point cloud as well? If this is correct, the first table in my analysis lists plots for which annotations exist but have not been draped onto the corresponding point clouds.

Of course, the point clouds that are missing just a few annotations may simply lack them because no points fall within those annotations.

In any case, all of this is of course not that big of a problem, since the point cloud labels are not vital to any analysis (at least not any that I can think of) other than maybe visualization.