vscholl / neonVegWrangleR

Wrangling NEON vegetation structure (vst) for integration with Airborne Observation Platform (AOP) remote sensing data

Upstream issue: Individual IDs are not unique for each eventID #6

Open bw4sz opened 4 years ago

bw4sz commented 4 years ago

We should either warn users or, probably better, provide a merging solution so that duplicates do not trickle into downstream analysis. I do not know why the following check fails, but I believe the cause is upstream of this package.

  BART_data <- retrieve_VST_data(site = "BART")

  # Verify the same individual ID in the same year doesn't have more than one height
  multiple_heights <- BART_data[[3]] %>%
    group_by(individualID, eventID) %>%
    summarize(n = length(unique(height))) %>%
    filter(n > 1)

> expect_equal(nrow(multiple_heights), 0)
Error: nrow(multiple_heights) not equal to 0.
1/1 mismatches
[1] 104 - 0 == 104

After debugging, this appears to come from neonUtilities:

  vst <- neonUtilities::loadByProduct("DP1.10098.001", check.size = FALSE,
                                      site = site, startdate = start, enddate = enddate)

> multiple_heights<-vst[[3]] %>% group_by(individualID,eventID)  %>% summarize(n=length(unique(height))) %>% filter(n>1)
> 
> head(multiple_heights)
# A tibble: 6 x 3
# Groups:   individualID [6]
  individualID             eventID           n
  <fct>                    <fct>         <int>
1 NEON.PLA.D01.BART.00094  vst_BART_2016     2
2 NEON.PLA.D01.BART.00105  vst_BART_2018     2
3 NEON.PLA.D01.BART.00111  vst_BART_2015     2
4 NEON.PLA.D01.BART.00210  vst_BART_2015     2
5 NEON.PLA.D01.BART.00226A vst_BART_2016     2
6 NEON.PLA.D01.BART.00306  vst_BART_2015     2
> dim(multiple_heights)
[1] 104   3
> vst[[3]] %>% filter(individualID=="NEON.PLA.D01.BART.00105")
                                   uid         namedLocation       date       eventID domainID siteID   plotID subplotID
1 1d8ae27f-ea73-4c70-bc30-023552748106 BART_047.basePlot.vst 2015-09-03 vst_BART_2015      D01   BART BART_047        NA
2 71f8056f-3aea-4037-a74e-9a8ccf92c56b BART_047.basePlot.vst 2016-08-31 vst_BART_2016      D01   BART BART_047        NA
3 a74a9509-1e3e-46a7-866f-a80e2608bd3c BART_047.basePlot.vst 2017-09-12 vst_BART_2017      D01   BART BART_047        NA
4 aadaeaf0-be3e-4ab3-a3a3-ffe68d304fae BART_047.basePlot.vst 2018-08-20 vst_BART_2018      D01   BART BART_047        NA
5 0964ee3a-e558-4cb1-8689-303773d58514 BART_047.basePlot.vst 2018-08-20 vst_BART_2018      D01   BART BART_047        NA
             individualID tempShrubStemID tagStatus       growthForm plantStatus stemDiameter measurementHeight height
1 NEON.PLA.D01.BART.00105              NA      <NA> single bole tree        Live         27.1               130   16.8
2 NEON.PLA.D01.BART.00105              NA        ok single bole tree        Live         27.0               130   16.5
3 NEON.PLA.D01.BART.00105              NA        ok single bole tree        Live         26.8               130   16.4
4 NEON.PLA.D01.BART.00105              NA        ok single bole tree        Live         27.0               130   15.7
5 NEON.PLA.D01.BART.00105              NA        ok single bole tree        Live         28.1               130   16.6
  baseCrownHeight breakHeight breakDiameter maxCrownDiameter ninetyCrownDiameter canopyPosition shape basalStemDiameter
1              NA          NA            NA               NA                  NA           <NA>                      NA
2              NA          NA            NA               NA                  NA           <NA>                      NA
3              NA          NA            NA               NA                  NA           <NA>                      NA
4              NA          NA            NA               NA                  NA           <NA>                      NA
5              NA          NA            NA               NA                  NA           <NA>                      NA
  basalStemDiameterMsrmntHeight maxBaseCrownDiameter ninetyBaseCrownDiameter remarks                  recordedBy
1                            NA                   NA                      NA               ccahill@field-ops.org
2                            NA                   NA                      NA                  mday@field-ops.org
3                            NA                   NA                      NA              jbreault@field-ops.org
4                            NA                   NA                      NA         jlerner@battelleecology.org
5                            NA                   NA                      NA         jlerner@battelleecology.org
                  measuredBy     dataQF
1      dcrandall@neoninc.org legacyData
2    ramundson@field-ops.org legacyData
3 llukas@battelleecology.org legacyData
4 llukas@battelleecology.org       <NA>
5 llukas@battelleecology.org       <NA>

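The 2018 event (vst_BART_2018) has two records for this individual, both dated 2018-08-20, with different heights (15.7 and 16.6) and different stem diameters (27.0 and 28.1).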
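As a starting point for the "warn users" option, here is a minimal sketch that returns every record belonging to an individualID/eventID pair with more than one distinct height, and emits a warning when any are found. The helper name `flag_duplicate_heights` is hypothetical and not part of this package or neonUtilities. The toy rows are copied from the output above, keeping only the columns needed.

```r
library(dplyr)

# Toy rows copied from the printed records for NEON.PLA.D01.BART.00105.
toy <- data.frame(
  individualID = "NEON.PLA.D01.BART.00105",
  eventID = c("vst_BART_2015", "vst_BART_2016", "vst_BART_2017",
              "vst_BART_2018", "vst_BART_2018"),
  date = as.Date(c("2015-09-03", "2016-08-31", "2017-09-12",
                   "2018-08-20", "2018-08-20")),
  height = c(16.8, 16.5, 16.4, 15.7, 16.6),
  stringsAsFactors = FALSE
)

# Hypothetical helper: keep all rows of any individualID/eventID group
# that reports more than one distinct height, and warn if there are any.
flag_duplicate_heights <- function(df) {
  dups <- df %>%
    group_by(individualID, eventID) %>%
    filter(n_distinct(height) > 1) %>%
    ungroup()
  n_pairs <- nrow(distinct(dups, individualID, eventID))
  if (n_pairs > 0) {
    warning(sprintf("%d individualID/eventID pair(s) report more than one height",
                    n_pairs), call. = FALSE)
  }
  dups
}

dups <- suppressWarnings(flag_duplicate_heights(toy))
```

On the toy rows this returns the two vst_BART_2018 records. How to merge them (average, keep one, or drop both) is a separate decision that this sketch does not make.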

bw4sz commented 4 years ago

I wrote NEON about this.

vscholl commented 4 years ago

Good catch, and thanks for contacting NEON about this. Looks like there are indeed duplicates.

One thought I had was that multi-bole trees have multiple entries in the woody vegetation structure data set, and that this could be causing the issue. However, those individual IDs have a letter on the end to distinguish each bole: NEON.PLA.D01.BART.00226A, NEON.PLA.D01.BART.00226B, etc. Just something to keep in mind, since individual bole entries don't have all of the measurements that a single-bole tree has.
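To check whether a duplicated ID is a bole-level entry, the suffix can be detected and stripped with a regular expression. This is only a sketch, and it assumes the suffix is always a single uppercase letter directly after the numeric tag, as in the two examples above. Other suffix formats, if they exist, would not be caught.

```r
ids <- c("NEON.PLA.D01.BART.00226A",
         "NEON.PLA.D01.BART.00226B",
         "NEON.PLA.D01.BART.00105")

# TRUE when the ID ends in a digit followed by one uppercase letter
# (assumed bole suffix convention).
has_bole_suffix <- grepl("[0-9][A-Z]$", ids)

# Drop the assumed suffix to recover the ID shared by all boles of a tree.
base_id <- sub("([0-9])[A-Z]$", "\\1", ids)
```

Note that NEON.PLA.D01.BART.00226A itself appears in the list of duplicated IDs above, so the suffix alone does not rule out duplicates.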

bw4sz commented 4 years ago

I'll cc you. They are very aware of it, and the problems are pretty extensive. In the full dataset I see ~2000 duplicate locations (take the most recent), 9000 duplicate IDs with different heights from the same event ID, and hundreds of between-year height changes of more than 6 m. I'm trying to make a vignette here that I'll clean up.

https://github.com/bw4sz/neonVegWrangleR/blob/master/vignettes/Field_Data.Rmd
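For the between-year height changes, one possible check is sketched below. It is not taken from the vignette. The helper name `flag_height_jumps`, its `max_change` argument, and the toy values are all hypothetical. Only the column names (individualID, date, height) and the 6 m threshold come from this thread. It assumes duplicates within an event have already been resolved, so each individual has one height per date.

```r
library(dplyr)

# Invented toy data: tree_A changes slowly, tree_B jumps by more than 6 m.
toy <- data.frame(
  individualID = c(rep("tree_A", 3), rep("tree_B", 3)),
  date = as.Date(rep(c("2015-09-01", "2016-09-01", "2017-09-01"), 2)),
  height = c(16.8, 16.5, 16.4, 10.0, 17.5, 11.0),
  stringsAsFactors = FALSE
)

# Hypothetical helper: compare each height with the previous record of the
# same individual and mark absolute changes larger than max_change (metres).
flag_height_jumps <- function(df, max_change = 6) {
  df %>%
    arrange(individualID, date) %>%
    group_by(individualID) %>%
    mutate(
      height_change = height - dplyr::lag(height),
      suspicious = !is.na(height_change) & abs(height_change) > max_change
    ) %>%
    ungroup()
}

checked <- flag_height_jumps(toy)
```

The first record of each individual has no previous height, so it is never flagged.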