terraref / reference-data

Coordination of Data Products and Standards for TERRA reference data
https://terraref.org
BSD 3-Clause "New" or "Revised" License

Prepare data samples from Seasons 4 and 6 #262

Closed: dlebauer closed this issue 5 years ago

dlebauer commented 5 years ago

Prepare sample data sets for D3M and Phenome

Subset by date and plot:

  1. by plot, use plots in ranges 20 and 30
  2. by time (for sensor data) take data from four separate weeks: before planting, 4 and 8 weeks, and the last day of measurement before harvest.
| | Season 4 | Season 6 | Number of consecutive days |
|---|---|---|---|
| Start | 2017-04-20 | 2018-04-06 | 1 |
| Week 4 | 2017-05-18 | 2018-05-04 | 4 |
| Week 8 | 2017-06-15 | 2018-06-01 | 1 |
| Harvest | 2017-09-11 | 2018-07-25 | 1 (last before harvest) |
  3. Sensor data to curate:
  4. Environmental and plot-level trait data (to be handled in a separate issue): we will use the entire season.
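A minimal Python sketch of the subsetting rules above, assuming each record carries its plot range, plot column and measurement date. The `Record` class and its field names are hypothetical, not the actual export schema. Only the Season 6 windows from the table are used; each window covers its first day plus the listed number of consecutive days.

```python
from dataclasses import dataclass
from datetime import date, timedelta


@dataclass
class Record:
    # Hypothetical record layout; real exports may name these fields differently.
    plot_range: int
    plot_column: int
    day: date
    value: float


TARGET_RANGES = {20, 30}

# Season 6 windows from the table: (first day, number of consecutive days).
SEASON6_WINDOWS = {
    "start": (date(2018, 4, 6), 1),
    "week 4": (date(2018, 5, 4), 4),
    "week 8": (date(2018, 6, 1), 1),
    "harvest": (date(2018, 7, 25), 1),
}


def in_window(day, first_day, n_days):
    """True if `day` falls within n_days consecutive days starting at first_day."""
    return first_day <= day < first_day + timedelta(days=n_days)


def subset(records, windows, ranges=TARGET_RANGES):
    """Keep records from the target ranges that fall inside any date window."""
    return [
        r for r in records
        if r.plot_range in ranges
        and any(in_window(r.day, first, n) for first, n in windows.values())
    ]


sample = [
    Record(20, 1, date(2018, 5, 6), 12.0),    # inside the week-4 window: kept
    Record(30, 5, date(2018, 5, 9), 15.0),    # between windows: dropped
    Record(21, 1, date(2018, 6, 1), 30.0),    # range 21 is not sampled: dropped
    Record(30, 2, date(2018, 7, 25), 180.0),  # harvest day: kept
]
kept = subset(sample, SEASON6_WINDOWS)
```

For the environmental and plot-level trait data, only the range filter would apply, since the whole season is used.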
max-zilla commented 5 years ago

@dlebauer, can you export the two BETY datasets (canopy height and canopy cover) for those time ranges, if possible? I've been swamped getting the other datasets ready and copied into the /samples directory, and your team knows the BETY CSV export API better than we do.

the hand measurements might as well be included also.

dlebauer commented 5 years ago

Hi Max, I put the samples on Globus under /samples/sampletraits here: https://app.globus.org/file-manager?origin_id=403204c4-6004-11e6-8316-22000b97daec&origin_path=%2Fsamples%2Fsampletraits%2F

the queries I used are here: https://gist.github.com/dlebauer/b3b0e12f70e5b86d1c20a9c962f4df65

max-zilla commented 5 years ago

Data is uploaded here: https://terraref.ncsa.illinois.edu/clowder/spaces/5c50512a4f0c436195b9ad67

NewcombMaria commented 5 years ago

Thanks @max-zilla and @dlebauer for collating example data sets! Is anyone doing QC on canopy_height (derived from the scanner 3D PLY data)? I don't want to step in if someone else is already covering QC.

In case it's helpful, I looked at the data and noticed some anomalies. Season 4: the mean canopy height on 27 April is 26.1 cm, and the mean on 24 May is 28.5 cm. The 27 April heights are too high, since the planting date was 20 April. Season 6: canopy heights for 21 to 24 May are mostly zero, with occasional values of around 56 or 62 cm. Later in the season the values are much higher, as expected, but there are still occasional zeroes; for example, on 5 July columns 8 and 13 typically have canopy heights of zero.
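Checks like the ones described here (exact zeroes, and early-season means that are implausibly high given the planting date) could be automated along these lines. This is a minimal Python sketch over hypothetical `(date, column, height_cm)` tuples; the 14-day window and 10 cm threshold are illustrative assumptions rather than agreed QC rules, and the sample values are made up.

```python
from collections import defaultdict
from datetime import date
from statistics import mean


def daily_means(rows):
    """Mean canopy height per date from (date, column, height_cm) rows."""
    by_day = defaultdict(list)
    for day, _column, height in rows:
        by_day[day].append(height)
    return {day: mean(values) for day, values in sorted(by_day.items())}


def flag_zeros(rows):
    """Return (date, column) pairs whose canopy height is exactly zero."""
    return [(day, column) for day, column, height in rows if height == 0]


def flag_early_heights(rows, planting, max_days=14, max_cm=10.0):
    """Flag dates within max_days of planting whose mean height exceeds max_cm.

    Both thresholds are assumptions chosen only for illustration.
    """
    return [
        day for day, m in daily_means(rows).items()
        if 0 <= (day - planting).days <= max_days and m > max_cm
    ]


# Made-up rows, for illustration only.
late_rows = [
    (date(2018, 7, 5), 8, 0.0),
    (date(2018, 7, 5), 9, 150.0),
    (date(2018, 7, 5), 13, 0.0),
]
early_rows = [
    (date(2017, 4, 27), 1, 25.0),
    (date(2017, 4, 27), 2, 27.0),
]
zeros = flag_zeros(late_rows)
suspect_days = flag_early_heights(early_rows, planting=date(2017, 4, 20))
```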

It's great to see plot-level mean values! Needs some quality control, but this is nice progress.

max-zilla commented 5 years ago

Public space with large datasets designed for download (one large, flat dataset per sensor and season): https://terraref.ncsa.illinois.edu/clowder/spaces/5c50512a4f0c436195b9ad67

Public space with plot-based datasets for RGB, IR and LAS, organized as one collection per sensor and day (split into the two seasons), each containing one dataset per plot, so it's easy to pick out a particular plot on a given day: https://terraref.ncsa.illinois.edu/clowder/spaces/5c548a094f0c4b0cbe7afde1

I also shared a shorthand API call in Slack where you can get a list of all the datasets associated with a particular sensor & plot:

https://terraref.ncsa.illinois.edu/clowder/api/datasets?exact=true&title=rgb_geotiff - MAC Field Scanner Season 6 Range 20 Column 1
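The shorthand call above filters datasets by exact title. A small Python helper like the following could generate the same query for other plots, assuming every plot-level dataset follows the `<sensor> - MAC Field Scanner Season <n> Range <r> Column <c>` title pattern seen in the example (only `rgb_geotiff` is confirmed here; the helper names are hypothetical). It only builds URLs and makes no network request; spaces are percent-encoded as `%20`.

```python
from urllib.parse import quote, urlencode

BASE_URL = "https://terraref.ncsa.illinois.edu/clowder/api/datasets"


def plot_dataset_title(sensor, season, plot_range, column):
    # Title pattern copied from the example call; assumed to hold for other plots.
    return f"{sensor} - MAC Field Scanner Season {season} Range {plot_range} Column {column}"


def dataset_query_url(sensor, season, plot_range, column):
    params = {"exact": "true", "title": plot_dataset_title(sensor, season, plot_range, column)}
    # quote_via=quote encodes spaces as %20 rather than '+'.
    return f"{BASE_URL}?{urlencode(params, quote_via=quote)}"


if __name__ == "__main__":
    # Example: the first three columns of the two sample ranges.
    for plot_range in (20, 30):
        for column in (1, 2, 3):
            print(dataset_query_url("rgb_geotiff", 6, plot_range, column))
```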
dlebauer commented 5 years ago

@NewcombMaria thanks for finding those - the previous canopy cover extractor had issues separating soil and plant under different lighting conditions; I've marked these records as 'checked and found to be in error' so they will be hidden from public access https://github.com/terraref/reference-data/issues/186#issuecomment-459537648
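In outline, flagging a record hides it from public access without deleting it. The Python sketch below assumes an integer `checked` field in which -1 means "checked and found to be in error" (an assumed encoding); the `TraitRecord` class, helper names and values are hypothetical and only illustrate the filtering logic, not how BETY actually enforces access.

```python
from dataclasses import dataclass

# Assumed encoding of the review flag:
# 1 = checked and OK, 0 = unchecked, -1 = checked and found to be in error.
CHECKED_IN_ERROR = -1


@dataclass
class TraitRecord:
    record_id: int
    variable: str
    mean: float
    checked: int = 0


def mark_in_error(records, bad_ids):
    """Flag the given record ids as checked and found to be in error."""
    for record in records:
        if record.record_id in bad_ids:
            record.checked = CHECKED_IN_ERROR


def public_records(records):
    """Return only the records that should remain publicly visible."""
    return [r for r in records if r.checked != CHECKED_IN_ERROR]


# Made-up records, for illustration only.
records = [
    TraitRecord(1, "canopy_cover", 42.0),
    TraitRecord(2, "canopy_cover", 0.0),
    TraitRecord(3, "canopy_cover", 55.0, checked=1),
]
mark_in_error(records, {2})
visible = [r.record_id for r in public_records(records)]
```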

max-zilla commented 5 years ago

Here is the matrix I used to stay as close as possible to the suggested dates:

[Screenshots from 2019-02-01: sensor and date coverage matrix]

We didn't have perfect coverage of sensors & plots for the requested days given the nature of our scans and some downtime for particular sensors, so I tried my best to get representative samples of each sensor for the rough time periods requested.
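When a sensor has no scan on a requested day, one simple way to get "as close as possible" is to take its nearest available date. A minimal Python sketch of that rule follows; it is not necessarily how the matrix above was built, the function names are hypothetical, and the coverage dates are made up.

```python
from datetime import date


def closest_available(requested, available):
    """Return the available date nearest to `requested`; ties go to the earlier date."""
    if not available:
        return None
    return min(available, key=lambda d: (abs((d - requested).days), d))


def build_matrix(requested, coverage):
    """Map each sensor to {period label: chosen scan date}, or None if it has no data."""
    return {
        sensor: {label: closest_available(day, dates) for label, day in requested.items()}
        for sensor, dates in coverage.items()
    }


# Season 6 target dates from the issue description.
requested = {
    "start": date(2018, 4, 6),
    "week 4": date(2018, 5, 4),
    "week 8": date(2018, 6, 1),
    "harvest": date(2018, 7, 25),
}
# Made-up availability for one sensor, for illustration only.
coverage = {
    "rgb_geotiff": [date(2018, 4, 9), date(2018, 5, 4), date(2018, 5, 29),
                    date(2018, 6, 3), date(2018, 7, 20)],
}
matrix = build_matrix(requested, coverage)
```

For the harvest period, which asks for the last day before harvest, a stricter variant would consider only dates on or before the requested day.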

@NewcombMaria and @nshakoor, note that I also ran one week of PS2 data (one day from S4, six days from S6) for inclusion in the sample data. This was not in the original plan, but it was requested to support Maria's PSII presentation.

dlebauer commented 5 years ago

@max-zilla that looks fantastic! Could you upload these tables (e.g. to Google Drive) so that we can use them in documentation, presentations, etc.?

max-zilla commented 5 years ago

@dlebauer https://drive.google.com/open?id=1IXJEQZkGuF495hhWO2-WJWPLqKyIsEyU

Copied these 2 slides, along with several others that might be useful for such a presentation. Those two are near the end.