terraref / reference-data

Coordination of Data Products and Standards for TERRA reference data
https://terraref.org
BSD 3-Clause "New" or "Revised" License

Obtaining spectral and composition data at subplot or plant level #264

Closed ericclei closed 3 years ago

ericclei commented 5 years ago

I am trying to combine hyperspectral data with leaf chemical composition data from BETYdb at a subplot or plant level.

I am most interested in spectral and composition data, though any other categories would be helpful too.

The end goal is to predict composition from other traits. As I understand from my conversation with @dlebauer, I will need to process the hyperspectral image data in order to clip it to the plot or sub-plot level.

I would appreciate any pointers or example scripts to get started.

dlebauer commented 5 years ago

@ericclei One place to start is with the sample data.

I am not sure exactly how to clip the hyperspectral data to a plot, but you can find the plot polygons by constructing a URL such as https://terraref.ncsa.illinois.edu/bety/api/v1/sites?name=~Season 4 Range 30 to find information about all of the Season 4 plots in range 30.
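The query string in that URL contains spaces, so it needs to be encoded before it is sent. Here is a minimal sketch using only the Python standard library. The helper name `sites_url` is hypothetical; the endpoint and the `name=~...` filter are copied from the URL above. Any authentication the API may require is not shown.

```python
from urllib.parse import urlencode

# Endpoint copied from the URL in the comment above.
BETY_SITES_ENDPOINT = "https://terraref.ncsa.illinois.edu/bety/api/v1/sites"

def sites_url(name_pattern: str) -> str:
    """Build a sites query URL (`sites_url` is a hypothetical helper name)."""
    # Keep the leading "~" used in the example URL; urlencode escapes the spaces.
    query = urlencode({"name": "~" + name_pattern})
    return f"{BETY_SITES_ENDPOINT}?{query}"

url = sites_url("Season 4 Range 30")
print(url)
```

The resulting string can be passed to any HTTP client; only the URL construction is sketched here.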

The geometry field is in a standard format called WKT (well-known text) that many software packages can read. It will look something like "geometry": "MULTIPOLYGON (((-111.97504767486785 33.07488685592014 0.0, -111.97503949029482 33.074886858587995 0.0, -111.97503947576052 33.07485529088314 0.0, -111.97504766033062 33.07485528821534 0.0, -111.97504767486785 33.07488685592014 0.0)))"
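As a rough illustration of working with that geometry string, the sketch below pulls the vertices out of a WKT value and reduces them to a lon/lat bounding box, using only the standard library. The function name `wkt_bounding_box` is hypothetical, and it assumes each vertex is written as "lon lat" with an optional third (z) value, as in the example above. A dedicated geometry library such as Shapely can parse WKT more robustly.

```python
import re

def wkt_bounding_box(wkt: str):
    """Return (min_lon, min_lat, max_lon, max_lat) for a WKT geometry string."""
    # A vertex is "lon lat" or "lon lat z"; vertices are separated by commas.
    number = r"-?\d+(?:\.\d+)?(?:[eE][-+]?\d+)?"
    vertices = re.findall(rf"({number})\s+({number})(?:\s+{number})?", wkt)
    if not vertices:
        raise ValueError("no coordinates found in WKT string")
    lons = [float(lon) for lon, _ in vertices]
    lats = [float(lat) for _, lat in vertices]
    return min(lons), min(lats), max(lons), max(lats)

# Geometry string copied from the example above.
plot_wkt = (
    "MULTIPOLYGON (((-111.97504767486785 33.07488685592014 0.0, "
    "-111.97503949029482 33.074886858587995 0.0, "
    "-111.97503947576052 33.07485529088314 0.0, "
    "-111.97504766033062 33.07485528821534 0.0, "
    "-111.97504767486785 33.07488685592014 0.0)))"
)
bbox = wkt_bounding_box(plot_wkt)
print(bbox)
```

A bounding box is only an approximation of the plot polygon, but for near-rectangular plots like this one it is a reasonable first step before any finer clipping.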

It is possible that there are functions in terrautils for clipping the hyperspectral data, but I cannot confirm. One challenge is that the data has x and y dimensions, while the lat and lon information is stored as vectors that map to x and y. You can see this if you look inside the netCDF file metadata.

This is not easy but is something that we should be able to add to the terrautils python library if it is not already there.

At the same time, before using the hyperspectral data you should be aware of the limitations and assumptions in the calibration algorithms, which @Paheding is currently working on improving. See especially https://github.com/terraref/extractors-hyperspectral#limitations as well as links to related issues included at the bottom of that document.

dlebauer commented 5 years ago

If you use the NCO software, you can clip along the x and y dimensions using a command such as ncks -d x,1200,1400 -d y,2000,2100 in.nc out.nc to subset a box with corners (1200,2000) and (1400,2100). The challenge is to match up the lat/lon with the y/x dimensions. All of the information is in the netCDF file; it is just a matter of extracting and joining the x dimension with the longitude variable and y with latitude.

The transformations are documented but to my knowledge not yet implemented for clipping.
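One way the matching step could look, as a sketch rather than an existing terrautils function: given the longitude and latitude vectors read from the file, find the index ranges that fall inside a plot's bounding box and format them into the ncks command shown above. The function names `index_range` and `ncks_clip_command` are hypothetical. The sketch assumes both vectors are sorted in ascending order and that longitude maps to x and latitude to y; the coordinate values below are toy numbers, not values from a real file.

```python
from bisect import bisect_left, bisect_right

def index_range(coords, low, high):
    """Return (first, last) zero-based indices of values in [low, high].

    Assumes `coords` is sorted ascending; a descending axis would need
    to be reversed first, with the indices mapped back afterwards.
    """
    first = bisect_left(coords, low)
    last = bisect_right(coords, high) - 1
    if first > last:
        raise ValueError("no grid cells fall inside the requested range")
    return first, last

def ncks_clip_command(lons, lats, bbox, src="in.nc", dst="out.nc"):
    """Build an ncks command that subsets x/y to a lon/lat bounding box."""
    min_lon, min_lat, max_lon, max_lat = bbox
    x0, x1 = index_range(lons, min_lon, max_lon)  # assumption: longitude maps to x
    y0, y1 = index_range(lats, min_lat, max_lat)  # assumption: latitude maps to y
    return f"ncks -d x,{x0},{x1} -d y,{y0},{y1} {src} {dst}"

# Toy coordinate vectors for illustration only.
lons = [10.0, 10.1, 10.2, 10.3, 10.4]
lats = [50.0, 50.5, 51.0, 51.5]
command = ncks_clip_command(lons, lats, (10.05, 50.4, 10.35, 51.2))
print(command)
```

The generated command can then be run in a shell. Before using it on real data, check the ordering of each coordinate vector in the file's metadata.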

ericclei commented 5 years ago

Thanks for the guidance. Is there a way to get plant-level chemical composition data, or have I missed something? The example data seem to be at the plot level.

dlebauer commented 5 years ago

@ericclei mostly we are treating the measurements at the plot level. There are a few cases where we have measured traits repeatedly on the same plant. In season 4, some of the plants were tagged for the purposes of repeated measurements. To date, I am not aware that anyone has gone back to identify these specific plants in the images, although in principle I believe this could be done by someone who carefully looks through the images.

@NewcombMaria can provide more information on what to look for in the images, and where.

The measurements on the tagged plants can be extracted from the database thus:

copy 
  (select trait, mean, raw_date, entity 
     from traits_and_yields_view 
       where sitename ~ 'Season 4 Range (20|30)'
       and entity != '' and trait not like 'surface_temperature%') 
  to '~/season4_marked_plants.csv' delimiter ',' csv header;

I've attached this subsample here (season4_marked_plants.csv.txt) and have also placed it in the sample dataset: https://terraref.ncsa.illinois.edu/clowder/datasets/5c6db73e4f0ce7f7828c90a1
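Once the CSV is downloaded, the repeated measurements can be grouped per plant. The sketch below uses only the standard library and the column names from the query above (trait, mean, raw_date, entity). It assumes that `entity` identifies the tagged plant and that `mean` is always numeric. The function name `measurements_by_plant` is hypothetical, and the rows are made up for illustration; they are not real TERRA-REF values.

```python
import csv
import io
from collections import defaultdict

def measurements_by_plant(csv_file):
    """Group rows by `entity`, then by `trait`, keeping (raw_date, mean) pairs."""
    grouped = defaultdict(lambda: defaultdict(list))
    for row in csv.DictReader(csv_file):
        value = float(row["mean"])  # assumption: every row has a numeric mean
        grouped[row["entity"]][row["trait"]].append((row["raw_date"], value))
    return grouped

# Made-up rows with the same header as the exported file.
sample = io.StringIO(
    "trait,mean,raw_date,entity\n"
    "trait_a,1.5,2017-06-01,plant_1\n"
    "trait_a,2.5,2017-06-15,plant_1\n"
    "trait_b,10.0,2017-06-01,plant_2\n"
)
plants = measurements_by_plant(sample)
print(sorted(plants))
```

To read the real file instead, pass an open file handle, for example `open("season4_marked_plants.csv", newline="")`.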

ericclei commented 5 years ago

How about composition at the subplot level? How are the plot level measurements collected in the first place?

dlebauer commented 5 years ago

It depends on what is being measured. Information about the methods (still not fully curated) is stored in the methods table. Often a set of leaves is selected (either randomly, or according to a standard protocol such as the three plants at the center of each row) and measured. In general, though, we don't keep track of the individual plants (except to identify that multiple measurements were made on the same plant).

In some cases, like end of season composition data, the entire plot is harvested, chopped up and mixed, and a subsample is taken for measurements.