plantphys / spectratrait

A tutorial R package for illustrating how to fit, evaluate, and report spectra-trait PLSR models. The package provides functions to enhance the base functionality of the R pls package, identify an optimal number of PLSR components, and standardize model validation, along with vignette examples that use datasets sourced from EcoSIS (ecosis.org).
GNU General Public License v3.0

Cleanup and complete README document #32

Closed (serbinsh closed 3 years ago)

serbinsh commented 3 years ago

Completed some updates: https://github.com/TESTgroup-BNL/PLSR_for_plant_trait_prediction/blob/master/README.md

Review and feedback welcome. Also, what else do we need?

neo0351 commented 3 years ago

Do we want to discuss some of the different approaches that can be used? Like choices for splitting data and choosing nComps?

neo0351 commented 3 years ago

And maybe a tidbit about the different datasets? For example, NEON LMA is a huge dataset that takes forever to run, while Kit is a small dataset but requires some data cleaning.

serbinsh commented 3 years ago

@neo0351 Yes and yes. If you could take the first crack at writing that up for review, that would be helpful. I have a full plate this AM.

neo0351 commented 3 years ago

Submitted first crack. Looking at packages after lunch. Wanted to get the readme out for review.

serbinsh commented 3 years ago

We also need to make sure we cover these aspects raised by Angela (still can't find her GitHub username to tag her here) in the README and an as-yet nonexistent dependency file:

The code may be downloaded from GitHub at https://github.com/TESTgroup-BNL/PLSR_for_plant_trait_prediction/. We recommend that users refer to the README file before using the script. For easiest use, click the green Code download icon to download a zipped folder containing the R scripts and associated functions. This folder also contains a copy of the README file, and vignettes providing illustrated examples of the code. The script described in this manuscript is “expanded_spectra-trait_reseco_lma_plsr_example.R”; additional examples are provided for the interested reader. The code example is available through https://rpubs.com/sserbin/644468.

The LMA and spectra data used in this example code were accessed from the Ecological Spectral Information System open-source database of spectral and trait data (EcoSIS: https://ecosis.org/), and were collected from seven NEON (National Ecological Observatory Network) domains in the eastern United States. Data collection and laboratory processing were conducted following the ‘best practices’ suggested above. Leaf spectra were collected using the ASD FieldSpec3 and/or Spectral Evolution PSR+ with their contact probe. This dataset includes multiple plant species, including broadleaf trees, grasses, and forbs. Prior to developing the PLSR model, erroneous and outlying trait values and their corresponding spectral measurements were excluded from the data. In this example, we use wavebands between 500 and 2400 nm. This range spans visible (VIS, 500 - 700 nm), near-infrared (NIR, 700 - 1100 nm), and short-wave infrared (SWIR, 1100 - 2400 nm) wavelengths, and was chosen given the coordination of leaf structure with pigment and water absorption features that co-vary with LMA (Baret and Fourty 1997, Niinemets 2007). However, a different selection of wavelengths may be applied, depending on the light absorbing and scattering features of the trait of interest; see Curran et al. (1989), Asner et al. (2011a, b), Serbin et al. (2012), Serbin et al. (2014), and Buitrago et al. (2018) (maybe also referring back to Alistair and Ken’s sections). Spectral regions with a markedly low signal-to-noise ratio (SNR) are identified based on the ASD and PSR+ handbooks (i.e., spectral wavelengths <500 nm and >2400 nm). The dataset is prepared in the form of a spreadsheet, and the spectral reflectance is scaled to the range of 0 to 1. The scale used for spectral reflectance can vary depending on users’ preferences, but it must stay consistent across all datasets, whether they are used in model development or in model application.
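The waveband selection and reflectance scaling described above could be sketched roughly as follows in base R. This is an illustration only, not the repository's code: the synthetic `spectra` matrix, the `Wave_` column naming, and the assumption that the raw values are in percent (0 to 100) are all hypothetical stand-ins for a real EcoSIS download. The region boundaries at 700 and 1100 nm are shared between bands in the text, so which band those two wavelengths land in here is an arbitrary choice.

```r
# Synthetic stand-in for a spectra table: one row per leaf sample,
# one column per waveband. Column names and percent units are assumptions.
set.seed(1)
wavelengths <- 350:2500
spectra <- matrix(runif(5 * length(wavelengths), min = 0, max = 100),
                  nrow = 5,
                  dimnames = list(NULL, paste0("Wave_", wavelengths)))

# Keep only the 500-2400 nm range used in the example
keep <- wavelengths >= 500 & wavelengths <= 2400
spectra_sub <- spectra[, keep, drop = FALSE]
wv_sub <- wavelengths[keep]

# Rescale percent reflectance (0-100) to a 0-1 fraction; leave 0-1 data alone
if (max(spectra_sub, na.rm = TRUE) > 1) {
  spectra_sub <- spectra_sub / 100
}

# Label each retained waveband by spectral region:
# [500, 700] = VIS, (700, 1100] = NIR, (1100, 2400] = SWIR
region <- cut(wv_sub, breaks = c(500, 700, 1100, 2400),
              labels = c("VIS", "NIR", "SWIR"),
              include.lowest = TRUE, right = TRUE)
table(region)
```

Whatever scale is chosen, the same check and rescaling step would need to be applied to any dataset the fitted model is later applied to.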

The example code requires several R packages, including ‘pls’, ‘dplyr’, ‘ggplot2’, ‘gridExtra’, ‘readr’, and ‘reshape2’. ‘Getting Started’ helps to load these required packages and set up the modeling environments. Other core helper functions for PLSR development, including VIP, are provided in the GitHub repository associated with this manuscript.
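For the dependency file mentioned above, one possible starting point is a short check that the listed packages are installed before attaching them. This is a minimal sketch, not the actual ‘Getting Started’ code from the repository; the variable names `required` and `not_installed` are made up for illustration.

```r
# Packages named in the paragraph above
required <- c("pls", "dplyr", "ggplot2", "gridExtra", "readr", "reshape2")

# Find which packages are not installed, without attaching anything yet
installed_ok <- vapply(required, requireNamespace, logical(1), quietly = TRUE)
not_installed <- required[!installed_ok]

if (length(not_installed) > 0) {
  # Tell the user what to install rather than installing silently
  message("Install first: install.packages(c(",
          paste0('"', not_installed, '"', collapse = ", "), "))")
} else {
  # Attach all required packages
  invisible(lapply(required, library, character.only = TRUE))
}
```

Reporting the missing packages instead of installing them automatically leaves the choice of CRAN mirror and library location to the user.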

@neo0351 can you help me make sure this is all covered here?

neo0351 commented 3 years ago

ummmmmm, these paragraphs need to go in the readme?

serbinsh commented 3 years ago

No, I haven't looked in detail. I was just wondering if there were any things in here not already covered?

As for the data-specific details, refs, etc., that should all be covered by the EcoSIS pages. I guess we just need to figure out what might be missing and then we are done.

neo0351 commented 3 years ago

sigh.....yeah. I'll help make sure we get these covered. #TL;DR