terraref / reference-data

Coordination of Data Products and Standards for TERRA reference data
https://terraref.org
BSD 3-Clause "New" or "Revised" License

Selecting variables for Betydb from the MultispeQ v1.0 parameters #203

Closed NewcombMaria closed 7 years ago

NewcombMaria commented 7 years ago

@dlebauer and @gsrohde please review the spreadsheet at: https://drive.google.com/drive/folders/1MVkQpJ8f_gCHZP17j63tXaAGCT1R5tAO

which has field data related to photosynthesis parameters (MultispeQ v1.0 field measurements of fluorescence-based and absorbance-based parameters). I'll need input on which variables are of interest for BETYdb. There are 10 variables and 3 covariates highlighted in yellow that I think should definitely be included in the database. There are 42 variables (if I counted correctly) highlighted in blue that could be included if you think they would be important. Many of these photosynthesis parameters are new to me.

gsrohde commented 7 years ago

@NewcombMaria I tested uploading these to my own test machine. A couple of things:

  1. You can't use a column called citation. You have to break it into citation_author, citation_year, and citation_title. (If the citation had a DOI, you could just use citation_doi instead.) See item 6 in the section "Schema for CSV Data Files" at https://pecan.gitbooks.io/betydbdoc-dataentry/content/trait_insertion_api.html.

  2. You will need to enter each of the treatments into the database and associate it with the citation you are using. To do this via the Web interface, go to the citations page, search for the one you are using, and click the checkmark button in the row for it. Then go to the treatments page and click the "New Treatment" button. Enter the name exactly as it is in the spreadsheet.

  3. You can't use "NaN" as a data value. If data for a variable is missing for some row, just leave it blank. Make sure it's still blank when you export the spreadsheet as a CSV file. (It may be that Excel automatically converts blanks to NaN if the data type of the cell is specified to be numeric. If so, there may be some setting to control this.)

  4. Each trait or covariate column must have a heading exactly matching the name of a variable. Conversely, if a column heading matches a variable name but you don't want to make a trait or covariate from it, rename it (say, by appending the string "-ignore" to the heading name) or remove the column altogether. And, of course, if the column is supposed to be a covariate rather than a bona fide trait, then it must be specified as such in the trait_covariate_associations table. There needs to be a row in this table for each trait it should be a covariate of.
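The spreadsheet clean-up described in items 1, 3, and 4 above can be sketched in Python. This is only an illustration under stated assumptions, not part of the BETYdb tooling: the function name `prepare_upload_csv`, the sample column names (`trait_a`, `trait_b`), and the citation values are all hypothetical, and the citation fields are passed in by hand rather than parsed out of the old `citation` column.

```python
import csv
import io


def prepare_upload_csv(text, citation, variable_names, wanted):
    """Return CSV text adjusted for items 1, 3, and 4 (hypothetical helper).

    text           -- the exported spreadsheet as CSV text
    citation       -- dict with citation_author, citation_year, citation_title
    variable_names -- names of variables already defined in the database
    wanted         -- the subset of those names to upload as traits/covariates
    """
    reader = csv.DictReader(io.StringIO(text))

    # Item 4: a heading that matches a variable name but is not wanted
    # gets "-ignore" appended so it no longer matches.
    rename = {}
    for name in reader.fieldnames:
        if name in variable_names and name not in wanted:
            rename[name] = name + "-ignore"
        else:
            rename[name] = name

    # Item 1: drop the single "citation" column and add the three split columns.
    out_fields = [rename[n] for n in reader.fieldnames if n != "citation"]
    out_fields += ["citation_author", "citation_year", "citation_title"]

    out = io.StringIO()
    writer = csv.DictWriter(out, fieldnames=out_fields, lineterminator="\n")
    writer.writeheader()
    for row in reader:
        cleaned = {}
        for name, value in row.items():
            if name == "citation":
                continue
            value = value or ""
            # Item 3: missing values must be blank, never the string "NaN".
            cleaned[rename[name]] = "" if value.strip().lower() == "nan" else value
        cleaned.update(citation)
        writer.writerow(cleaned)
    return out.getvalue()


# Hypothetical sample data, for illustration only.
sample = "site,citation,trait_a,trait_b\nplot1,Some Citation,0.61,NaN\n"
example_citation = {
    "citation_author": "Example",
    "citation_year": "2017",
    "citation_title": "Example title",
}
print(prepare_upload_csv(sample, example_citation, {"trait_a", "trait_b"}, {"trait_b"}))
```

The sketch assumes every row has the same number of cells as the header. Item 2 (entering the treatments and associating them with the citation) is not covered here.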

If you add the needed treatments and variables to the terraref database and want me to transfer them over to the test machine pecandev so that you can practice doing an API upload, let me know.

NewcombMaria commented 7 years ago

Thanks @gsrohde. I'm still figuring out the Treatment names (issue #196). I haven't yet named or defined the variables in this spreadsheet, but will work on that soon. I've defined the variables for the 3 spreadsheets referred to in issue #202, but now I see from your comment that I need to revise the citation and the treatments. How do I define covariates before the associated trait data is uploaded? The association needs to be assigned before uploading, correct?

gsrohde commented 7 years ago

Yes, @NewcombMaria, the association needs to be assigned before uploading. Otherwise, each heading that matches a variable name is treated as a trait variable in its own right instead of as a covariate of some other variable.

The correspondence has to be set up by adding to the trait_covariate_associations table. David or I will have to do this for you once it is decided what the associations should be—unfortunately, the Web interface doesn't include a way of editing this table. (But you can view it at https://terraref.ncsa.illinois.edu/bety/trait_covariate_associations.)
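The rule described above can be modelled roughly as follows. This is a simplified sketch, not BETYdb's actual code: the function `classify_columns` is hypothetical, the associations are represented as (trait name, covariate name) pairs purely for readability, and the column names are made up.

```python
def classify_columns(headings, variable_names, associations):
    """Guess how each CSV heading would be treated (hypothetical model).

    headings       -- column headings of the CSV file
    variable_names -- names of variables defined in the database
    associations   -- set of (trait_name, covariate_name) pairs, standing in
                      for rows of the trait_covariate_associations table
    """
    present = {h for h in headings if h in variable_names}
    roles = {}
    for heading in headings:
        if heading not in variable_names:
            roles[heading] = "not a variable"
            continue
        # One association row is needed per trait the column is a covariate of.
        traits_for = sorted(
            trait for (trait, covariate) in associations
            if covariate == heading and trait in present
        )
        if traits_for:
            roles[heading] = "covariate of " + ", ".join(traits_for)
        else:
            # No association row: treated as a trait in its own right.
            roles[heading] = "trait"
    return roles


# Hypothetical headings and variables, for illustration only.
headings = ["site", "trait_a", "cov_x"]
variables = {"trait_a", "cov_x"}

print(classify_columns(headings, variables, set()))
print(classify_columns(headings, variables, {("trait_a", "cov_x")}))
```

With no association rows, `cov_x` is classified as a trait. Once the pair is added, it is classified as a covariate of `trait_a`, which is why the table has to be filled in before the upload.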

NewcombMaria commented 7 years ago

Thanks @gsrohde. I've revised citation formatting and treatment names (now 2 treatments for all of Season4, issue #196). Three datasets are ready for upload except for the step of adding trait-covariate associations, which will require help on your end. To clarify which datasets have covariates that need to be added to the trait_covariate_associations table, I'm going to close issue #202 and create a new issue specifically for covariates.

gsrohde commented 7 years ago

@NewcombMaria I see only two datasets in the comment above. Where's the third one?

NewcombMaria commented 7 years ago

The 3 datasets are in 2 different folders. In total there are now 4 datasets with local_datetime that are ready for upload (2 with covariates that were just associated with the corresponding traits and 2 without covariates). The 4 spreadsheets are in the following two folders:

https://drive.google.com/drive/folders/1TG5_qHqssuAzzAPBbvTC7DnEvxCIxFJY

https://drive.google.com/drive/folders/1MVkQpJ8f_gCHZP17j63tXaAGCT1R5tAO

NewcombMaria commented 7 years ago

All of the variables for the MultispeQ dataset are now defined. There are 35 trait variables and 6 covariates. It's a complicated dataset. I'm going to close this issue and create a new one specific to the trait_covariate_associations that need to be defined.