terraref / reference-data

Coordination of Data Products and Standards for TERRA reference data
https://terraref.org
BSD 3-Clause "New" or "Revised" License

Upload hand measured traits from KSU 2016 #119

Closed dlebauer closed 7 years ago

dlebauer commented 7 years ago

Data collected by hand in 2016 should be uploaded to the TERRA REF instance of BETYdb (terraref.ncsa.illinois.edu/bety) following the instructions for bulk upload.

The first step is to make sure that all of the variables and methods in the uploads have matching records in the variables and methods tables. Names and units must match exactly.
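A minimal sketch of that exact-match check, assuming the variables table has been exported as a list of (name, units) pairs. The function name, the `plant_height` variable, and the units shown are hypothetical and for illustration only; this is not BETYdb's own validation code.

```python
def unmatched_variables(upload_variables, known_variables):
    """Return (name, units) pairs from the upload that have no exact match.

    Matching is case-sensitive and whitespace-sensitive on purpose,
    because names and units must match exactly.
    """
    known = set(known_variables)
    return [pair for pair in upload_variables if pair not in known]


# Hypothetical export of the variables table (units are assumed).
known_variables = [("stem_width", "mm"), ("plant_height", "cm")]

# Hypothetical variables found in an upload file.
upload_variables = [
    ("stem_width", "mm"),    # exact match
    ("Plant_Height", "cm"),  # differs in case
    ("plant_height", "m"),   # differs in units
]

print(unmatched_variables(upload_variables, known_variables))
```

Anything this returns would need either a new record in the variables table or a correction in the upload file.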

Completion Criteria

dlebauer commented 7 years ago

@zhenbinHU how is the trait "Sugar" measured? Is it the sugar content of the stem juice or ???

Sugar, ADF, CP, and moisture were measured with HarvestLab on fresh plant material.

NewcombMaria commented 7 years ago

@dlebauer I'm wondering how to use common variable names for hand measured traits that overlap between KSU and Maricopa. Is there a way that I can be updated about what variable names are created for KSU 2016 traits? For example, we (@JeffWhiteAZ and I) are preparing Sorghum season1 and season2 harvest data for bulk upload, and it's not clear to me what variable names are best for BETYdb. The sorghum crop ontology lists "Whole above ground biomass yield at harvest" http://www.cropontology.org/rdf/CO_324:0000553, which is a long variable name. I'd like to be consistent with KSU when possible. This is similar to issue #118. In general, I'll keep the same variable names as KSU if I can access the new variables created for hand measured data from other locations.

dlebauer commented 7 years ago

I'm wondering how to use common variable names for hand measured traits that overlap between KSU and Maricopa.

@NewcombMaria

We could just keep the dry biomass and moisture content. (We could also keep just fresh biomass and moisture content, but I know that dry biomass is often the key variable of interest; I am less sure whether this is the case for fresh biomass.)

dlebauer commented 7 years ago

@zhenbinHU I've looked through a few of these files. Taking stem_width as an example, here is what needs to be done:

  1. Dates must be in YYYY-MM-DD format
  2. All traits must be in the variables table
  3. All cultivars must be in the database. I inserted the list you gave me from the genotypes table in the KSU 2016 spreadsheet, but they are not all present. For example, 14CS3450/3389 is missing.
  4. All plots in the dataset must be in the database. It looks like many are not. You asked me to remove many of the plots (border rows, mixed plots, etc.) from the database, but you may not have removed these from the sheets. For example, TR_R26_P1 is still in the sheets.
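
The row-level checks in this list (dates, cultivars, plots) could be run before upload with something like the sketch below. The column names `date`, `cultivar`, and `site`, the function names, and the plot name `TR_R1_P1` are assumptions for illustration; the real lists of cultivars and plots would have to come from the database. The variables check is a check on column names rather than row values, so it is left out here.

```python
from datetime import datetime


def is_iso_date(text):
    """True if text is a valid calendar date written exactly as YYYY-MM-DD."""
    try:
        parsed = datetime.strptime(text, "%Y-%m-%d")
    except ValueError:
        return False
    # strptime also accepts unpadded values such as 2016-7-1,
    # so round-trip the date to require zero padding.
    return parsed.strftime("%Y-%m-%d") == text


def row_problems(row, known_cultivars, known_sites):
    """Return the names of the columns in one row that would block an upload."""
    problems = []
    if not is_iso_date(row["date"]):
        problems.append("date")
    if row["cultivar"] not in known_cultivars:
        problems.append("cultivar")
    if row["site"] not in known_sites:
        problems.append("site")
    return problems


# Hypothetical database contents; TR_R1_P1 is an invented plot name.
known_cultivars = {"RIL-CS83_(R10709/F08331bmr12)-CSF1-PRF2-CS83"}
known_sites = {"TR_R1_P1"}

# A row using the two examples named above plus a non-ISO date.
row = {"date": "7/15/2016", "cultivar": "14CS3450/3389", "site": "TR_R26_P1"}
print(row_problems(row, known_cultivars, known_sites))
```

Running this over every row of a sheet would give a list of rows to fix before attempting the bulk upload.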

dlebauer commented 7 years ago

@zhenbinHU

I think that we may be using different names for the cultivars. For example, the genotype marked in your list as "16PR1430" is named "RIL-CS83_(R10709/F08331bmr12)-CSF1-PRF2-CS83" in this table: https://docs.google.com/spreadsheets/d/1QQaWc0UaQQKfEtnSO1G2za8tKU2huC0_VYMBqm5CKAo/edit#gid=796817704 per this discussion https://github.com/terraref/reference-data/issues/65

zhenbinHU commented 7 years ago

@dlebauer You are right. We use a different ID because the pedigree-based name is so long.

dlebauer commented 7 years ago

@zhenbinHU we need to have a way of uniquely identifying germplasm across research groups (https://github.com/terraref/reference-data/issues/65). If you want to use a shorter name, it won't match the names that we have. You can use the shorter name when collecting data, but not when uploading data.
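One way to keep short names in the field and still upload the shared names is a lookup step just before upload. This is only a sketch: the function name and the idea of holding the mapping in a dictionary are assumptions, and the single entry shown is the pair mentioned earlier in this thread.

```python
def to_canonical(field_id, id_to_name):
    """Return the shared germplasm name for a short field ID.

    Raises KeyError for unmapped IDs so that a short name is never
    uploaded by accident.
    """
    try:
        return id_to_name[field_id]
    except KeyError:
        raise KeyError(f"no canonical name recorded for field ID {field_id!r}") from None


# Mapping built from the genotype table; only one known pair is shown.
id_to_name = {"16PR1430": "RIL-CS83_(R10709/F08331bmr12)-CSF1-PRF2-CS83"}

print(to_canonical("16PR1430", id_to_name))
```

Failing loudly on an unmapped ID is a deliberate choice here: it surfaces gaps in the mapping before the upload rather than as a database error afterwards.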

NewcombMaria commented 7 years ago

@zhenbinHU @dlebauer and @gpmorrisksu, I made a trial BETYdb upload with a subset of the KSU data that Zhenbin sent to me, to see if I could help figure out what is blocking the upload. The error codes copied below all appear to be related to sites_cultivars. Is there an updated sites_cultivars table for KSU 2016?

Data Value Errors

gsrohde commented 7 years ago

@NewcombMaria I don't think these errors have anything to do with the sites_cultivars table. The first two errors have to do with a site name or cultivar name given in the CSV file not matching any name the database knows about.

The third error has to do with there being multiple matches in some table for a given name. It's not clear from the error message which table this is, but if you click on a number, it should take you to the row the error occurs in, and then you may be able to figure out which table is being referred to by which column the error occurs in.
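The two kinds of error described above (no match versus multiple matches) can be told apart by counting how often each uploaded name occurs in the relevant table. The sketch below assumes the table's name column has been exported as a plain list; the function name and example names are hypothetical, and this does not reproduce BETYdb's actual validation.

```python
from collections import Counter


def classify_names(upload_names, database_names):
    """Group uploaded names by how many database rows carry the same name."""
    counts = Counter(database_names)
    result = {"unmatched": [], "ambiguous": []}
    for name in sorted(set(upload_names)):
        n = counts[name]  # Counter returns 0 for names it has not seen
        if n == 0:
            result["unmatched"].append(name)
        elif n > 1:
            result["ambiguous"].append(name)
    return result


# Hypothetical name column from one table; "plot_b" occurs twice.
database_names = ["plot_a", "plot_b", "plot_b"]
upload_names = ["plot_a", "plot_b", "plot_c"]
print(classify_names(upload_names, database_names))
```

Running it once per table (sites, cultivars, and so on) would show which table a "multiple matches" error is coming from.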

ghost commented 7 years ago

@dlebauer please review data

ghost commented 7 years ago

See review https://github.com/terraref/reference-data/issues/181