Insert experimental design for 2016 Sorghum trials at MAC into TERRAREF BETYdb

dlebauer commented 8 years ago

Detailed Description

The experimental design for the Sorghum trials is defined in this Google spreadsheet.

Next steps

Define intended plot layouts and plant locations in the gantry reference system (meters offset from [0,0]). We need:
- [x] plot and alley dimensions
- [x] row spacing
- [x] locations of the corner of the first plot and first plant in that plot
  - we can start with the experimental design (as opposed to what we can later observe as the actual locations) and update the layout as necessary
  - if we have information that may not be accurate, that is okay; we can update it later. Ideally we would also have an estimate of the uncertainty (e.g. within X cm). For plot boundaries, a few cm should be sufficiently precise.
  - we can compare and update the information with observed positions of the plants and outputs of the precision planter
Later, we can convert the plot layouts into geospatial data. This requires lat/lon coordinates for [0,0].
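
Below is a minimal Python sketch of the layout arithmetic. It assumes plots sit on a regular grid, with columns abutting along the gantry x axis and ranges separated by alleys along y. Every dimension is a hypothetical parameter, not the MAC design value. The lat/lon step uses a simple equirectangular approximation and further assumes that gantry x points east and y points north. The real gantry axes may be rotated relative to true north.

```python
import math
from dataclasses import dataclass


# All values passed to Layout are HYPOTHETICAL; the actual MAC plot, alley
# and row dimensions come from the experimental design spreadsheet.
@dataclass(frozen=True)
class Layout:
    origin_x: float     # corner of the first plot, metres from gantry [0,0]
    origin_y: float
    plot_width: float   # plot extent along x (m); columns abut along x
    plot_length: float  # plot extent along y (m)
    alley: float        # alley between consecutive ranges along y (m)
    row_spacing: float  # distance between planted rows (m)
    rows_per_plot: int


def plot_bounds(layout, rng, col):
    """(xmin, ymin, xmax, ymax) of the plot in 0-based range `rng`, column `col`."""
    xmin = layout.origin_x + col * layout.plot_width
    ymin = layout.origin_y + rng * (layout.plot_length + layout.alley)
    return (xmin, ymin, xmin + layout.plot_width, ymin + layout.plot_length)


def row_x_positions(layout, col):
    """x of each planted row in column `col`, first row half a spacing in from the edge."""
    xmin = layout.origin_x + col * layout.plot_width
    return [xmin + (i + 0.5) * layout.row_spacing for i in range(layout.rows_per_plot)]


EARTH_RADIUS_M = 6_371_000.0  # mean Earth radius


def offset_to_latlon(lat0, lon0, east_m, north_m):
    """Equirectangular approximation, adequate over a field-sized area.
    ASSUMES gantry x points east and y points north, which may not hold."""
    dlat = math.degrees(north_m / EARTH_RADIUS_M)
    dlon = math.degrees(east_m / (EARTH_RADIUS_M * math.cos(math.radians(lat0))))
    return lat0 + dlat, lon0 + dlon
```

Once the real plot, alley and row dimensions are plugged into `Layout`, the corners can be regenerated whenever observed plant positions or precision-planter output show that the design needs updating.
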
Import the genotype information into the TERRAREF/BETYdb cultivars table
- Genotype information is stored in the "Source", "Common Name", and "Pedigree" fields of the 'Treatments' table
- [x] Identify the unique identifier / 'primary key' for these. Pedigree information can later be imported into BMS, and the primary key can link BETYdb records to BMS.
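
One way to check whether a candidate field really works as the primary key is to collapse the treatment rows into one record per key value and fail on any conflict. This is only a sketch. `unique_cultivars` is a hypothetical helper, and using "Source" as the default key is an assumption, because the issue leaves the choice of identifier open.

```python
# Rows are dicts standing in for the 'Treatments' table. Which field is the
# real primary key is still open; "Source" is only an ASSUMED default.
def unique_cultivars(rows, key="Source"):
    """Collapse treatment rows to one cultivar record per key value.

    Raises ValueError if two rows share a key but disagree on any field,
    which would mean the chosen key is not actually a primary key."""
    fields = ("Source", "Common Name", "Pedigree")
    cultivars = {}
    for row in rows:
        record = {f: row[f].strip() for f in fields}
        k = record[key]
        if k in cultivars and cultivars[k] != record:
            raise ValueError(f"key {k!r} maps to conflicting records")
        cultivars[k] = record
    return cultivars
```
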
Import traits from the Treatments table (which traits are useful?)
- BETYdb isn't currently set up to ingest categorical traits; this feature is requested in PecanProject/bety#409 and can be implemented if there is a need
- BMS does store categorical traits.
- Which categorical variables can be converted to numeric values with date and location of observation?
  - It would be preferable to store these as numeric values where possible. For example, instead of 'maturity group', we could store phenological observations as boolean (0/1) values. Combined with the date of planting or emergence, these could be used to compute the number of days to any particular phenological stage, and the cutoffs could then be defined (and altered if desired).
- Where, when, and under what conditions was height measured?
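
The phenology idea above can be sketched as follows. `days_to_stage` and `classify` are hypothetical helpers. The 0/1 observations, the cutoffs and the category labels are placeholders. Only the 2016-04-19 planting date comes from this thread.

```python
from datetime import date


def days_to_stage(planting, observations):
    """Days from `planting` to the first survey on which the stage was seen.

    `observations` is a list of (date, 0/1) for one plot and one stage. Because
    surveys are discrete, the result is an upper bound: the stage was reached on
    or before the first survey that recorded 1. Returns None if never seen."""
    seen = sorted(d for d, flag in observations if flag == 1)
    return (seen[0] - planting).days if seen else None


def classify(days, cutoffs, default):
    """Map days to a category using ascending (max_days, label) cutoffs.

    The cutoffs are plain data, so they can be redefined later without
    re-collecting any observations."""
    for max_days, label in cutoffs:
        if days <= max_days:
            return label
    return default
```

For example, with planting on 2016-04-19 and the stage first recorded on 2016-06-17, `days_to_stage` returns 59. The category then depends only on whichever cutoffs are chosen.
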
Import information from the Experiments worksheet on Google Drive into the experiments table in BETYdb
- [x] Create 'experiments' table (PecanProject/bety#410)

ghost commented 8 years ago

@dlebauer - please touch base with Scott about this.

dlebauer commented 8 years ago

remaining issues:

[ ] link cultivars to trait records https://github.com/terraref/computing-pipeline/issues/201#issuecomment-267458905
[ ] associate plots with experiments

Experiments

Season 1:

The relationships are in the first two columns of https://docs.google.com/spreadsheets/d/1iHmSyFeCO7np_lxYyvdiXQQcAwBJpIKpeAOyMXFd-OY/edit#gid=901711246

For each unique experiment name in the spreadsheet, create an experiment record with the name "Season 1 --expt--"
- experiment duration is from planting in May 2016 to harvest in July 2016 (look in managements for exact dates; choose the latest)
Add all of the associated sites where name like 'UA MAC Season 1 Plot [col B]%'
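One way to derive the season 1 experiment records is to group the (experiment, plot) pairs from the spreadsheet's first two columns. This is a sketch with two assumptions: "--expt--" is a placeholder for the column A experiment name, and column B holds the plot identifier. Both function names are hypothetical.

```python
from collections import defaultdict


def season1_experiments(rows):
    """Group (experiment name, plot) pairs from the first two spreadsheet
    columns into {experiment record name: sorted list of plots}.

    ASSUMES "--expt--" in "Season 1 --expt--" stands for the column A name."""
    plots = defaultdict(set)
    for expt, plot in rows:
        plots[expt.strip()].add(str(plot).strip())
    return {f"Season 1 {expt}": sorted(p) for expt, p in plots.items()}


def site_like_patterns(plots):
    """LIKE patterns following 'UA MAC Season 1 Plot [col B]%'."""
    return [f"UA MAC Season 1 Plot {plot}%" for plot in plots]
```

Note that the trailing `%` also matches longer plot numbers that share the same prefix. For instance, the pattern for plot 70 would match plot 705 as well.
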

For season 2:

Add an experiment named "Season 2 Stay-green RILs F10"
- experiment duration is from Aug 2016 to harvest in Dec 2016; see managements for exact dates (choose the latest).
add all sites where name like 'MAC Season 2%'

Cultivars

run update statements in https://github.com/terraref/computing-pipeline/issues/201#issuecomment-267458905
For the future, create a cultivars_sites table and a trigger to autopopulate it, as defined in https://github.com/PecanProject/bety/issues/475

gsrohde commented 7 years ago

@dlebauer To be clear, should all of the season 1 experiments have the same start and end dates? Looking at all of the managements in https://terraref.ncsa.illinois.edu/bety/managements, the latest harvest date is 2016-07-14. There are planting dates 2016-04-19, 2016-04-20, and 2016-08-03. I assume the latter is for season 2. There are no planting dates in May. So should the duration be "2016-04-19 to 2016-07-14" for all season 1 experiments?

dlebauer commented 7 years ago

Yes. The duration is "2016-04-19 to 2016-07-14" for all season 1 experiments.
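
The rule agreed here (earliest season 1 planting to latest harvest) can be expressed as a small function over the management records. This is a sketch only. The "planting" and "harvest" labels are assumptions, not taken from the BETYdb schema. So is the date window used to separate season 1 from the 2016-08-03 season 2 planting.

```python
from datetime import date


def season_duration(managements, window_start, window_end):
    """Return (earliest planting, latest harvest) among managements dated
    within [window_start, window_end].

    `managements` is a list of (date, type) pairs; the type labels
    "planting" and "harvest" are ASSUMED for illustration."""
    in_window = [(d, t) for d, t in managements if window_start <= d <= window_end]
    plantings = [d for d, t in in_window if t == "planting"]
    harvests = [d for d, t in in_window if t == "harvest"]
    if not plantings or not harvests:
        raise ValueError("need at least one planting and one harvest in the window")
    return min(plantings), max(harvests)
```

Run over the four dates listed in the question with a window of 2016-01-01 to 2016-07-31, it returns (2016-04-19, 2016-07-14). With only those four records, a window after August raises `ValueError`, because none of them is a harvest.
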

(Email reply to Scott Rohde's comment of Fri, Jan 13, 2017, 11:48 AM.)

gsrohde commented 7 years ago

@dlebauer I did season 1. I used insert statements of the form

INSERT INTO experiments_sites (experiment_id, site_id) SELECT e.id, s.id FROM experiments e, sites s WHERE e.name = 'BAP' AND sitename ~ '^MAC Field Scanner Season 1 Field Plot 705( [WE])?$';
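
The regular expression can be sanity-checked in Python before running the insert. For this simple pattern (literal text, one optional group, `^`/`$` anchors), Python's `re` and PostgreSQL's `~` behave the same, and `re.search` mirrors `~` in matching anywhere in the string unless anchored. The helper names are hypothetical, and apart from the pattern's own prefix the site names are made up.

```python
import re

# Pattern copied from the INSERT statement above.
PLOT_705 = re.compile(r"^MAC Field Scanner Season 1 Field Plot 705( [WE])?$")


def regex_sites(names, pattern):
    """Names selected by `sitename ~ pattern` (re.search, like PostgreSQL ~)."""
    return [n for n in names if pattern.search(n)]


def like_prefix_sites(names, prefix):
    """Names selected by `sitename LIKE 'prefix%'`, for a prefix with no wildcards."""
    return [n for n in names if n.startswith(prefix)]
```

The anchors matter here. A prefix match such as `LIKE '... Field Plot 705%'` would also accept a hypothetical "Plot 7050" or "Plot 705 N", and the regex rejects both.
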

I have two questions about season 2:

Site names for season 2 come in two forms: they either match the regular expression

'^MAC Field Scanner Season 2 Range \d+ Pass \d+$'

or match

'^MAC Field Scanner Field Plot \d+ Season 2$'

Should I associate the "Season 2 Stay-green RILs F10" experiment with any site that contains the string 'Season 2'?
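
Before associating every site that contains 'Season 2', it may be worth listing any such names that fit neither of the two observed forms. This is a sketch: `unexpected_season2` is a hypothetical helper, and the example site names are invented. One of them deliberately uses 'Season 20' to show what a plain substring match would also catch.

```python
import re

# The two site-name forms observed for season 2.
FORM_RANGE_PASS = re.compile(r"^MAC Field Scanner Season 2 Range \d+ Pass \d+$")
FORM_FIELD_PLOT = re.compile(r"^MAC Field Scanner Field Plot \d+ Season 2$")


def unexpected_season2(sitenames):
    """Names that contain 'Season 2' (as `sitename LIKE '%Season 2%'` would)
    but match neither observed form."""
    return [
        n for n in sitenames
        if "Season 2" in n
        and not (FORM_RANGE_PASS.search(n) or FORM_FIELD_PLOT.search(n))
    ]
```
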

Secondly, there is no management corresponding to the season 2 harvest, so I don't know what the precise end date should be.

dlebauer commented 7 years ago

Yes, associate it with all season 2 sites.

End date is Dec 2, 2016.

(Email reply to Scott Rohde's comment of Fri, Jan 13, 2017, 4:05 PM.)

gsrohde commented 7 years ago

@dlebauer OK, season 2 is now done. But it occurred to me that maybe id numbers for the experiments and experiments_sites are supposed to be in the 6 billion range. If so, I'm not sure what mechanism @robkooper uses to make sure each machine starts autonumbering at the correct place.

gsrohde commented 7 years ago

@dlebauer I re-did the experiments and experiments_sites inserts with the correct id range restriction in effect, and did a hotfix release to set the starting id numbers correctly.
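
The id-range convention can be illustrated as follows. The sketch assumes each BETYdb machine is allotted a block of 1,000,000,000 ids starting just above machine_id * 10^9, which would put machine 6 in the "6 billion range". The block size and the `id_bounds` helper are assumptions for illustration only. The actual starting values are the ones set by the hotfix release.

```python
# ASSUMED block size: machine m owns ids m * 10**9 + 1 through (m + 1) * 10**9.
ID_RANGE = 1_000_000_000


def id_bounds(machine_id):
    """First and last id in the block assumed to belong to `machine_id`."""
    return machine_id * ID_RANGE + 1, (machine_id + 1) * ID_RANGE


def in_machine_range(record_id, machine_id):
    """True if `record_id` falls inside the block assumed for `machine_id`."""
    first, last = id_bounds(machine_id)
    return first <= record_id <= last
```

Running a check like `in_machine_range` over the new experiments and experiments_sites ids would flag any rows that were inserted before the sequences were reset.
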

gsrohde commented 7 years ago

@dlebauer Can this be closed?
