Standard naming scheme for plots

dlebauer commented 8 years ago

Following terraref/computing-pipeline#187

Background

We need a standard way of defining plots.

There can be multiple concepts of a plot:
- the full plot as the unit of replication in an experimental design
- the center of the plot used for image analysis and hand measurement
- individual rows within the plot (e.g. in some cases hand measurements are separated into E and W rows)
In addition we can create one plot per plant.

How these will be used and grouped:

- geospatial queries: once the 'full' plot is identified, we can find all of the other plots that it contains.
- experiments: an experiment has a name, start and end dates, and is linked to one or more plots.
- sitegroups: these allow arbitrary groupings of plots.
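As a rough illustration of the geospatial query, the Python sketch below treats each plot as an axis-aligned rectangle and lists the plots that fall inside a 'full' plot. The `Box` type, the coordinates, and the plot names are hypothetical; on real polygons a spatial database would normally do this with a containment predicate such as PostGIS `ST_Contains`.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Box:
    """Axis-aligned rectangle; a hypothetical stand-in for a plot polygon."""
    name: str
    xmin: float
    ymin: float
    xmax: float
    ymax: float

    def contains(self, other: "Box") -> bool:
        # True if `other` lies entirely inside this box (shared edges allowed).
        return (self.xmin <= other.xmin and self.ymin <= other.ymin
                and other.xmax <= self.xmax and other.ymax <= self.ymax)


def plots_within(full_plot: Box, candidates: list[Box]) -> list[str]:
    """Names of all candidate plots contained in `full_plot`."""
    return [c.name for c in candidates
            if c is not full_plot and full_plot.contains(c)]


# Hypothetical geometry: one full plot, its two rows, its center, and a neighbor.
full = Box("full plot", 0.0, 0.0, 4.0, 3.0)
others = [
    Box("west row", 0.0, 0.0, 2.0, 3.0),
    Box("east row", 2.0, 0.0, 4.0, 3.0),
    Box("center", 0.5, 1.0, 3.5, 2.0),
    Box("neighbor", 4.0, 0.0, 8.0, 3.0),
]
print(plots_within(full, others))  # ['west row', 'east row', 'center']
```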

For reference, here is the schema for the sites and related tables.

First use case, the Maricopa Field

From @NewcombMaria

A 'Range' extends in the Y direction (East-West).
- In sorghum crop1 and sorghum crop2 there have been 54 ranges.
A 'Row' extends in the X direction (North-South).
- In sorghum crop1 and sorghum crop2 there have been 32 rows.
A 'Column' consists of two rows.
- With sorghum crop1 it was useful to use the term 'Column' to represent 2 rows, since crop1 had 2-row plots and also 2-row border plantings on the West and East sides (16 'columns' was a useful concept).
The term 'Pass' was used for sorghum crop2.
- One 'Pass' is equivalent to two columns, or 4 rows.

Both crops had 2-row border plantings on the East and West sides of the field.

Sorghum crop1 had 2-row plots (14 experimental plots per range).
Sorghum crop2 has 4-row plots (7 experimental plots per range).
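The counts above follow from the row definitions. This short Python check uses only the numbers given in this comment (32 rows, 2 border rows on each side, 2 rows per column, 4 rows per pass); the function name is hypothetical.

```python
TOTAL_ROWS = 32            # rows across the field (X direction, North-South)
BORDER_ROWS_PER_SIDE = 2   # 2-row border plantings on the East and West sides
ROWS_PER_COLUMN = 2        # a 'Column' is two rows
ROWS_PER_PASS = 4          # a 'Pass' is two columns, i.e. four rows


def experimental_plots_per_range(rows_per_plot: int) -> int:
    """Experimental plots in one range after excluding the border rows."""
    interior = TOTAL_ROWS - 2 * BORDER_ROWS_PER_SIDE   # 28 interior rows
    if interior % rows_per_plot:
        raise ValueError("interior rows do not divide evenly into plots")
    return interior // rows_per_plot


print(TOTAL_ROWS // ROWS_PER_COLUMN)     # 16 columns
print(TOTAL_ROWS // ROWS_PER_PASS)       # 8 passes
print(experimental_plots_per_range(2))   # 14 (sorghum crop1, 2-row plots)
print(experimental_plots_per_range(4))   # 7 (sorghum crop2, 4-row plots)
```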
Suggested Approach:

Use a naming system for plots in which the following would be valid:

<sitename> <year> Season <season> Range <range #> Column <col #>
- assuming 'column' is a useful concept
<sitename> <year> Season <season> Range <range #> Pass <pass #>
<sitename> <year> Season <season> Plot <plot #>
<sitename> <year> Season <season> Plant <plant #>
TODO
[ ] @NewcombMaria review my summary of the definitions and experimental design, and modify and add to the field protocols Google doc
[ ] @robkooper, @gsrohde and anyone else please provide feedback.
- year and season information can be stored in experiments table, but we need a way of constructing unique site names
- Should we use sitegroups or will experiments suffice?

tagging others: @rickw-ward @tingli3 @Mamatemenrs @ZongyangLi @yanliu-chn @craig-willis

ZongyangLi commented 8 years ago

A 'Range' extends in the Y direction (East-West)

Does this 'Range' mean the North-South direction?

dlebauer commented 8 years ago

@ZongyangLi in the gantry coordinate system, the Y direction is East West and the X direction is North South. This is not intuitive and has been the source of great confusion. See https://github.com/terraref/computing-pipeline/issues/174#issuecomment-250893581

ZongyangLi commented 8 years ago

@dlebauer Yes, I know. So as I understand it, the 54 ranges correspond to the X direction in the gantry system, which is the North-South direction in a real-world coordinate system.

dlebauer commented 8 years ago

@ZongyangLi sorry - I think I was confused because the range numbers increase going north but each one is oriented E-W.

rickw-ward commented 8 years ago

I yield to @NewcombMaria on terminology. As I use it, Range 1 is in the south and Range 54 is in the north. Ranges are adjoining plots between two alley walkways (which run east/west, i.e. along the Y vector of the gantry).

NewcombMaria commented 8 years ago

@dlebauer, David, if it's helpful, Jeff White and I came up with a definition for 'field plot'. For the Maricopa field, it's useful to consider that there are experimental plots, as you mentioned ('the unit of replication in an experimental design'), and also border plots, which could be useful for observational data. Plot numbering will be different for each planting.

Field plot: the unique field location (area) assigned to an individual treatment and replicate. Each plot has a unique identifier within the field. In some field layouts, there may be border or filler plots in addition to experimental plots, and these border plots may differ in size or shape from experimental plots.

dlebauer commented 8 years ago

@gsrohde could you please do the following:

[x] Some sitenames say MAC Field Scanner while others say Field Scanner; could you please use some regex or other magic to prefix MAC to all sites that say Field Scanner
[x] For sites that start MAC Field Scanner but do not include a Season, please add Season 1 after MAC Field Scanner.
[x] Where names say ... Season 2 Row <n> Col <m>, please change to Range <n> Pass <m>
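For anyone repeating this kind of cleanup, the three rules above can be written as string rewrites. This is a minimal Python sketch of the logic only; the function name and the sample names are hypothetical, and it is not the method actually used to update the database.

```python
import re


def normalize_sitename(name: str) -> str:
    # 1. Prefix 'MAC' wherever 'Field Scanner' appears without it.
    name = re.sub(r"(?<!MAC )Field Scanner", "MAC Field Scanner", name)
    # 2. If the name starts with 'MAC Field Scanner' but has no season,
    #    insert 'Season 1' right after that prefix.
    if name.startswith("MAC Field Scanner") and "Season" not in name:
        name = "MAC Field Scanner Season 1" + name[len("MAC Field Scanner"):]
    # 3. In Season 2 names, rename 'Row <n> Col <m>' to 'Range <n> Pass <m>'.
    if re.search(r"\bSeason 2\b", name):
        name = re.sub(r"\bRow (\d+) Col (\d+)\b", r"Range \1 Pass \2", name)
    return name


for old in ["Field Scanner Field Plot 5",
            "MAC Field Scanner Season 2 Row 12 Col 3"]:
    print(old, "->", normalize_sitename(old))
# Field Scanner Field Plot 5 -> MAC Field Scanner Season 1 Field Plot 5
# MAC Field Scanner Season 2 Row 12 Col 3 -> MAC Field Scanner Season 2 Range 12 Pass 3
```

The rules are idempotent: running the function again on an already-normalized name leaves it unchanged.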

gsrohde commented 8 years ago

@dlebauer I've made all of these changes. Note, however, that 'Season 2' appears last in site names that have a Field Plot number whereas it appears directly after 'MAC Field Scanner' in names that have a Range and Pass number.

ghost commented 7 years ago

@dlebauer can this be closed?

NewcombMaria commented 7 years ago

@dlebauer this is a good time to establish the preferred sitename format for BETYdb and the preferred plot naming scheme, because we are currently putting together the spreadsheet for the next planting (durum wheat). Three questions need to be answered before the start of this next winter crop:

1) Considering the options below, is there a preference for either (a) a plot location (for example, the first two options in the list) or (b) a plot number (the third option in the list)?

<sitename> <year> Season <season> Range <range #> Column <col #>
- assuming 'column' is a useful concept
<sitename> <year> Season <season> Range <range #> Pass <pass #>
<sitename> <year> Season <season> Plot <plot #>
<sitename> <year> Season <season> Plant <plant #>

2) For <year>, is this the year of planting, or the year in which most of the data are collected? This next planting will be in December 2016, but emergence and the start of data collection will most likely be in January 2017. What is the standard when a season crosses over two years during the winter?

3) What is the preferred way to handle subplots in BETYdb going forward? We discussed 'entity' for rows within plots; is that preferred? The last sorghum crop was planted in 4-row plots. The next wheat planting will be in 2-row plots (2 subplot rows, E and W). The sitenames could be individual rows, or the sitename could be the entire 2-row plot. Which is preferred?

dlebauer commented 7 years ago

@rmgarnett

rmgarnett commented 7 years ago

Ah, so to restate what I was trying to explain earlier today: there are 32 rows, which are currently arranged logically as

[2 rows of border] ([4 rows of same genotype] x 7) [2 rows of border]

and we are looking at the interior two rows of each of the 7 "4-row" plots. Numbering them as 16 two-row plots results in the following division:

[2 rows of border] ([2 rows of same genotype as ->, only care about right] [2 rows of same genotype as <- only care about left] x7) [2 rows of border]

Which is very odd. The 2-row plot division seems to accomplish nothing useful. Seems like any of these would make more sense:

- simply having 32 rows, giving 54 * 32 plots
- having variable-sized plots: 2-row border plots and 4-row interior plots
- having a separate view of the data such that the interior rows of the 4-row plots were available:

(1 2) (3) (4 5) (6) (7) (8 9) ...

So I can easily pull out the plot corresponding to (4 5), which is currently impossible.
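The layout described above can be reproduced in a few lines. This Python sketch takes its constants from that layout (function names are hypothetical), builds the '(1 2) (3) (4 5) ...' grouping, and shows why the interior pair (4 5) cannot be pulled out of the 16 two-row plots: its two rows fall in different 2-row plots.

```python
BORDER_ROWS = 2        # border rows at each side of the field
PLOTS_PER_RANGE = 7    # 4-row experimental plots between the borders
ROWS_PER_PLOT = 4
TOTAL_ROWS = 2 * BORDER_ROWS + PLOTS_PER_RANGE * ROWS_PER_PLOT  # 32


def row_groups() -> list[tuple[int, ...]]:
    """Rows 1..32 grouped as (1 2) (3) (4 5) (6) (7) (8 9) ...:
    each border pair kept together, and each 4-row plot split into
    outer row, the two interior rows, outer row."""
    groups = [tuple(range(1, BORDER_ROWS + 1))]
    for p in range(PLOTS_PER_RANGE):
        first = BORDER_ROWS + p * ROWS_PER_PLOT + 1   # first row of plot p
        groups += [(first,), (first + 1, first + 2), (first + 3,)]
    groups.append(tuple(range(TOTAL_ROWS - BORDER_ROWS + 1, TOTAL_ROWS + 1)))
    return groups


def two_row_plot_of(row: int) -> int:
    """1-based index of the 2-row plot containing `row` when the 32 rows
    are divided into 16 consecutive 2-row plots."""
    return (row + 1) // 2


print(row_groups()[:6])                         # [(1, 2), (3,), (4, 5), (6,), (7,), (8, 9)]
print(two_row_plot_of(4), two_row_plot_of(5))   # 2 3 (interior rows split apart)
```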

Personally, I think simply numbering by row rather than arbitrary and confusing 2-row plots would allow any later change to the planting scheme to be easily dealt with.

NewcombMaria commented 7 years ago

Thanks, Roman, for your description of the Sorghum Season 2 layout, with 2-row borders and 4-row 'plots'. I tried to illustrate what you described in a PowerPoint slide. There's good reason to break the polygons and numbering scheme down to the smallest unit, the row, with 32 rows across, as you suggest. To most people the term 'plot' will be associated with an experimental unit, which can be multiple rows or a single row, but the numbered units should probably be the rows, and these can be combined as necessary in different seasons and different planting designs.

rmgarnett commented 7 years ago

Fantastic picture, thanks!

ghost commented 7 years ago

from Nadia:

Plots can be defined as the field area that is occupied by a single genotype in a given rep. We can do 1 row, 2 row, 3 row, 4 row plots (whatever we want) depending on how many rows of one genotype we want in a particular area. This will change from experiment to experiment.

It is best to keep the data separated by row, but we need to have the ability to combine row information in order to generate plot-level data.
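Keeping data by row and combining rows into plots per experiment can be sketched as a small Python function. The function name, plot names, and height values are hypothetical, and a simple mean stands in for whatever plot-level summary a given trait actually calls for.

```python
from statistics import mean


def plot_means(row_values: dict[int, float],
               plot_rows: dict[str, list[int]]) -> dict[str, float]:
    """Combine row-level values into plot-level means.

    row_values maps row number -> measured value; plot_rows maps a plot
    name -> the rows that make up that plot in a given experiment.
    Rows without a measurement are skipped; plots with none are omitted.
    """
    result = {}
    for plot, rows in plot_rows.items():
        values = [row_values[r] for r in rows if r in row_values]
        if values:
            result[plot] = mean(values)
    return result


# Hypothetical row-level heights (cm) for rows 3-6 of one range.
heights = {3: 100, 4: 120, 5: 140, 6: 100}

# The same rows grouped under two different hypothetical plot designs.
four_row_design = {"plot A": [3, 4, 5, 6]}
two_row_design = {"plot A": [3, 4], "plot B": [5, 6]}

print(plot_means(heights, four_row_design))  # {'plot A': 115}
print(plot_means(heights, two_row_design))   # {'plot A': 110, 'plot B': 120}
```

Because the stored unit is the row, a change of plot design only changes the `plot_rows` mapping, not the stored data.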

ghost commented 7 years ago

See https://github.com/terraref/reference-data/issues/114 for more discussion

terraref / reference-data

Standard naming scheme for plots #60
