terraref / computing-pipeline

Pipeline to Extract Plant Phenotypes from Reference Data
BSD 3-Clause "New" or "Revised" License

A protocol for checking that scans align with plots #153

Open dlebauer opened 8 years ago

dlebauer commented 8 years ago

Description

We need a protocol to ensure that each scan works and is consistent with the intended mission.

We should generate a set of map layers, one per sensor, that overlay summary images on the experimental layout.

Context

Field scanner operator needs sanity checks.

Further Suggestions / Request for Feedback

Here is a proposed SOP - a starting point for feedback:

  1. Lay out the field using GIS software
    • this requires georeferenced coordinates; see terraref/reference-data#32 (waiting on @smarshall-bmr and @rjstrand)
    • ideally Mike Ottman or Pedro could provide plot layouts for the experimental design one week in advance
  2. Load experimental plots into BETYdb and ranges into the Clowder Sensor / PostGIS database
  3. Compute the range centers from ranges in Clowder
  4. Program the scanner missions using these range centers / plot dimensions
  5. After planting, upload the precision planter output to Clowder
  6. Insert seed locations into the Clowder Sensor / PostGIS database
  7. Confirm consistency between seed planting locations and the experimental plots and ranges
  8. If seed positions are inconsistent with the plot layout, adjust the experimental plot layouts and return to step 2
  9. Each day, create a full-field mosaic summarizing each sensor's output at roughly 1 cm resolution
    • Hyperspectral data could be reduced to a single band or an index like NDVI; 3D sensor data could be plotted as a raster elevation map, etc.
    • Overlay a map of plot boundaries and cultivar names.
  10. Each day, the field scanner operator checks the previous day's scan with plot positions overlain (see the sketch below)
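
As a rough illustration of steps 3 and 9-10, here is a minimal sketch, not part of the current pipeline: the table name `sites`, the columns `sitename` and `geom`, the connection string, and the file names are all placeholder assumptions. It pulls plot polygons from a PostGIS database, computes their centroids for mission planning, and overlays plot boundaries and names on a daily mosaic:

```python
# Hedged sketch only: plot-layout overlay for steps 3 and 9-10.
# Assumed: plot polygons in a PostGIS table "sites" with columns
# "sitename" and "geom"; a daily mosaic "fullfield_mosaic_1cm.tif".
import geopandas as gpd
import matplotlib.pyplot as plt
import rasterio
from rasterio.plot import show
from sqlalchemy import create_engine

engine = create_engine("postgresql://user:password@host/bety")  # placeholder

# Load plot polygons from the sensor / PostGIS database (step 2).
plots = gpd.read_postgis("SELECT sitename, geom FROM sites", engine, geom_col="geom")

# Step 3: plot/range centers that can seed the scanner mission plan (step 4).
print(plots.assign(center=plots.geometry.centroid)[["sitename", "center"]].head())

# Steps 9-10: draw the daily mosaic with plot boundaries and names overlain.
with rasterio.open("fullfield_mosaic_1cm.tif") as src:
    plots = plots.to_crs(src.crs)
    fig, ax = plt.subplots(figsize=(10, 20))
    show(src, ax=ax)
    plots.boundary.plot(ax=ax, color="red", linewidth=0.5)
    for _, row in plots.iterrows():
        c = row.geom.centroid
        ax.annotate(row.sitename, (c.x, c.y), fontsize=4, ha="center", color="red")
    fig.savefig("mosaic_with_plot_overlay.png", dpi=300)
```

A quick visual pass over the resulting image should be enough to catch gross misalignment between the scan footprint and the plot layout.
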
JeffWhiteAZ commented 8 years ago

What you've outlined is a good start for making sure scripts are properly defined each season. I would prefer a separate, more detailed protocol for each scan, which might be daily or require a few days:

  1. Define scan objective
  2. Prepare scan script or select from existing script
  3. Log planned scan
  4. Start scan
  5. (monitor scan progress)
  6. At the end of the scan, check data from the start and end of the scan for completeness and sensible values
  7. Log scan completion
  8. Within 24 h (48 h?), generate a mosaic of the scanned area to check for reasonable values. This check should be both visual and via histograms of values (see the sketch after this list). See https://github.com/terraref/computing-pipeline/issues/84
  9. Scan operator(s) certify data as usable by logging status as checked
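
To make step 8 concrete, the following is a minimal sketch of a histogram check; the mosaic file name is a placeholder, and nodata handling assumes the mosaic carries a nodata value or mask. It plots one histogram per band so empty, saturated, or otherwise implausible scans stand out:

```python
# Hedged sketch: per-band histograms of a scan mosaic for the step 8 QA check.
import matplotlib.pyplot as plt
import rasterio

with rasterio.open("scan_mosaic.tif") as src:  # assumed mosaic from step 8
    fig, axes = plt.subplots(src.count, 1, figsize=(6, 3 * src.count), squeeze=False)
    for band in range(1, src.count + 1):
        data = src.read(band, masked=True).compressed()  # drop nodata pixels
        ax = axes[band - 1, 0]
        if data.size == 0:
            ax.set_title(f"Band {band}: no valid pixels")
            continue
        ax.hist(data, bins=100)
        ax.set_title(f"Band {band}: min={data.min():.2f} max={data.max():.2f}")
    fig.tight_layout()
    fig.savefig("scan_histograms.png", dpi=150)
```
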
rickw-ward commented 8 years ago

Some background: Our use of machine planting (versus hand planting) renders our plans for placement of experimental units (Plots) “aspirational”, in the sense that the precise start and end point of each line of seed within a plot is affected by multiple stochastic variables. “Precision” planting as we are practicing it is a relative concept, especially when compared to common practice in laboratory settings.

For instance, in lab settings, both the absolute and relative position of wells within a 96 well plate are effectively invariant. But only the relative position of experimental units in a grid of machine planted field plots remains constant.

The absolute position of a field plot is a result of multiple aspirational variables, each influenced by stochastic errors. The position of an experimental unit in our field plantings is where the seeds assigned and planted to that unit ultimately grow.

Geo-information from the planter or from pre-plant plans provides valuable information for resolving questions that arise, especially in cases of planting errors, but it is informational rather than determinative. Understanding how and why the seeds of a plot end up in a position other than planned helps us refine our precision in future plantings, but does not necessarily inform the current planting.

Bottom line: Machine planted field plot units are where they emerge, and geo-localizing a given unit is an empirical exercise. They will not be in the same place the next time we plant.

I see the @dlebauer SOP above as a component of the dial-in process required with each new "crop season" (i.e., a whole-of-gantry planting date).
Basically, we need to geolocalize the corners of each experimental unit based on where it actually is upon emergence. That enables creation of polygons for associating data with each experimental unit (and therefore each instance of a treatment, usually a germplasm, but potentially combinations of germplasm and one or more additional treatment factors). A minimal sketch of what those polygons enable follows below.
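
As a small illustration, here is a hedged sketch in which the corner coordinates, unit name, and test point are invented. It builds one polygon per experimental unit from corners surveyed at emergence and assigns a georeferenced observation to a unit by point-in-polygon lookup:

```python
# Hedged sketch: experimental-unit polygons from emergence-surveyed corners.
from shapely.geometry import Point, Polygon

# Corners surveyed after emergence (easting, northing), one list per unit.
unit_corners = {
    "Range 10 Column 3": [(409500.1, 3659874.2), (409503.8, 3659874.3),
                          (409503.7, 3659870.1), (409500.0, 3659870.0)],
    # ... one entry per experimental unit
}
unit_polygons = {name: Polygon(corners) for name, corners in unit_corners.items()}

def unit_for(x, y):
    """Return the experimental unit whose polygon contains the point, if any."""
    p = Point(x, y)
    for name, poly in unit_polygons.items():
        if poly.contains(p):
            return name
    return None

print(unit_for(409502.0, 3659872.0))  # -> "Range 10 Column 3"
```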

I see @JeffWhiteAZ comment as relating to a work-package for the execution of a specific gantry data collection activity.

I sense there is an additional design-of-experiment layer that needs revisiting. The Campaign and Mission paradigm that Jeff and others worked on needs to be embraced at the level of the overall season, and the level of goals/purposes within a season.
Those of us who have careers in field experiments often count time in units of planting seasons. Combined, in the Arizona terra team, we might have experience designing and executing over 80 planting seasons. Personally, each of those seasons has had the intensity of something like a Space Shuttle launch. Lessons learned are incorporated constantly, but each season/launch stands or falls on its own. Did we pay sufficient attention to the fact that we were launching another season? What are the teams that are ensuring we maximize the value of the RIL population currently planted? Do we all know what a RIL is and the history of this population?

I think we need to systematically adopt the Campaign/Mission paradigm, and ensure that our intra-project lines of responsibility/communication support the resulting work packages. At the risk of redundancy, I think we need to seriously embrace ensuring that we have the following in place at the season, campaign, and mission levels:

1) Team members and responsibilities
   a. Modalities of the communications that are needed from planning through execution and closure.
2) Purpose
3) Geo-spatial and statistical design
4) Details of the nature of data that are needed to generate information that enables achievement of the purpose
5) Activities that generate and store the needed data/metadata (including quality/precision parameters)
   a. Modalities of data collection (modalities = sensors mounted on humans, tractors, UAVs, and the gantry)
   b. Monitoring and quality assurance in terms of alignment of observations with experimental units, repeatability, accuracy, etc.
      i. Monitoring and QA require continuous cooperation between staff who collect data (observations of plants plus metadata on conditions) and staff who assess data in the context of the value of the data as sources of information that advance the experiment’s purpose.
6) Milestones/checkpoints

doctorroboto commented 8 years ago

@dlebauer : this overall SOP document is sound, and I thought we were already following this format. I would advocate for this path forward, both to help organize based on Rick's comments re: 'precision' (not necessarily accurate) planting, and also to use as a tool for checking data and sharing visual information inside and outside of TERRA-REF. We need to wrap up this RTK-GPS plan, get the closest approximation to an absolute coordinate system as we can, and use this to quickly generate the GIS shape files. I would further ask that we put this system in place before the annual meeting, as GIS-based maps are very nice tools to convey information, and represent a common language within our TERRA family. They also tend to generate follow-on funds these days.

All: even in the absence of the GIS means of exploring and aligning datasets, we can still come up with an interim process that works. Stuart's plan is a great start: a basic review of the data. I like the complementary hyperspectral band selection (NDVI, perhaps) to help get a quick sniff check, and believe that a histogram should help as well. Let's be certain to capture what we hope to ask ourselves during daily / weekly checks on system accuracy and unexpected / expected trends, and ensure that our process of data review is answering these questions. Is a histogram going to expose ungerminated areas (that turned out to be alleyways), or is this going to get washed out in the noise? Is a visual GIS-based map going to do it? What about a comparison of expected vs recorded values for each plot, displayed in a table or image (a sketch of such a per-plot comparison follows below)? This might help determine early indicators of failing crop health, misalignment, sensor inaccuracy, etc. This 'review pipeline' seems like a great way to shape the experiments, and would draw from expertise within the team to ensure experiment utility. I like the use of punch lists: here's a list of n questions that we want answered after each scan - what's the fastest way to answer those questions?
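
To illustrate that expected-vs-recorded comparison, here is a speculative sketch; the plot shapefile, its `sitename` attribute, the NDVI mosaic, the single expected value, and the 0.2 tolerance are all assumptions. It summarizes a mosaic per plot and flags plots that deviate from expectation:

```python
# Speculative sketch: per-plot expected vs observed summary from an NDVI mosaic.
import geopandas as gpd
import pandas as pd
import rasterio
import rasterio.mask

plots = gpd.read_file("plot_boundaries.shp")  # assumed plot polygons
expected_ndvi = 0.6                           # placeholder expectation

rows = []
with rasterio.open("ndvi_mosaic.tif") as src:
    plots = plots.to_crs(src.crs)
    for _, plot in plots.iterrows():
        # Clip the mosaic to the plot polygon and summarize the valid pixels.
        img, _ = rasterio.mask.mask(src, [plot.geometry], crop=True, filled=False)
        observed = float(img.mean()) if img.count() > 0 else float("nan")
        rows.append({"plot": plot["sitename"],
                     "observed_ndvi": observed,
                     "expected_ndvi": expected_ndvi,
                     "flag": abs(observed - expected_ndvi) > 0.2})

report = pd.DataFrame(rows)
print(report.sort_values("flag", ascending=False).to_string(index=False))
```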

@rickw-ward : very interesting. I didn't realize the extent to which we had to go back and get the absolute positions of the plants after planting, although I do recall the drift in the first planting we had. I'm behind on emails, but hopefully we're making progress on terraref/reference-data#32 so we can get to this point soon.

ghost commented 7 years ago

@dlebauer - is this a priority for the V0 release?

dlebauer commented 7 years ago

Yes, this is a priority - it represents a very basic sanity check to confirm that our geospatial components are working.

rickw-ward commented 7 years ago

reading..