terraref / reference-data

Coordination of Data Products and Standards for TERRA reference data
https://terraref.org
BSD 3-Clause "New" or "Revised" License

Define use cases that the end-user API for querying data should support #9

Closed dlebauer closed 7 years ago

dlebauer commented 9 years ago

def: a few sentences that describe a task someone wants to accomplish. Use cases will help prioritize feature development and project organization, and get feedback.

ex1: someone is looking at an image in Clowder, has identified a particular trait, and wants to find all plants with this trait (within some range or greater than, e.g. top 10% biomass) and then find other data assoc. w/ these plants

ex2: I have noticed an interesting thing; can I find all plants w/ the same feature +/- X%?

ex3: Want to upload data so someone else can get to it and its metadata

ex4: want to publish a collection from Clowder
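ex1 and ex2 can be sketched in a few lines of Python. This is only an illustration of the query shape, assuming trait values have already been pulled into memory; `TraitRecord`, `top_fraction`, `within_percent`, `other_data_for`, and the sample data are all hypothetical names, not part of Clowder or BETYdb.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class TraitRecord:
    """One trait measurement for one plant (hypothetical record layout)."""
    plant_id: str
    trait: str
    value: float


def top_fraction(records, trait, fraction):
    """Return records for `trait` ranked in the top `fraction` (e.g. 0.10)."""
    ranked = sorted(
        (r for r in records if r.trait == trait),
        key=lambda r: r.value,
        reverse=True,
    )
    if not ranked:
        return []
    keep = max(1, round(len(ranked) * fraction))
    return ranked[:keep]


def within_percent(records, trait, reference, percent):
    """Return records for `trait` whose value is within +/- `percent` of `reference`."""
    tolerance = abs(reference) * percent / 100.0
    return [
        r for r in records
        if r.trait == trait and abs(r.value - reference) <= tolerance
    ]


def other_data_for(records, plant_ids):
    """Return every record (any trait) belonging to the given plants."""
    wanted = set(plant_ids)
    return [r for r in records if r.plant_id in wanted]


# Made-up sample data: ten plants, two traits each.
records = (
    [TraitRecord(f"p{i:02d}", "biomass", 10.0 * i) for i in range(1, 11)]
    + [TraitRecord(f"p{i:02d}", "height_cm", 100.0 + i) for i in range(1, 11)]
)

best = top_fraction(records, "biomass", 0.10)               # ex1: top 10% biomass
similar = within_percent(records, "biomass", 50.0, 20)      # ex2: 50 +/- 20%
related = other_data_for(records, [r.plant_id for r in best])
```

In a real deployment the filtering would more likely be pushed down into the database or search index rather than done in memory.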


max-zilla commented 9 years ago

Here are some examples. Not all of these will be valid, but perhaps they will engender discussion... these examples might be scattershot across the various components.

Inherent to all of these use cases: "is this realistic? do we need to support this? does this capability already exist, through the stated path or through a different path?"

End-users working with existing data in Clowder

  1. Browse a gallery of daily images of the same plant across 2 weeks.
  2. Search for images by metadata fields (e.g. one particular sensor with a value exceeding some threshold), and download the results
  3. Search for images with ranges/fuzzy-matching (e.g. all images of plants with height ±X cm or ±X% of a nominal value)
  4. Read about the specification of the sensors that gathered these data to understand what the value represents.
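As a rough sketch of use cases 1 and 2 above, the function below filters image metadata by plant, date window, and per-field thresholds. The dictionary layout (`image_id`, `plant_id`, `date`, `metadata`), the `canopy_temp_c` field, and `search_images` itself are assumptions made for illustration; they do not describe Clowder's actual metadata model or API.

```python
from datetime import date, timedelta


def search_images(images, plant_id=None, start=None, end=None, min_values=None):
    """Return images matching all given filters, sorted by date.

    min_values maps a metadata field name to a threshold; an image matches
    only if its value for that field is present and strictly greater.
    """
    min_values = min_values or {}
    hits = []
    for img in images:
        if plant_id is not None and img["plant_id"] != plant_id:
            continue
        if start is not None and img["date"] < start:
            continue
        if end is not None and img["date"] > end:
            continue
        meta = img["metadata"]
        if any(meta.get(field) is None or meta[field] <= threshold
               for field, threshold in min_values.items()):
            continue
        hits.append(img)
    return sorted(hits, key=lambda img: img["date"])


# Made-up sample data: two plants imaged daily for 20 days.
first_day = date(2016, 6, 1)
images = [
    {
        "image_id": f"img-{plant}-{n:02d}",
        "plant_id": plant,
        "date": first_day + timedelta(days=n),
        "metadata": {"canopy_temp_c": 25.0 + n},
    }
    for plant in ("plant-A", "plant-B")
    for n in range(20)
]

# Use case 1: daily images of one plant across 2 weeks.
gallery = search_images(images, plant_id="plant-A",
                        start=first_day, end=first_day + timedelta(days=13))

# Use case 2: images where one sensor value exceeds a threshold.
hot = search_images(images, min_values={"canopy_temp_c": 40.0})
```

The download step is left out; once the matching identifiers are known, fetching the files is a separate concern.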

End-users with a bit more ambition...

  1. Show a histogram of some metadata field/trait values for all images in a Dataset (this one is getting out there...)
  2. Search for all images with some value outside 2 standard deviations of the average value for that day and field
  3. Load a selection of images from one of the use cases above into an analysis VM with PlantCV, etc available for study.
  4. Export some tabular structuring of selected metadata fields from a batch of images, e.g. a CSV with identifier, trait1, trait2, trait3, date
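Use cases 2 and 4 in this list could look something like the following: group rows by day and field, flag values more than 2 standard deviations from the group mean, and export the selection as CSV. The row layout, the `height_cm` trait, and the function names are hypothetical; the sample standard deviation (`statistics.stdev`) is used here, which is one choice among several.

```python
import csv
import io
from collections import defaultdict
from statistics import mean, stdev


def find_outliers(rows, trait, n_sigma=2.0):
    """Return rows whose `trait` value is more than n_sigma sample standard
    deviations from the mean of their (date, field) group."""
    groups = defaultdict(list)
    for row in rows:
        groups[(row["date"], row["field"])].append(row)

    outliers = []
    for group in groups.values():
        values = [r[trait] for r in group]
        if len(values) < 2:
            continue  # stdev is undefined for a single value
        mu, sigma = mean(values), stdev(values)
        if sigma == 0:
            continue  # all values identical, nothing stands out
        outliers.extend(r for r in group if abs(r[trait] - mu) > n_sigma * sigma)
    return outliers


def to_csv(rows, columns):
    """Serialize the selected columns of `rows` as CSV text with a header."""
    buffer = io.StringIO()
    writer = csv.DictWriter(buffer, fieldnames=columns, extrasaction="ignore")
    writer.writeheader()
    writer.writerows(rows)
    return buffer.getvalue()


# Made-up sample data: day 1 has one unusually tall plant, day 2 has none.
rows = [
    {"identifier": f"d1-{i}", "date": "2016-06-01", "field": "north",
     "height_cm": 200.0 if i == 9 else 100.0}
    for i in range(10)
] + [
    {"identifier": f"d2-{i}", "date": "2016-06-02", "field": "north",
     "height_cm": 100.0 + i}
    for i in range(10)
]

outliers = find_outliers(rows, "height_cm")
csv_text = to_csv(outliers, ["identifier", "height_cm", "date"])
```

With small groups a 2-sigma cut is quite sensitive to the outlier itself inflating the standard deviation, so a robust statistic may be preferable in practice.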

Developer-oriented

  1. Augment the Clowder image extractor to ingest a new data format that is proprietary/internal
  2. Write SQL queries against BETYdb directly to bulk-upload data into e.g. the Traits table
  3. Stand up a new instance of BETYdb for my own use
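For the bulk-upload item, a minimal sketch of a parameterized batch insert is shown below using an in-memory SQLite database. The `traits` table and its columns are invented for this example and are not the BETYdb schema; BETYdb runs on PostgreSQL, so a real script would use a PostgreSQL driver and the actual table definition.

```python
import sqlite3


def bulk_insert_traits(conn, rows):
    """Insert many trait rows in one transaction and return the table row count."""
    with conn:  # commits on success, rolls back if an exception is raised
        conn.executemany(
            "INSERT INTO traits (plant_id, variable, value, measured_on) "
            "VALUES (?, ?, ?, ?)",
            rows,
        )
    return conn.execute("SELECT COUNT(*) FROM traits").fetchone()[0]


# Hypothetical table layout, for illustration only.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE traits ("
    " plant_id TEXT NOT NULL,"
    " variable TEXT NOT NULL,"
    " value REAL NOT NULL,"
    " measured_on TEXT NOT NULL)"
)

new_rows = [
    ("p01", "height_cm", 101.0, "2016-06-01"),
    ("p02", "height_cm", 98.5, "2016-06-01"),
    ("p03", "height_cm", 110.2, "2016-06-01"),
]
total = bulk_insert_traits(conn, new_rows)
```

Placeholders (`?`) keep values out of the SQL string, and wrapping the batch in a single transaction means a failed row does not leave a partial upload behind.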

dlebauer commented 8 years ago

  1. Find all plots where [accession X] was planted
    1. Find all points and dates where seeds of [accession X] were planted
    2. Find bounding boxes associated with these plots
    3. Use bounding boxes to query data from [sensor X] (e.g. raster image) for [date range] (in Clowder)
  2. Find all trait values (in BETYdb traits table, summarized to plant or plot level) associated with [accession X] collected using [method X]
  3. After sensor data have been processed (geospatially orthorectified and aligned, with overlap and artifacts removed), use bounding boxes to clip / select data from some set of sensors
  4. Return experimental design 'plot plan' from PostGIS database (BETYdb)
  5. Return information about accessions from BETYdb cultivars table and also in BMS
  6. Return information about lineage, seed packets, experimental design from BMS
  7. Load field measurements collected in FieldBook app into BMS
  8. Import traits from BMS to BETYdb and vice versa

The reason we are using both BETYdb and BMS despite substantial overlap is that BETYdb has better support for geospatial data, numeric traits, and large external raster files; BMS does a better job at tracking experimental design, lineage, and genomics, and can (apparently now or in the near future) import directly from the FieldBook app.
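The accession-to-raster workflow in item 1 above (find plots, get their bounding boxes, clip sensor data) can be sketched roughly as follows. To keep the example self-contained, bounding boxes are expressed directly in raster row/column indices; a real implementation would work in geographic coordinates, probably via PostGIS and a raster library. `Plot`, `plots_for_accession`, and `clip` are hypothetical names.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Plot:
    """A field plot (hypothetical layout).

    bbox is (row_min, col_min, row_max, col_max) in raster index
    coordinates, with the max edges exclusive.
    """
    plot_id: str
    accession: str
    bbox: tuple


def plots_for_accession(plots, accession):
    """Return all plots where the given accession was planted."""
    return [p for p in plots if p.accession == accession]


def clip(raster, bbox):
    """Return the sub-grid of `raster` (a list of rows) covered by `bbox`."""
    row_min, col_min, row_max, col_max = bbox
    return [row[col_min:col_max] for row in raster[row_min:row_max]]


# Made-up sample data: three plots on a 6 x 8 raster whose cell value
# encodes its position (row * 10 + column).
plots = [
    Plot("plot-1", "accession-X", (0, 0, 2, 3)),
    Plot("plot-2", "accession-Y", (2, 0, 4, 3)),
    Plot("plot-3", "accession-X", (4, 5, 6, 8)),
]
raster = [[r * 10 + c for c in range(8)] for r in range(6)]

clips = {
    p.plot_id: clip(raster, p.bbox)
    for p in plots_for_accession(plots, "accession-X")
}
```

Filtering by sensor and date range would happen before this step, when choosing which raster to clip.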

dlebauer commented 8 years ago

@max-zilla assigning to you since you are working on the cross-database search

max-zilla commented 7 years ago

Created a new issue to continue this discussion in context, linked here: https://github.com/terraref/computing-pipeline/issues/231. Closing this issue to consolidate.