solgenomics / sgn

The code behind the Sol Genomics Network, Cassavabase and other Breedbase websites
https://solgenomics.net
MIT License

Phenotype Download returns oldest observation if there are repeat measures #4419

Open timparsons opened 1 year ago

timparsons commented 1 year ago

Expected Behavior

If there are repeat measures for a specific observation unit/trait combination, the phenotype download should return the latest observation. Currently it returns the oldest observation, as determined by the collection date of the observation stored in the database. This is inconsistent with a user's expectation that the latest observation be returned in the file.

This overlaps with issue https://github.com/solgenomics/sgn/issues/3630

Steps to Reproduce

  1. Collect observations with timestamps
  2. Upload those observations to a trial
  3. Collect new (and different) observations for the same traits/observation units with different timestamps
  4. Upload the second round of observations
  5. Download the phenotype data for the trial
  6. Verify that the value of the observation unit/trait combo is equal to the first observation uploaded
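
The difference between the reported and the expected behavior can be sketched in a few lines of Python. This is only an illustration of the selection rule, not the Breedbase download code: the record layout and the function name `pick_observation` are hypothetical.

```python
from datetime import datetime

# Hypothetical records: (observation_unit, trait, value, collect_date).
observations = [
    ("plot_1", "plant_height", 56, datetime(2024, 1, 20, 12, 0)),
    ("plot_1", "plant_height", 68, datetime(2024, 1, 21, 9, 0)),
    ("plot_2", "plant_height", 40, datetime(2024, 1, 20, 12, 0)),
]


def pick_observation(records, newest=True):
    """Keep one value per (unit, trait): the newest or the oldest by collect_date."""
    chosen = {}
    for unit, trait, value, collected in records:
        key = (unit, trait)
        if key not in chosen:
            chosen[key] = (value, collected)
            continue
        _, current = chosen[key]
        replace = collected > current if newest else collected < current
        if replace:
            chosen[key] = (value, collected)
    return {key: value for key, (value, _) in chosen.items()}


oldest = pick_observation(observations, newest=False)  # behavior reported here
latest = pick_observation(observations, newest=True)   # behavior users expect
```

For `plot_1`, `oldest` holds the first uploaded value and `latest` holds the second one, which is the mismatch described in step 6.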

lukasmueller commented 5 months ago

Repeat measurements can only be solved if we store additional metadata with each variable. Proposal: a new variable property, repeat_type, defines how the variable behaves for multiple measurements. Possible repeat_types could be single, multiple, and time_series. When single is associated with a trait, a new measurement overwrites the previously stored value. When multiple is associated with a trait, values continue to be added, each with its timestamp. Averages can still be calculated over all the values, as they all represent the same phenotype. For time_series, multiple measurements are recorded, but the resulting data cannot be averaged, so no average will be shown for them on the trial detail page. Instead, the data can be displayed as a growth curve, or used to derive growth parameters, etc.
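
As a rough sketch of how the three proposed repeat_types could behave, assuming measurements are kept as (value, timestamp) pairs. The names `RepeatType`, `store` and `trial_average` are hypothetical and not part of the sgn code base.

```python
from enum import Enum


class RepeatType(Enum):
    SINGLE = "single"
    MULTIPLE = "multiple"
    TIME_SERIES = "time_series"


def store(existing, new_measurement, repeat_type):
    """Return the stored (value, timestamp) list after adding one measurement."""
    if repeat_type is RepeatType.SINGLE:
        # single: the new measurement replaces whatever was stored before
        return [new_measurement]
    # multiple and time_series: keep every measurement with its timestamp
    return existing + [new_measurement]


def trial_average(measurements, repeat_type):
    """Average for the trial detail page; None where no average should be shown."""
    if repeat_type is RepeatType.TIME_SERIES or not measurements:
        return None
    values = [value for value, _ in measurements]
    return sum(values) / len(values)


first = (56, "2024-01-20 12:00")
second = (68, "2024-01-21")
as_single = store([first], second, RepeatType.SINGLE)
as_multiple = store([first], second, RepeatType.MULTIPLE)
```

Under these assumptions `as_single` keeps only the second measurement, `as_multiple` keeps both, and `trial_average` returns no value for a time_series variable.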

Variables with multiple and time_series associations, when downloaded in spreadsheet format, could be written to one cell, separated by a delimiter such as |. In the extended format, the measurement timestamp is given in parentheses, so an example cell could contain 56(2024-01-20 12:00)|68(2024-01-21)|63(2024-01-27).
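
A minimal sketch of the proposed cell layout. The function name `format_cell` and its parameters are hypothetical, and the plain (non-extended) form with values only is one reading of the proposal.

```python
def format_cell(measurements, extended=False, delimiter="|"):
    """Join repeat measurements into one spreadsheet cell.

    measurements: list of (value, timestamp_string) pairs, oldest first.
    extended: when True, append each timestamp in parentheses.
    """
    parts = []
    for value, timestamp in measurements:
        parts.append(f"{value}({timestamp})" if extended else str(value))
    return delimiter.join(parts)


measurements = [(56, "2024-01-20 12:00"), (68, "2024-01-21"), (63, "2024-01-27")]
plain_cell = format_cell(measurements)
extended_cell = format_cell(measurements, extended=True)
```

With the values from the example, `extended_cell` reproduces the cell shown above. Timestamps are kept as strings here so that entries with and without a time of day pass through unchanged.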

For multiple-type variables, downloads could also average these values upon request, for easier downstream analysis.
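
For downstream analysis, a delimited cell of this kind could also be averaged after download. The following is a sketch only: the helper name `average_cell` and the entry pattern are assumptions, and it should only be applied to multiple-type variables, since time_series values are not meant to be averaged.

```python
import re

# One entry: a number, optionally followed by a timestamp in parentheses.
CELL_ENTRY = re.compile(r"^\s*([-+]?\d+(?:\.\d+)?)\s*(?:\((.*)\))?\s*$")


def average_cell(cell, delimiter="|"):
    """Average the numeric values in a delimited cell, ignoring timestamps."""
    values = []
    for entry in cell.split(delimiter):
        match = CELL_ENTRY.match(entry)
        if match is None:
            raise ValueError(f"unrecognised entry: {entry!r}")
        values.append(float(match.group(1)))
    return sum(values) / len(values)


mean_extended = average_cell("56(2024-01-20 12:00)|68(2024-01-21)|63(2024-01-27)")
mean_plain = average_cell("56|68|63")
```

Both calls give the same mean, because the timestamps in parentheses are parsed but not used.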

hkmanching commented 1 week ago

I wanted to follow up on this issue and see where things stand with being able to download data that has multiple values in the database. I can see the values under raw data within a trial, but the download still includes only the most recent value.

lukasmueller commented 1 week ago

Working on it. Should be available in a few weeks.
