lizzieinvancouver / egret


what to do with storage temp and duration? #39

Open FrederikBaumgarten opened 4 months ago

FrederikBaumgarten commented 4 months ago

number of spp with a storage time: 148
number of spp with storage temp: 133
number of spp with both: 66
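For reference, counts like these could be computed along the following lines. This is only a sketch on toy data: the column names `species`, `storageDuration`, and `storageTemp` are assumptions, and "both" is taken here to mean that both values are present in the same row, which may not match how the numbers above were derived.

```r
# Toy data; column names are assumed for illustration only.
d <- data.frame(
  species = c("a", "a", "b", "c", "d"),
  storageDuration = c(30, NA, 60, NA, NA),
  storageTemp = c(4, 4, NA, 20, NA)
)

# Flag rows that report each storage variable.
hasDur <- !is.na(d$storageDuration)
hasTemp <- !is.na(d$storageTemp)

# Count distinct species with a duration, a temperature, or both in one row.
nDur <- length(unique(d$species[hasDur]))
nTemp <- length(unique(d$species[hasTemp]))
nBoth <- length(unique(d$species[hasDur & hasTemp]))
```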

This is probably the best way (and only quantitative way) to go forward with storage as a potential driver for germination?

lizzieinvancouver commented 4 months ago

See issue #14

lizzieinvancouver commented 3 months ago

@kengi-neer Would you be willing to take a look at the storage column and think about how we might make new columns from it so it is more useful? Ideally we might dream of three columns for storage: duration, temperature, and moisture. But I doubt we have enough information from most papers to fill these in, and a bunch of columns with NA is not super helpful for analysis. If you can take a look and see what you think, I'd appreciate it.

kengi-neer commented 2 months ago

@lizzieinvancouver and I discussed integrating the current storage and chilling columns into three new columns (names pending), following the system already used for chilling ('then' signifies order):

Also, from @DeirdreLoughnan, we could consider 'cold stratification' to be wet chilling.

If we plan on using scarification data to decide when chilling units are calculated, we could add one more integer column specifying at which stage / temperature condition (based on the 'then' dividers) scarification was applied.
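A rough sketch of how a 'then'-separated column and an integer stage column could work together. The example values and the name `scarifStage` are hypothetical; `dormancyTemp` is used here only as a placeholder name for the combined temperature column.

```r
# Hypothetical values: "then" separates successive temperature conditions.
dormancyTemp <- c("20 then 4", "4", NA)
# Hypothetical integer column: stage at which scarification was applied.
scarifStage <- c(1L, NA, NA)

# Split each entry on the word "then", ignoring surrounding whitespace.
stages <- strsplit(dormancyTemp, "[[:space:]]*then[[:space:]]*")

# Number of stages per row; keep NA where nothing was reported.
nStages <- lengths(stages)
nStages[is.na(dormancyTemp)] <- NA

# Temperature in effect at the scarification stage, NA if no stage is given.
tempAtScarif <- vapply(seq_along(stages), function(i) {
  if (is.na(scarifStage[i])) NA_real_ else as.numeric(stages[[i]][scarifStage[i]])
}, numeric(1))
```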

lizzieinvancouver commented 2 months ago

@kengi-neer This sounds good to me. @DeirdreLoughnan do you see any issues or concerns? I agree we can consider 'cold stratification' to be wet.

Thanks for working on this.

kengi-neer commented 2 months ago

@lizzieinvancouver I have finished cleaning the weird values for storage temperature and duration, except for some missing papers (aldridge1992, basaran12, bibby53, li17). I also updated the script that combines the storage and chill conditions; it should work once the cleaning is updated with the missing papers. I'll update my subsetting conditions sometime after that (subsetting without which() sometimes returns NA rows, which I guess may be deleted rows still in memory, but I can't be sure), but it currently works fine.
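On the NA rows: in R, a logical index that contains NA returns an all-NA row for each NA, whereas which() keeps only the positions that are TRUE. A minimal illustration on toy data (the `id` column and the values are made up; only `storageTemp` is a name from this thread):

```r
# Toy data with one missing temperature.
d <- data.frame(id = 1:4, storageTemp = c(5, NA, 25, 3))

# Logical subsetting: the NA comparison produces a row filled with NA.
viaLogical <- d[d$storageTemp < 10, ]

# which() drops the NA comparison and keeps only the TRUE positions.
viaWhich <- d[which(d$storageTemp < 10), ]
```

Here `viaLogical` has three rows (one of them all NA), while `viaWhich` has only the two rows that actually meet the condition.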

I changed the new column names to dormancyTemp, dormancyDuration, and dormancyWet. Wetness during storage is instead based on storageType, specifically whether it contains the word 'moist'.
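A minimal sketch of the 'moist' rule, assuming storageType is a character column. The example values, the case-insensitive match, and the logical (TRUE/FALSE/NA) coding are assumptions, not necessarily what the script does.

```r
# Hypothetical storageType values.
storageType <- c("moist sand", "dry", "Moist paper", NA)

# TRUE if the description contains "moist"; keep NA where nothing was reported,
# since grepl() on its own would return FALSE for NA.
dormancyWet <- ifelse(
  is.na(storageType),
  NA,
  grepl("moist", storageType, ignore.case = TRUE)
)
```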

There seems to be a lot of estimation in the storage durations, though, as well as some short-duration conditions and treatments (e.g. heat treatments) that our current scraping may overlook. I would suggest having someone do a whole "rescraping" using our current cleaned data as the basis, which would also let us check for mismatched columns (many study IDs and figures, less frequently temperature conditions).

EDIT: I only cleaned the weird values; I did not carefully check each condition, such as those mentioned in issue #44. Also, even though we might not be interested in the short-duration temperature conditions, I still think it's a good idea to adopt this "rescraping" approach, as we would still need to recheck whether the newly stitched storage and chill columns are fine. There are also some seemingly normal values that need cleaning, such as chilling durations given in weeks rather than days, which I only found by chance.

lizzieinvancouver commented 2 months ago

@kengi-neer Sounds good! Thank you for all your work on this. I will try to check this soon (next week most likely), but if @dbuona or @DeirdreLoughnan have time sooner they could take a first look (more eyes always better).

lizzieinvancouver commented 1 month ago

Semi-related to this, I need a volunteer to check cleanStorage.R -- especially the addition of this very not-useful column:

> table(d$storagetemp)

  0 
283 

lizzieinvancouver commented 1 month ago

Thanks to @DeirdreLoughnan for checking cleanStorage.R fully!

DeirdreLoughnan commented 21 hours ago

@kengi-neer it was great chatting with you about the combineStorageChill.R code. In reviewing this issue again, I think it best for us to continue discussing the issue here rather than create a new issue, but consolidate key points that are in issue #44.

In summary, the storageTemp and storageDuration columns are cleaned, but we still have some work to do to create columns for when storage conditions are cold (between -20 and 10 °C, see comment in #44 from Aug 2) and "moist".
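As a starting point, the cold flag could look something like this. It is a sketch only: the name `storageCold`, the example values, and treating both bounds as inclusive are assumptions to be checked against the comment in #44.

```r
# Hypothetical cleaned storage temperatures in degrees C.
storageTemp <- c(-20, 4, 10, 15, NA)

# TRUE when the temperature is within the cold range; NA stays NA.
storageCold <- storageTemp >= -20 & storageTemp <= 10
```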

We discussed:

Thanks for your help with this!