Closed ConnectedSystems closed 2 years ago
Hey @ConnectedSystems,
This sounds great :) What are the means and medians etc across? Is it across species and sites also, or just across simulations? (apologies if you've mentioned this somewhere else)
It's the average/median for all sites and species, for each simulation.
The unfinished ones are for finer detail (stats for individual sites/species, and the same but across time)
Ok cool, thanks. Just wanted to check there were options for what dimensions the statistics are calculated across
Closing as I think this is superseded by #59
As part of work underway to address #46 and #48 I've started writing some functions to produce metric values.
For context, results from large-scale simulations are saved in batches across multiple files. I have an example result set with 77,824 simulations in total, saved in batches of 500 (so the results are spread across 141 netCDFs).
The approach is to read each file, apply the desired metric calculation, and discard the rest. This part is all working; I just need to come up with an appropriate function name.
To indicate how one might use it (noting that the function name is not finalized):
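As a rough illustration of the read/apply/discard pattern only (this is not the actual ADRIA function): a minimal Python sketch in which `apply_metric_to_batches`, `fake_batches`, and the flat per-simulation data layout are all hypothetical stand-ins. In practice each batch would come from one netCDF file rather than an in-memory list.

```python
from statistics import fmean
from typing import Callable, Iterable, List, Sequence

# One batch = the simulations stored in a single result file.
# Each simulation is represented as a flat list of condition values
# (all sites and species pooled), which is a simplifying assumption.
Batch = Sequence[Sequence[float]]


def apply_metric_to_batches(
    batches: Iterable[Batch],
    metric: Callable[[Sequence[float]], float],
) -> List[float]:
    """Apply `metric` to every simulation, one batch at a time.

    Only the per-simulation metric values are kept; the raw batch data
    is no longer referenced once the loop moves on to the next batch.
    """
    results: List[float] = []
    for batch in batches:
        results.extend(metric(sim) for sim in batch)
    return results


def fake_batches() -> Iterable[Batch]:
    """Hypothetical stand-in for lazily reading result files one by one."""
    yield [[0.25, 0.75], [0.5, 1.0]]  # "file" 1: two simulations
    yield [[1.0, 0.0, 0.5]]           # "file" 2: one simulation


# One mean per simulation, in the order the batches were read.
per_sim_mean = apply_metric_to_batches(fake_batches(), fmean)
print(per_sim_mean)
```

Passing the batches as a generator means only one batch needs to be held in memory at a time, which is the point of the approach when results are spread over many files.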
It seems we're all currently working with individually written metric functions which are copy/pasted around the codebase. While there may be purpose-specific metrics which should be left alone, in general I think it's time to standardize the metric functions in use.
I currently have:
- meanCondition(): calculates mean (and only the mean) for each simulation
- conditionStats(): calculates summary stats (mean, median, min/max, stdev) as well as upper and lower 50/75/95 percentiles for each simulation
I have plans for conditionStats() but for each site/species.
These metrics are constrained to the TC/C/E/S results that are currently the primary outputs from ADRIA (which I believe will not change, at least for the foreseeable future).
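To make the conditionStats() description above a bit more concrete, here is a hedged Python sketch of the per-simulation summary. It is not the existing implementation: `condition_stats_sketch` and `percentile` are hypothetical names, the population standard deviation is an arbitrary choice, and "upper and lower 50/75/95 percentiles" is read here as the bounds of the central 50%, 75% and 95% intervals, which is an assumption.

```python
from statistics import fmean, median, pstdev
from typing import Dict, Sequence


def percentile(sorted_vals: Sequence[float], q: float) -> float:
    """Linearly interpolated percentile (q in [0, 100]) of pre-sorted data."""
    if not sorted_vals:
        raise ValueError("need at least one value")
    pos = (len(sorted_vals) - 1) * q / 100.0
    lo = int(pos)
    hi = min(lo + 1, len(sorted_vals) - 1)
    return sorted_vals[lo] + (sorted_vals[hi] - sorted_vals[lo]) * (pos - lo)


def condition_stats_sketch(values: Sequence[float]) -> Dict[str, float]:
    """Summary statistics for the condition values of one simulation."""
    vals = sorted(values)
    stats = {
        "mean": fmean(vals),
        "median": median(vals),
        "min": vals[0],
        "max": vals[-1],
        "std": pstdev(vals),  # population std; sample std is equally plausible
    }
    # Assumed reading: lower/upper bounds of the central 50/75/95% intervals.
    for width in (50, 75, 95):
        tail = (100 - width) / 2
        stats[f"lower_{width}"] = percentile(vals, tail)
        stats[f"upper_{width}"] = percentile(vals, 100 - tail)
    return stats


example = condition_stats_sketch([4.0, 0.0, 2.0, 1.0, 3.0])
print(example["mean"], example["lower_50"], example["upper_50"])
```

The per-site/species variant would presumably apply the same function to each site or species slice instead of the pooled values.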
Do we want anything else?