Required metrics for 28 Feb Report (#53)

ConnectedSystems commented 2 years ago

As part of work underway to address #46 and #48 I've started writing some functions to produce metric values.

For context, results from large-scale simulations are saved in batches across multiple files. I have an example result set with 77,824 simulations in total, saved in batches of 500 (so the results are spread across 141 netCDF files).

The approach is to read each file, apply the desired metric calculation, and discard the rest. This part is all working; I just need to come up with an appropriate function name.

To indicate how one might use it (noting that the function name is not finalized):

% ...after running some simulations which saves results to file...

file_names = ...  % get list of result files from some location

Y = collateDistributedResults(file_names, @some_metric_function)
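
To make the read/reduce/discard idea concrete, here is a rough sketch of the pattern. It is not the actual implementation: `collate`, `load_fn`, `store`, and the [timesteps x sites x simulations] layout are all hypothetical stand-ins, and `load_fn` takes the place of whatever reads a single result file (for example a wrapper around `ncread`).

```matlab
% Hypothetical sketch (not ADRIA's implementation): reduce each batch to
% metric values as soon as it is loaded, then stack the per-batch results.
% load_fn stands in for whatever reads one result file from disk.
collate = @(file_names, metric_fn, load_fn) cell2mat( ...
    cellfun(@(f) metric_fn(load_fn(f)), file_names(:), 'UniformOutput', false));

% Toy stand-in for results on disk: two "files", each holding an array of
% assumed size [timesteps x sites x simulations].
store = struct('batch_1', rand(10, 4, 3), 'batch_2', rand(10, 4, 2));
load_fn = @(name) store.(name);

% Example metric: one mean per simulation (a column vector per batch).
mean_per_sim = @(raw) squeeze(mean(mean(raw, 1), 2));

Y = collate({'batch_1', 'batch_2'}, mean_per_sim, load_fn);
disp(size(Y))   % one row per simulation across all batches
```

Only the reduced values from each batch are kept, so memory use should scale with the size of the metric output rather than with the raw results.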

It seems we're all currently working with individually written metric functions which are copy/pasted around the codebase. While there may be purpose-specific metrics which should be left alone, in general I think it's time to standardize the metric functions in use.

I currently have:

meanCondition() : calculates mean (and only the mean) for each simulation
conditionStats() : calculates summary stats (mean, median, min/max, stdev) as well as upper and lower 50/75/95 percentiles for each simulation
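
As a rough illustration of what per-simulation summary stats could look like (a sketch only, not the code behind conditionStats()): the array layout, the `stats` field names, and the choice of the 5th/95th percentiles are assumptions made for the example. Percentiles use a simple nearest-rank rule written out by hand, so no toolbox function is needed.

```matlab
% Hypothetical sketch: per-simulation summary statistics for a results
% array X of assumed size [timesteps x sites x simulations].
X = rand(10, 4, 3);
n_sims = size(X, 3);

% Flatten so each column holds every value belonging to one simulation.
flat = reshape(X, [], n_sims);

stats = struct();
stats.mean   = mean(flat, 1)';
stats.median = median(flat, 1)';
stats.min    = min(flat, [], 1)';
stats.max    = max(flat, [], 1)';
stats.std    = std(flat, 0, 1)';

% Nearest-rank percentiles (assumed method, chosen for simplicity).
sorted = sort(flat, 1);
n = size(sorted, 1);
rank_of = @(p) max(1, ceil(p * n / 100));
stats.p5  = sorted(rank_of(5), :)';
stats.p95 = sorted(rank_of(95), :)';
```

Each field ends up as a column vector with one entry per simulation, which matches the shape a per-batch metric function would need in order to be stacked across files.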

I have plans for:

conditionStats(), but for each site/species
as above, but across simulation time

These metrics are constrained to the TC/C/E/S results that are currently the primary outputs from ADRIA (which I believe will not change, at least for the foreseeable future).

Do we want anything else?

Rosejoycrocker commented 2 years ago

Hey @ConnectedSystems,

This sounds great :) What are the means and medians etc across? Is it across species and sites also, or just across simulations? (apologies if you've mentioned this somewhere else)

ConnectedSystems commented 2 years ago

It's the average/median for all sites and species, for each simulation.

The unfinished ones are for finer detail (stats for individual sites/species, and the same but across time)
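
To show the difference between these levels of detail, a small sketch (the [timesteps x sites x simulations] layout and the variable names are assumptions for illustration, and species are left out to keep it short):

```matlab
% Hypothetical layout: X is [timesteps x sites x simulations].
X = rand(10, 4, 3);

% Collapse time and sites: one value per simulation.
per_sim = squeeze(mean(mean(X, 1), 2));

% Collapse time only: one value per site, per simulation.
per_site = squeeze(mean(X, 1));

% Collapse sites only: one value per timestep, per simulation.
per_time = squeeze(mean(X, 2));
```

The same pattern would apply to the median or any other statistic that accepts a dimension argument.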

Rosejoycrocker commented 2 years ago

Ok cool, thanks. Just wanted to check there were options for what dimensions the statistics are calculated across

ConnectedSystems commented 2 years ago

Closing as I think this is superseded by #59

open-AIMS / ADRIA_matlab

Required metrics for 28 Feb Report (#53) #54