Define a list of repository types and formats

colinpmillar commented 4 years ago

summary

formats

If repositories are going to link together, it would be very useful to know what type of data a repository creates. The idea is that a repository registers to provide a set of outputs drawn from a controlled vocabulary. For example, a stock assessment repository might offer several data outputs:

SAG output
Catch options output
a fitted FLXSA object

But it could minimally offer only the SAG output. This way data outputs can be both general and specific, and it is up to the repository owner which outputs to offer. Some stock assessment models estimate catches at age, and this might be a useful dataset to offer.
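A minimal sketch of the registration idea in Python, assuming a flat controlled vocabulary. All identifiers here (`OUTPUT_VOCABULARY`, `register_repository`, `repositories_providing`, the vocabulary keys and the repository names) are hypothetical and are not part of TAF or any existing ICES vocabulary.

```python
# Hypothetical controlled vocabulary of output formats (keys are illustrative only).
OUTPUT_VOCABULARY = {
    "sag_output": "SAG output",
    "catch_options_output": "Catch options output",
    "flxsa_fit": "A fitted FLXSA object",
}


def register_repository(name, outputs, registry):
    """Record which vocabulary outputs a repository offers.

    Raises ValueError if the list is empty or uses a term outside the vocabulary.
    """
    if not outputs:
        raise ValueError("a repository must register at least one output")
    unknown = sorted(set(outputs) - set(OUTPUT_VOCABULARY))
    if unknown:
        raise ValueError(f"outputs not in the vocabulary: {unknown}")
    registry[name] = sorted(set(outputs))
    return registry[name]


def repositories_providing(output, registry):
    """Return the names of repositories registered to provide a given output."""
    return sorted(name for name, outs in registry.items() if output in outs)


registry = {}
# A minimal assessment repository offers only the SAG output.
register_repository("example_assessment_minimal", ["sag_output"], registry)
# A fuller one also offers catch options and the fitted model object.
register_repository(
    "example_assessment_full",
    ["sag_output", "catch_options_output", "flxsa_fit"],
    registry,
)
print(repositories_providing("flxsa_fit", registry))
```

With a registry like this, a downstream repository could look up which upstream repositories offer the output it needs, which is the linking use case described above.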

repository types

It is useful to be able to identify the main purpose of a repository, even though the main outputs of a repository are defined by the formats it is registered to produce. High level repository types are potentially:

Catch at age estimation
Abundance index estimation
Stock assessment

There are also possible sub-types, such as:

acoustic index calculation
swept area index
forecast
multifleet stock assessment

It's not clear whether defining vocabularies for subtypes would be useful or a hindrance.

task list

[ ] Draft a table of "contracts" such as SAG xml file
[ ] Draft table of repository types, e.g. assessment, forecast

related issues

jensr commented 4 years ago

Tags have now been added for different types of TAF repos replaced in #1

colinpmillar commented 4 years ago

Add MSE as a type, from @iagomosqueira

jensr commented 4 years ago

@colinpmillar will this be set up as an ICES Vocab that TAF draws from?

colinpmillar commented 4 years ago

Yes - that would be ideal - I would be happy to aim for that.

colinpmillar commented 3 years ago

I think this issue will spawn an issue for each project type, and it will be important to manage these and make them available for people to view and review.

Recently I had a day with the WGMIXFISH group, who have quite a complex set of TAF projects (see below). At this meeting we tried to develop a list of standard outputs, but this was deferred to a later MIXFISH meeting, perhaps with this added as a ToR.

colinpmillar commented 3 years ago

In addition, the WKREPTAF workshop began developing a list of standard outputs for stock assessments. I think we should consider developing a form, and perhaps a ToR, for the groups that are the main governors of a TAF project type. Examples that come to mind are:

WGACOUSTICGOV - for the acoustic survey indices. This was raised at a recent meeting, and has been deferred.
WGMIXFISH
WKREPTAF - for all stock assessment project types

Example for a category 1 stock assessment as proposed by WKREPTAF

Type	Output	Product	Optional?
Input	Catch time series (biomass), possibly by fleet, area, country	Tables and plots
Input	Age length key (commercial and survey)	Table	optional
Input	Biological data (weights-at-age (at-length), maturity, etc.) M if it is an input	Tables and plots
Input	Biological parameters (growth parameters, model for natural mortality, etc.)	Table	optional
Input	Catch (landings, discards) numbers at age (at-length) time series, possibly by fleet (can be used for: bar/line plot, bubble plot etc)	Tables and plots
Input	Index time series (catch rate, biomass or numbers at age) Can be used for standardised index at age bubble plot, internal/external consistency plots.	Tables and plots
Input	Spatially disaggregated raw survey data	Table and plot (maps)	optional
Input	Tagging data	Table	optional
Input	External drivers (e.g., temperature, chlorophyll)	Table	optional
Assessment	F-at-age (at-length) or selectivity at age (at-length) over time, possibly by fleet	Tables and plots
Assessment	Estimated/predicted catch at age (at-length) over time, possibly by fleet	Tables and plots	optional
Assessment	Estimated/predicted survey / commercial index at age (at-length) over time	Tables and plots	optional
Assessment	Q at age(length) (over time)	Tables and plots
Assessment	Any other estimated quantities (e.g., SSB and recruitment) / parameters, including uncertainty and variance parameters, if available.	Tables
Assessment	Residuals of all estimated quantities / parameters	Plot
Assessment	Retrospective runs – summary table information is enough in most cases	Tables and plots
Forecast	Cohort contribution to forecast landings and SSB	Table	optional
Forecast	The values calculated as the short-term forecast inputs.	Table
Forecast	Forecast catch, SSB etc. for different levels of F. Uncertainty.	Table
Assumptions, model settings	Model settings, priors etc. – how to report? More for Stock annex.	Configuration files and/or tables	optional
Metadata	Explicit forecast assumptions
Metadata	Assessment category, model, inputs.
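One way such a table could be used is as a checklist for a repository. The sketch below, in Python, transcribes a handful of rows from the table above and reports which required (non-optional) outputs a repository has not provided. The `StandardOutput` class, the short `key` identifiers and `missing_required` are hypothetical names invented for this illustration; only the type, product and optional values come from the table.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class StandardOutput:
    stage: str      # "Type" column: Input, Assessment, Forecast, ...
    key: str        # hypothetical short identifier, not taken from the table
    product: str    # "Product" column
    optional: bool  # True where the "Optional?" column says optional


# A few rows transcribed from the table above; the keys are made up for this sketch.
CATEGORY_1_OUTPUTS = [
    StandardOutput("Input", "catch_time_series", "Tables and plots", False),
    StandardOutput("Input", "age_length_key", "Table", True),
    StandardOutput("Input", "tagging_data", "Table", True),
    StandardOutput("Assessment", "f_at_age", "Tables and plots", False),
    StandardOutput("Assessment", "residuals", "Plot", False),
    StandardOutput("Forecast", "cohort_contribution", "Table", True),
    StandardOutput("Forecast", "forecast_inputs", "Table", False),
]


def missing_required(provided, standard=CATEGORY_1_OUTPUTS):
    """List the keys of required (non-optional) outputs not in `provided`."""
    provided = set(provided)
    return [o.key for o in standard if not o.optional and o.key not in provided]


# A repository that offers two required outputs and one optional one.
provided = ["catch_time_series", "f_at_age", "age_length_key"]
print(missing_required(provided))
```

Keeping the optional flag on each entry means the same list can drive both a strict check (required outputs only) and a fuller report, without maintaining two lists.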

colinpmillar commented 3 years ago

One more point is a list of potential project types, or groups of types. An initial list, based on current (and potential) use cases, is:

colinpmillar commented 1 year ago

This is the structure of a developing DB for storing outputs from a TAF repository.

Various formats for outputs are possible (outputs are stored as files), and therefore various validation options are possible.

colinpmillar commented 1 year ago

But what we need here is a vocabulary of TAF output types. An output type could be linked to a format / structure, and also to a validation script and / or QC script function.
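As a rough illustration of linking an output type to a format and a validation function, here is a Python sketch. Everything in it is an assumption made for the example: the `OutputType` class, the `index_time_series` term, the CSV layout with `year` and `index` columns, and the toy validator are hypothetical and do not describe the actual DB or any TAF tooling.

```python
import csv
import io
from dataclasses import dataclass
from typing import Callable, List


@dataclass(frozen=True)
class OutputType:
    name: str                             # vocabulary term (hypothetical)
    file_format: str                      # expected format / structure
    validate: Callable[[str], List[str]]  # returns a list of problems, empty if valid


def validate_index_csv(text):
    """Toy validator: require 'year' and 'index' columns and numeric index values."""
    reader = csv.DictReader(io.StringIO(text))
    missing = {"year", "index"} - set(reader.fieldnames or [])
    if missing:
        return [f"missing columns: {sorted(missing)}"]
    problems = []
    # Data rows start on line 2 of the file, after the header.
    for line_no, row in enumerate(reader, start=2):
        try:
            float(row["index"])
        except (TypeError, ValueError):
            problems.append(f"line {line_no}: non-numeric index {row['index']!r}")
    return problems


# Hypothetical vocabulary: each term carries its format and its validator.
VOCABULARY = {
    "index_time_series": OutputType("index_time_series", "csv", validate_index_csv),
}


def check_output(type_name, text):
    """Look up the output type and run its linked validator."""
    return VOCABULARY[type_name].validate(text)


good = "year,index\n2020,1.5\n2021,1.7\n"
bad = "year,index\n2020,abc\n"
print(check_output("index_time_series", good))
print(check_output("index_time_series", bad))
```

Returning a list of problems rather than raising on the first one lets a QC step report everything wrong with a file in a single pass.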

iagomosqueira commented 1 year ago

Could they also be (weakly) linked to some available GitHub repository templates?

iagomosqueira commented 1 month ago

Should this be linked to the proposed WK on TAF templates?

ices-eg / wg_WGTAFGOV