Closed sdtaylor closed 7 years ago
Dates will become a huge hassle in this. Forecasts can be made on different dates before sampling, and the exact date of sampling won't be known until afterwards, so automatically lining forecasts up with observations will be a challenge. It might be best to use the period number as the primary form of timekeeping.
New moon numbers are the proposed solution for dealing with dates. That would make the required headers for all forecast outputs as follows.
date, newMoonNumber, model, treatment, species, estimate, LowerPI, UpperPI
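A quick way to enforce that header list is to check each forecast file before it is accepted. The sketch below is only illustrative: the header names are copied from the line above, while the function name and the sample inputs are made up for the example.

```python
import csv
import io

# Header names copied from the list above; everything else here is hypothetical.
REQUIRED_HEADERS = ["date", "newMoonNumber", "model", "treatment",
                    "species", "estimate", "LowerPI", "UpperPI"]

def missing_headers(csv_text):
    """Return the required headers that are absent from the CSV's first row."""
    reader = csv.reader(io.StringIO(csv_text))
    # Strip whitespace so "date, model" and "date,model" are treated alike.
    header = [name.strip() for name in next(reader, [])]
    return [name for name in REQUIRED_HEADERS if name not in header]

# Made-up sample inputs: one complete header row, one incomplete.
good = "date,newMoonNumber,model,treatment,species,estimate,LowerPI,UpperPI\n"
bad = "date,model,species,estimate\n"

print(missing_headers(good))
print(missing_headers(bad))
```

An empty list means the file has every required column; anything else names the columns a model's output still needs.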
All of this is being taken care of in other places.
Before diving into this it would be good to think about what exactly we want out of it, and what that will require.
Key Features
Needed
A common interface that models can be run under, and produce the same output. For example, we should be able to write a primary script such as this.
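As a rough illustration of the kind of primary script meant here, the sketch below runs every registered model through one loop and collects the results in a single shared row format. It is written in Python purely for illustration. The two toy models, the `MODELS` registry, and the row fields are all assumptions, not anything defined by the project.

```python
from statistics import mean

# Hypothetical models: each takes past counts and a horizon, and returns
# one point estimate per future period.
def naive_last_value(counts, horizon):
    return [counts[-1]] * horizon

def long_term_mean(counts, horizon):
    return [mean(counts)] * horizon

# Hypothetical registry: adding a model means adding one entry here.
MODELS = {
    "naive": naive_last_value,
    "mean": long_term_mean,
}

def run_all_models(counts, last_period, horizon):
    """Run every registered model and return rows in one shared format."""
    rows = []
    for name, model in MODELS.items():
        for step, estimate in enumerate(model(counts, horizon), start=1):
            rows.append({"model": name,
                         "period": last_period + step,
                         "estimate": estimate})
    return rows

# Made-up counts and period number, just to exercise the loop.
forecasts = run_all_models([10, 12, 8, 14], last_period=450, horizon=2)
for row in forecasts:
    print(row)
```

Because everything downstream only sees the shared row format, graphs and rankings would not need to know which models exist.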
Ideally new models could then be added by updating a single line in a config file. They would automatically be incorporated into everything downstream, such as graphs being made and being included in any rankings.
A common format for forecasts, i.e. a CSV file with headers for model_name, forecast_start_date, date, species, plot, estimate, ci_upper, ci_lower, ….
A script that collects the most recent portal data, iterates through running the models, and produces a forecast from each.
A script that compiles forecast data and produces graphs and verification metrics for the site.
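For the compiling step, one possible shape is to match forecasts to observations by period number and score each model. The thread does not say which verification metrics to use, so RMSE and interval coverage below are assumptions, as are the field names `period`, `lower`, and `upper` and all of the sample numbers.

```python
import math
from collections import defaultdict

def verify(forecasts, observed):
    """Compute RMSE and interval coverage per model.

    forecasts: list of dicts with model, period, estimate, lower, upper
    observed:  dict mapping period number -> observed count
    Forecasts for periods that have not been sampled yet are skipped.
    """
    errors = defaultdict(list)
    hits = defaultdict(list)
    for f in forecasts:
        if f["period"] not in observed:
            continue
        obs = observed[f["period"]]
        errors[f["model"]].append((f["estimate"] - obs) ** 2)
        hits[f["model"]].append(f["lower"] <= obs <= f["upper"])
    return {
        model: {"rmse": math.sqrt(sum(errs) / len(errs)),
                "coverage": sum(hits[model]) / len(hits[model])}
        for model, errs in errors.items()
    }

# Made-up forecasts and observations; period 453 has no observation yet.
forecasts = [
    {"model": "naive", "period": 451, "estimate": 14, "lower": 10, "upper": 18},
    {"model": "naive", "period": 452, "estimate": 14, "lower": 9, "upper": 19},
    {"model": "mean", "period": 451, "estimate": 11, "lower": 8, "upper": 13},
    {"model": "mean", "period": 453, "estimate": 11, "lower": 8, "upper": 14},
]
observed = {451: 14, 452: 20}

scores = verify(forecasts, observed)
print(scores)
```

Keying on period number rather than calendar date sidesteps the problem of not knowing the exact sampling date in advance.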
Questions
Once a model is added, should any updates to it be allowed? Obviously time series models should be re-fit up to the most recent timestep. But adding new variables or changing a model in a major way means all of its past forecasts can no longer be compared. Major updates could be labelled as new model versions, such as GAMv2, GAMv3, etc.
Should we allow outside groups to submit models?