The environmental forecasting pipeline changes some of the scripts significantly, so I am using this as an opportunity to rationalise the scripts and make them a bit more user friendly:
### Prepare data

These scripts produce the data assets:

- [x] Prepare datasets from configuration for training: `prep_training_data.sh`
- [x] Prepare a dataset from a CLI-supplied date for prediction (e.g. for daily forecasting): `prep_prediction_data.sh`
### Run model

These scripts write the model / prediction assets to the results folder:

- [x] Train a single model: `run_training.sh`
- [x] Train an ensemble: `run_train_ensemble.sh`
- [x] Predict via a single model: `run_prediction.sh`
- [x] Predict via an ensemble: `run_predict_ensemble.sh`
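For daily forecasting, the intended flow chains the prep and run scripts. A minimal sketch of such an invocation is below; the positional date argument is an assumption for illustration, not the confirmed CLI of these scripts:

```shell
#!/bin/sh
# Sketch of a daily-forecast invocation. The date-argument form is an
# assumption; check each script's usage for the actual interface.
DATE=$(date +%Y-%m-%d)   # forecast initialisation date, e.g. 2024-01-31

# Prepare the prediction dataset for this date, if the script is present.
if [ -x ./prep_prediction_data.sh ]; then
  ./prep_prediction_data.sh "$DATE"
fi

# Run a single-model prediction against the prepared data.
if [ -x ./run_prediction.sh ]; then
  ./run_prediction.sh "$DATE"
fi
```

An ensemble forecast would substitute `run_predict_ensemble.sh` in the final step.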
### Other changes

- [x] Rename the relevant post-processing scripts to `process_*.sh` or `plot_*.sh`
- [x] Rename the dataset utilities to `dataset_*.sh`
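The prefix convention above makes each script's role recoverable from its name alone. As a sketch, a small helper could classify scripts by prefix; the category labels here are assumptions based on the checklist, not anything defined in the repository:

```shell
#!/bin/sh
# Classify a script name according to the proposed prefix convention.
# Category labels are illustrative assumptions, not repository-defined.
classify() {
  case "$1" in
    prep_*.sh)    echo "data preparation" ;;
    run_*.sh)     echo "model run" ;;
    process_*.sh) echo "post-processing" ;;
    plot_*.sh)    echo "plotting" ;;
    dataset_*.sh) echo "dataset utility" ;;
    *)            echo "unclassified" ;;
  esac
}

classify run_training.sh         # prints: model run
classify prep_prediction_data.sh # prints: data preparation
```

Anything reported as "unclassified" would be a candidate for renaming.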
Note that much of this is already in place, but there is now a clearer delineation between data and ML operations, as well as other processing-type activities. This should make documenting and navigating the repository easier.
Thoughts @bnubald?