Code for reproducing the results in Hwang et al., “Improving Subseasonal Forecasting in the Western US with Machine Learning.” Please execute all instructions, scripts, and notebooks from the base directory of the repository, i.e., the directory in which README.md is located.
The code was tested with Python 2.7 and Anaconda 2.3.0 on Linux and macOS. It depends on several additional packages, which can be installed via the following commands:
conda install --channel https://conda.anaconda.org/conda-forge pygrib
conda install netCDF4
conda install jpeg
conda install pandas
conda install jupyter
conda install scipy
pip install https://github.com/jcrudy/py-earth/archive/master.zip
conda install -c r r
conda install -c conda-forge cdo
conda install -c conda-forge hdf5=1.8.18
conda install -c conda-forge pytables
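As a quick sanity check after installation, the following sketch tries to import each Python package and reports any that fail. The import names (pyearth for py-earth, tables for pytables) are the standard module names of those packages; the helper find_missing is a hypothetical name introduced here and is not part of the repository. Non-Python dependencies such as R, cdo, and hdf5 are not covered.

```python
import importlib

# Import names of the Python packages installed above
# (py-earth is imported as pyearth, pytables as tables).
REQUIRED_MODULES = ["pygrib", "netCDF4", "pandas", "scipy", "pyearth", "tables"]


def find_missing(module_names):
    """Return the subset of module_names that cannot be imported."""
    missing = []
    for name in module_names:
        try:
            importlib.import_module(name)
        except Exception:
            # A broken install can raise errors other than ImportError,
            # so any failure is treated as "not importable".
            missing.append(name)
    return missing


if __name__ == "__main__":
    missing = find_missing(REQUIRED_MODULES)
    if missing:
        print("Packages that failed to import: " + ", ".join(missing))
    else:
        print("All required packages are importable.")
```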
After cloning the repository, please execute the preparation steps for generating forecasts, with gt_id set equal to the ground-truth identifier and target_horizon set equal to the target horizon.
To generate MultiLLR forecasts for a ground-truth identifier in {“contest_tmp2m”, “contest_precip”}, a target horizon in {“34w”, “56w”}, and all target dates, execute the Jupyter notebook batch_2011-2018_backward_stepwise.ipynb with gt_id set equal to the ground-truth identifier and target_horizon set equal to the target horizon. For each target date in 2011-2018, this notebook generates MultiLLR forecasts using the script 2011-2018_backward_stepwise.py. Since each target date job is long-running, we recommend submitting these jobs to a cluster by setting run_locally to False and setting batch_script to your personal batch cluster submission script. Alternatively, you can run the jobs locally and sequentially by setting run_locally to True (in which case the setting of batch_script is irrelevant).
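The choice between local and cluster execution described above can be pictured with the following minimal sketch. It is not the notebook's actual code: the helper names, the command-line argument order passed to 2011-2018_backward_stepwise.py, the example dates, and the my_submit.sh wrapper are all assumptions made for illustration.

```python
import subprocess


def build_commands(gt_id, target_horizon, target_dates, run_locally, batch_script=None):
    """Build one command per target date (hypothetical argument layout)."""
    if not run_locally and not batch_script:
        raise ValueError("batch_script is required when run_locally is False")
    commands = []
    for date_str in target_dates:
        # Assumed invocation: script name followed by gt_id, horizon, and date.
        job = ["python", "2011-2018_backward_stepwise.py", gt_id, target_horizon, date_str]
        if not run_locally:
            # Assumed convention: the batch script receives the job command as arguments.
            job = [batch_script] + job
        commands.append(job)
    return commands


def run_all(commands):
    """Run the commands one after another, stopping at the first failure."""
    for cmd in commands:
        subprocess.check_call(cmd)


# Example: two illustrative target dates, prepared for cluster submission.
example = build_commands(
    "contest_tmp2m", "34w", ["20110101", "20110102"],
    run_locally=False, batch_script="./my_submit.sh",
)
for cmd in example:
    print(" ".join(cmd))
```

Building the command list separately from running it makes it easy to inspect what would be submitted before launching long-running jobs.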
To generate the AutoKNN forecasts for a ground-truth identifier in {“contest_tmp2m”, “contest_precip”} and a target horizon in {“34w”, “56w”}, complete the following three steps, each with gt_id set equal to the ground-truth identifier and target_horizon set equal to the target horizon.
The first step computes and saves the similarities between every pair of dates in the dataset.
The second step computes and saves the predictions of the most similar viable neighbors of each target date in the dataset.
The third step carries out the AutoKNN weighted local least-squares regression onto the top nearest neighbor predictions, an intercept, and fixed lagged measurements, and saves forecasts for all 2011-2018 target dates.
To recreate the debiased CFSv2 skills for 2011-2018, run gen_cfsv2_skills_2011-2018.py.
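For readers unfamiliar with the regression used in the final AutoKNN step described above, the sketch below shows a generic weighted least-squares fit with NumPy, regressing a target onto nearest-neighbor predictions, an intercept, and lagged measurements. It uses synthetic data, and the weights, feature counts, and function name are assumptions; it illustrates the general technique, not the repository's implementation.

```python
import numpy as np


def weighted_least_squares(features, target, weights):
    """Solve min_b sum_i w_i * (y_i - x_i . b)^2 by rescaling rows with sqrt(w_i)."""
    sqrt_w = np.sqrt(weights)
    coef, _, _, _ = np.linalg.lstsq(
        features * sqrt_w[:, None], target * sqrt_w, rcond=None
    )
    return coef


rng = np.random.RandomState(0)
n_samples = 200
knn_preds = rng.normal(size=(n_samples, 3))  # stand-in for top nearest-neighbor predictions
lags = rng.normal(size=(n_samples, 2))       # stand-in for fixed lagged measurements
intercept = np.ones((n_samples, 1))
X = np.hstack([knn_preds, intercept, lags])

true_coef = np.array([0.5, 0.3, 0.2, 1.0, -0.4, 0.1])
y = X.dot(true_coef)                             # noiseless target, so the fit recovers true_coef
weights = rng.uniform(0.5, 1.5, size=n_samples)  # arbitrary positive sample weights

coef = weighted_least_squares(X, y, weights)
print(np.round(coef, 3))
```

Multiplying each row by the square root of its weight turns the weighted problem into an ordinary least-squares problem, which is why a single call to np.linalg.lstsq suffices.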
To generate ensemble forecasts based on the predictions of MultiLLR, AutoKNN, and reconstructed debiased CFSv2, for a ground-truth identifier in {“contest_tmp2m”, “contest_precip”} and a target horizon in {“34w”, “56w”}, execute the Jupyter notebook ensemble_backward_stepwise_and_knn_regression.ipynb with gt_id set equal to the ground-truth identifier and target_horizon set equal to the target horizon.
After completing all of the previous steps, executing the Jupyter notebooks table_skills_contest_year_all_methods.ipynb and table_skills_by_year_all_methods.ipynb will generate LaTeX tables corresponding to Tables 1 and 2 in the paper.