scottcha / OpenAvalancheProject

Open source project to bring data and ML to avalanche forecasting
MIT License

Extend new snow model to overlap with dataset and integrate it in a trial. #78

Open: scottcha opened this issue 10 months ago

scottcha commented 10 months ago

Overview: The GFS model, which we currently use for inputs, has only a few snow-specific variables: snow depth and snow water equivalent (SWE), plus a couple of accumulation variables (accumulated precipitation and accumulated convective precipitation). I'm fairly sure the snow depth and SWE values have accuracy problems, though this may simply reflect how variable these quantities are across a 12km grid cell. Either way, snow data is the most important data we use, so there is an opportunity to augment it with more accurate snow data. This issue covers developing a separate snow model trained on SNODAS (https://nsidc.org/data/g02158/versions/1) and using its output as input features to the avalanche forecast model.

A second benefit of a SNODAS-based model is that SNODAS is provided on a 1km grid. That lets us subdivide the 12km GFS grid cells and provide more relevant elevation and aspect information, since those values are much more meaningful at the smaller grid size.
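To make the subdivision idea concrete, here is a minimal sketch of how fine 1km cells could be grouped under their parent 12km cell. It assumes (hypothetically) that both grids share an origin and orientation and that each coarse cell holds exactly 12 x 12 fine cells; the real SNODAS and GFS grids would need reprojection or regridding before a simple integer mapping like this applies. The names `FACTOR`, `parent_cell`, and `group_fine_cells` are illustrative only.

```python
from collections import defaultdict

# Assumption (hypothetical): both grids share the same origin and orientation,
# and each coarse GFS cell (~12km) holds exactly FACTOR x FACTOR fine SNODAS
# cells (~1km). Real data would first need reprojection/regridding.
FACTOR = 12


def parent_cell(fine_row, fine_col, factor=FACTOR):
    """Return the (row, col) index of the coarse cell containing a fine cell."""
    return fine_row // factor, fine_col // factor


def group_fine_cells(n_rows, n_cols, factor=FACTOR):
    """Map each coarse-cell index to the list of fine-cell indices inside it."""
    groups = defaultdict(list)
    for r in range(n_rows):
        for c in range(n_cols):
            groups[parent_cell(r, c, factor)].append((r, c))
    return dict(groups)


groups = group_fine_cells(24, 36)  # a 24 x 36 patch of fine cells
print(len(groups))          # 6 coarse cells (2 x 3)
print(len(groups[(0, 0)]))  # 144 fine cells per coarse cell
```

Each coarse cell then carries 144 finer-grained samples, each of which can get its own elevation and aspect.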

Design of the Snow Model

Inputs/Features:

ERA5 https://www.ecmwf.int/en/forecasts/dataset/ecmwf-reanalysis-v5:

The ERA5 climate reanalysis dataset is more accurate than the GFS forecasts we currently use, so this is also an opportunity to integrate improved data from that model. However, ERA5 has a multi-day latency, which makes it problematic for producing day+1 forecasts. For now we'll use it for the historical date range and fill the remaining dates with GFS.
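The ERA5/GFS split can be sketched as a per-date source assignment. This is a minimal illustration, assuming ERA5 covers every day up to `today - latency_days`; the 5-day default is a placeholder for the "multi-day latency", not a value taken from the project, and `source_for_dates` is a hypothetical helper.

```python
from datetime import date, timedelta

# Assumption: ERA5 data is available up to (today - latency_days). The 5-day
# default is a placeholder for the "multi-day latency", not a project setting.
ERA5_LATENCY_DAYS = 5


def source_for_dates(start, forecast_date, today, latency_days=ERA5_LATENCY_DAYS):
    """Assign each date in [start, forecast_date] to 'ERA5' or 'GFS'.

    Dates ERA5 already covers use ERA5; the gap up to the forecast date
    (including the forecast day itself) is filled with GFS.
    """
    last_era5 = today - timedelta(days=latency_days)
    sources = {}
    day = start
    while day <= forecast_date:
        sources[day] = "ERA5" if day <= last_era5 else "GFS"
        day += timedelta(days=1)
    return sources


plan = source_for_dates(date(2021, 1, 1), date(2021, 1, 11), today=date(2021, 1, 10))
print(sum(s == "ERA5" for s in plan.values()), sum(s == "GFS" for s in plan.values()))  # 5 6
```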

GFS

To fill the gap between the latest available ERA5 data and the forecast date, we'll use GFS. ERA5 and GFS share some variables, but under different names. We could either map them onto each other, or keep them as separate features and set the ERA5 variables to their mean (or 0) wherever they are missing.
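Both options can be expressed with one row-building function. In this sketch the ERA5 short names `t2m` and `tp` are real ERA5 parameter names, but the shared feature names (`temp_2m`, `precip_total`) and the mapping itself are hypothetical placeholders, not the project's actual columns. With `map_names=True`, ERA5 variables are renamed onto shared features; with `map_names=False`, they stay as separate columns and are filled (with the training mean or 0) on GFS-only days.

```python
import math

# Hypothetical mapping from ERA5 short names to shared feature names; the
# shared names are placeholders, not the project's actual column names.
ERA5_TO_SHARED = {"t2m": "temp_2m", "tp": "precip_total"}


def to_feature_row(record, source, feature_names, means, map_names=True, fill="mean"):
    """Build one feature row from an ERA5 or GFS record.

    map_names=True renames ERA5 variables through ERA5_TO_SHARED; otherwise
    ERA5 variables stay as their own columns. Any feature the record lacks,
    or holds as NaN, is filled with its training mean (fill="mean") or with
    0.0 (any other fill value).
    """
    if source == "ERA5" and map_names:
        record = {ERA5_TO_SHARED.get(k, k): v for k, v in record.items()}
    row = {}
    for name in feature_names:
        value = record.get(name)
        if value is None or (isinstance(value, float) and math.isnan(value)):
            value = means[name] if fill == "mean" else 0.0
        row[name] = value
    return row


means = {"temp_2m": 268.0, "precip_total": 0.001, "t2m": 268.0, "tp": 0.001}

# Option 1: map ERA5 names onto shared features.
row_a = to_feature_row({"t2m": 270.0, "tp": 0.002}, "ERA5", ["temp_2m", "precip_total"], means)

# Option 2: keep ERA5 columns separate; a GFS-only day gets them filled.
separate = ["t2m", "tp", "temp_2m", "precip_total"]
gfs_day = {"temp_2m": 266.5, "precip_total": 0.003}
row_b = to_feature_row(gfs_day, "GFS", separate, means, map_names=False, fill="zero")
row_c = to_feature_row(gfs_day, "GFS", separate, means, map_names=False, fill="mean")
```

Filling with the mean keeps the missing columns near the center of the training distribution, while 0 may be easier to interpret for accumulation variables such as precipitation; which works better would need to be tested.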

Elevation/Aspect:

Each 1km grid cell will also get additional features for its average elevation and average aspect, computed from the ASTER elevation dataset (GDEM): https://asterweb.jpl.nasa.gov/GDEM.asp
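Averaging elevation is a plain arithmetic mean, but aspect is an angle, so a naive mean gives wrong answers near north (350° and 10° would average to 180°, i.e. south). A minimal sketch of a circular mean follows. It assumes per-pixel aspect in degrees clockwise from north has already been derived from the DEM; the function names are illustrative.

```python
import math


def mean_elevation(elevations_m):
    """Arithmetic mean of the DEM elevations (metres) inside one 1km cell."""
    return sum(elevations_m) / len(elevations_m)


def mean_aspect(aspects_deg):
    """Circular mean of aspect angles in degrees (0 = north, clockwise).

    A plain arithmetic mean is wrong for angles: 350 and 10 average to 180
    (south) rather than 0 (north). Averaging unit vectors avoids this.
    """
    s = sum(math.sin(math.radians(a)) for a in aspects_deg)
    c = sum(math.cos(math.radians(a)) for a in aspects_deg)
    if math.hypot(s, c) < 1e-9:
        return float("nan")  # directions cancel out; no meaningful mean aspect
    return math.degrees(math.atan2(s, c)) % 360.0


print(mean_elevation([2400.0, 2500.0, 2600.0]))  # 2500.0
print(round(mean_aspect([80, 100])))             # 90 (east)
```

One possible refinement is to also keep the resultant length (`math.hypot(s, c) / len(aspects_deg)`) as a feature, since a cell whose slopes face many directions is quite different from a uniformly east-facing one.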

Snodas

We won't use SNODAS as a feature input, because we don't want to create an ongoing dependency on it; the goal is a model that runs independently of it. SNODAS serves only as the source of training targets.

Model

The initial model will be similar to the current OAP prediction model and based on the tsai library: we decompose the data into a multivariate time series over n timesteps, but frame it as a regression problem. The targets will be the variables on the day we want to predict, with one model per variable.
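The decomposition into regression samples can be sketched with NumPy. It produces `X` in a (samples, variables, timesteps) layout, which to our understanding matches what tsai expects, and one target vector per SNODAS variable. Whether a window should include the prediction day's inputs (for example, GFS forecasts for that day) is an assumption here; the sketch includes it. `make_windows` is a hypothetical helper and does not call any tsai API.

```python
import numpy as np


def make_windows(features, target, n_steps):
    """Turn one grid cell's daily series into regression samples.

    features: (n_vars, n_days) array of daily inputs (ERA5/GFS, terrain).
    target:   (n_days,) array of ONE SNODAS variable (one model per variable).
    Returns X with shape (n_samples, n_vars, n_steps), where each window is
    the n_steps days ending on (and including) the prediction day, and y with
    shape (n_samples,), the target value on that prediction day.
    """
    _, n_days = features.shape
    if n_days < n_steps:
        raise ValueError("need at least n_steps days of data")
    X = np.stack([features[:, t - n_steps + 1 : t + 1] for t in range(n_steps - 1, n_days)])
    y = target[n_steps - 1 :]
    return X, y


rng = np.random.default_rng(0)
features = rng.normal(size=(4, 30))  # 4 input variables over 30 days
snow_depth = rng.normal(size=30)     # one SNODAS target variable
X, y = make_windows(features, snow_depth, n_steps=7)
print(X.shape, y.shape)  # (24, 4, 7) (24,)
```

Calling this once per target variable (snow depth, SWE, and so on) gives the separate training sets for the one-model-per-variable design.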

Important Details

While the ERA5 data goes back over a decade, the GFS data doesn't (at least not at the 12km grid), so we'll start this model in 2016. We still need to determine whether including data outside the winter months improves model accuracy.
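Making the off-season question a single switch lets both variants be trained and compared. In this sketch, January 1, 2016 as the start date and November through April as "winter" (Northern Hemisphere) are assumptions, not decisions from the issue; `select_training_dates` is a hypothetical helper.

```python
from datetime import date

# Assumptions: the issue only says "start in 2016", so January 1 is a
# placeholder; "winter" as November-April (Northern Hemisphere) is also ours.
MODEL_START = date(2016, 1, 1)
WINTER_MONTHS = {11, 12, 1, 2, 3, 4}


def select_training_dates(dates, winter_only=True):
    """Keep dates from MODEL_START onward, optionally only winter months.

    Toggling winter_only makes it easy to train both variants and compare
    whether off-season data helps accuracy.
    """
    return [
        d for d in dates
        if d >= MODEL_START and (not winter_only or d.month in WINTER_MONTHS)
    ]


days = [date(2015, 12, 1), date(2016, 2, 1), date(2016, 7, 1), date(2016, 11, 15)]
print(select_training_dates(days))                     # Feb 1 and Nov 15, 2016
print(select_training_dates(days, winter_only=False))  # all three 2016 dates
```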
Development should identify early on how this model's output will be integrated into the avalanche forecasting model, so the work stays consistent. It may also be useful to have a standalone snowpack model available.

scottcha commented 5 months ago

Current development for the snow model is here: https://github.com/scottcha/OpenAvalancheProject/blob/master/Data/SnowModel.ipynb

The preprocessing notebook which handles both ERA5 preprocessing as well as Elevation/Aspect preprocessing is here: https://github.com/scottcha/OpenAvalancheProject/blob/master/Data/SNODASToZarr.ipynb