nmaffe / iceboost

A gradient-boosted tree framework to model the ice thickness of the World's glaciers
catboost glacier-modelling gradient-boosting-regressor xgboost

[PAPER PREPRINT] [PDF]✍️


Prepare the model inputs

1. Set up OGGM

Install OGGM. ICEBOOST uses OGGM's glacier geometries, version 62 (v62). These are a slightly revised and improved version of the official RGI v6 repository and also include some additional glaciers. Once OGGM is installed, specify its location in the config/config.yaml file, under the oggm_dir argument.

2. Tandem-X EDEM tiles

The model needs a Digital Elevation Model (DEM). We use Tandem-X EDEM. Ensure that you have enough storage space (~600 GB). To automatically download all but only the necessary tiles that contain glaciers,

Once you have all .txt files, set up a DLR account at the EOWEB Geoportal.

Now, create a structure of empty folders like the following. The path of the root folder Tandem-X-EDEM should be specified in the config.yaml file, under the tandemx_dir argument. The directory structure should be organized with 19 subdirectories as follows:

Tandem-X-EDEM/
├── RGI_01/
├── RGI_02/
├── RGI_03/
├── ...
└── RGI_19/
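
The folder layout above can be created in one go. The following is a minimal sketch, assuming the zero-padded, two-digit names shown in the tree (RGI_01 to RGI_19). The function name make_tandemx_tree is hypothetical and not part of the ICEBOOST code.

```python
from pathlib import Path


def make_tandemx_tree(root, n_regions=19):
    """Create root/RGI_01 ... root/RGI_19 and return the folder paths."""
    root = Path(root)
    folders = [root / f"RGI_{region:02d}" for region in range(1, n_regions + 1)]
    for folder in folders:
        # parents=True also creates the root; exist_ok=True makes reruns safe
        folder.mkdir(parents=True, exist_ok=True)
    return folders


# Usage (hypothetical location; use the path you set as tandemx_dir):
# make_tandemx_tree("/data/Tandem-X-EDEM")
```

Because existing folders are left untouched, the function can be rerun safely after a partial download.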

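Place each .txt file inside its respective folder. Then, from within each folder, download the tiles listed in the file, replacing usr:pass with your DLR credentials: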

xargs -a TDM30_EDEM-url-list.txt -L1 curl -O -u 'usr:pass'

Repeat for all 19 .txt files. You should now have all zipped tiles in their respective folders. The last step is to unpack them.
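
Unpacking can be scripted. The following is a minimal sketch, assuming each downloaded tile is a .zip archive somewhere under the Tandem-X-EDEM root and that extracting next to the archive is acceptable. The function name unpack_tiles is hypothetical.

```python
import zipfile
from pathlib import Path


def unpack_tiles(root):
    """Extract every .zip found under root into the folder that holds it.

    Returns the list of archives that were extracted.
    """
    extracted = []
    for archive in sorted(Path(root).rglob("*.zip")):
        with zipfile.ZipFile(archive) as zf:
            zf.extractall(archive.parent)
        extracted.append(archive)
    return extracted
```

The archives are left in place, so delete them afterwards if storage is tight.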

3. Prepare ERA-5 temperature

The model needs a temperature field over all glaciers. We use the 2 m temperature (t2m) from ERA5-Land and ERA5, merged together and averaged over 2000-2010. Download both products from the Copernicus Climate Change Service (C3S) Climate Data Store for 2000 to 2010 at monthly resolution.

In the code, set save=True to save the generated era5land_era5.nc temperature field.
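
The merge-and-average step can be pictured as follows. This is only a sketch with NumPy arrays, not the repository's code. It assumes that both products have already been brought onto the same grid, that ERA5-Land is missing (NaN) away from land, and that "merged" means preferring ERA5-Land and falling back to ERA5 where it is missing. The README does not specify the merge rule, so treat that choice as an assumption. The function name merge_and_average is hypothetical.

```python
import numpy as np


def merge_and_average(t2m_land, t2m_era5):
    """Merge two (time, lat, lon) temperature stacks and average over time.

    Assumption: ERA5-Land values are used where present, ERA5 fills NaN gaps.
    """
    t2m_land = np.asarray(t2m_land, dtype=float)
    t2m_era5 = np.asarray(t2m_era5, dtype=float)
    if t2m_land.shape != t2m_era5.shape:
        raise ValueError("both stacks must share the same (time, lat, lon) shape")
    merged = np.where(np.isnan(t2m_land), t2m_era5, t2m_land)
    # 2000-2010 inclusive at monthly resolution gives 11 * 12 = 132 time steps
    return merged.mean(axis=0)
```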

4. Prepare the ice velocity products

ICEBOOST uses surface ice velocity from Millan et al. (2022), Joughin et al. (2016) (Greenland, product NSIDC-0670), and Mouginot et al. (2019) (Antarctica, product NSIDC-0754).

Set up a directory structure like the following, download the tiles, and place them in the respective folders.

Note: Millan et al. (2022) tiles for RGI regions 1-2 go together in one folder, and likewise those for regions 13-14-15.

From NSIDC download the greenland_vel_mosaic250_vx_v1.tif and greenland_vel_mosaic250_vy_v1.tif files and place them in Greenland_NSIDC/velocity/.

From NSIDC download the antarctic_ice_vel_phase_map_v01.nc file and place it in Antarctica_NSIDC/velocity/NSIDC-0754/.

Specify the location of these folders in the config/config.yaml file, under the arguments: millan_velocity_dir, NSIDC_velocity_Greenland_dir, and NSIDC_velocity_Antarctica_dir.

5. Prepare the world's coastlines product

ICEBOOST uses a global coastline product, obtained from the Global Self-consistent, Hierarchical, High-resolution Geography Database (GSHHG), Version 2.3.7. Download the shoreline polygons product at 'f' (full) resolution. We only care about land-ocean boundaries, so only the following two files are needed: GSHHS_f_L1.shp and GSHHS_f_L6.shp.

Merge these two datasets into a single global coastline dataset. You can use the following code snippet:

import pandas as pd
import geopandas as gpd

# Read the two GSHHG shoreline levels (full resolution)
gdf1 = gpd.read_file('/YOUR_IN_PATH/GSHHS_f_L1.shp', engine='pyogrio')
gdf6 = gpd.read_file('/YOUR_IN_PATH/GSHHS_f_L6.shp', engine='pyogrio')
# Stack both layers into one GeoDataFrame and write it out as a single shapefile
gdf16 = pd.concat([gdf1, gdf6], ignore_index=True)
gdf16.to_file('/YOUR_OUT_PATH/GSHHS_f_L1_L6.shp', driver='ESRI Shapefile')

Place the generated GSHHS_f_L1_L6.shp file in the folder specified in the config/config.yaml file, under the argument coastlines_gshhg_dir/.

6. Prepare the RACMO surface mass balance product

Over Greenland and Antarctica, ICEBOOST uses RACMO mass balance.

Greenland:

  1. Download the SMB_rec_RACMO2.3p2_1km_1961-1990.nc file.
  2. Run python create_racmo_greenland.py with save=True to generate the final 1961-1990 averaged mass balance product: smb_greenland_mean_1961_1990_RACMO23p2_gf.nc

Antarctica:

  1. Download the smb_rec.1979-2021.RACMO2.3p2_ANT27_ERA5-3h.AIS.2km.YY.nc file.
  2. Run python create_racmo_antarctica.py with save=True to generate the final 1979-2021 averaged mass balance product: smb_antarctica_mean_1979_2021_RACMO23p2_gf.nc

Set up the following folder structure and place the generated files in the relevant folders:

racmo/
├── antarctica_racmo2.3p2/smb_antarctica_mean_1979_2021_RACMO23p2_gf.nc
└── greenland_racmo2.3p2/smb_greenland_mean_1961_1990_RACMO23p2_gf.nc

In the config/config.yaml file, specify the location of the racmo root folder under the argument racmo_dir/.

7. Prepare all other models' ice thickness solutions for comparisons

The ICEBOOST code compares its results against the following ice thickness products:

Download all ice thickness tiles from Millan et al. (2022) and place them inside the Millan/thickness/ folder, following the same structure described for the velocity tiles (point 4).

From NSIDC download BedMachineGreenland-v5.nc and place it in Greenland_NSIDC/thickness/. From NSIDC download BedMachineAntarctica-v3.nc and place it in Antarctica_NSIDC/thickness/NSIDC-0756/.

From Farinotti et al. (2019), download the composite_thickness_RGI60-all_regions.zip archive and extract its content in a folder Farinotti/.

Finally, in config/config.yaml, specify the locations of the following folders: millan_icethickness_dir, NSIDC_icethickness_Greenland_dir, NSIDC_icethickness_Antarctica_dir, farinotti_icethickness_dir.
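
Since most of the steps above end with a path in config/config.yaml, a quick check that every configured folder exists can catch typos early. The following is a minimal sketch, assuming the YAML file has already been parsed into a flat dictionary (for example with PyYAML's yaml.safe_load) and that the arguments are top-level keys, which this README does not confirm. The helper name missing_dirs is hypothetical, and the key list is simply the one quoted in steps 1 to 7.

```python
from pathlib import Path

# Directory arguments named in this README (steps 1 to 7)
DIR_KEYS = [
    "oggm_dir",
    "tandemx_dir",
    "millan_velocity_dir",
    "NSIDC_velocity_Greenland_dir",
    "NSIDC_velocity_Antarctica_dir",
    "coastlines_gshhg_dir",
    "racmo_dir",
    "millan_icethickness_dir",
    "NSIDC_icethickness_Greenland_dir",
    "NSIDC_icethickness_Antarctica_dir",
    "farinotti_icethickness_dir",
]


def missing_dirs(config, keys=DIR_KEYS):
    """Return the keys that are absent from config or point to no existing folder."""
    problems = []
    for key in keys:
        value = config.get(key)
        if not value or not Path(value).is_dir():
            problems.append(key)
    return problems
```

An empty list means every folder was found.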

8. Prepare the ground truth dataset

From the Global Terrestrial Network for Glaciers (GTN-G), download glathida-3.1.0.zip into a folder called glathida/ and unzip the archive.

Create the training dataset 🏋️

The training dataset can be generated by running:

python create_metadata.py

In create_metadata.py you can specify as arguments all input products that you have downloaded. The code generates a raw ~3.0 GB .csv training dataset.

Process training dataset and downscale 🏋️

The model is trained on a post-processed version of the training dataset you just created. To post-process, run:

python process_metadata_and_grid.py

This code removes some points from the raw training dataset and performs a per-glacier spatial downscaling of all features onto a 100x100 grid. It produces the final ~300 MB .csv file used for training.
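
The README does not detail the downscaling algorithm. One plausible reading, sketched below under that assumption, is that each glacier's measurement points are binned into a 100x100 grid spanning the points' bounding box, and that values falling in the same cell are averaged. The function name grid_average and the (x, y, value) input layout are hypothetical and not taken from process_metadata_and_grid.py.

```python
from collections import defaultdict


def grid_average(points, n_cells=100):
    """Average the values of (x, y, value) points per cell of an n_cells x n_cells grid.

    The grid spans the bounding box of the points. Returns {(ix, iy): mean_value}.
    """
    if not points:
        return {}
    xs = [p[0] for p in points]
    ys = [p[1] for p in points]
    x_min, x_max = min(xs), max(xs)
    y_min, y_max = min(ys), max(ys)

    def cell_index(v, v_min, v_max):
        if v_max == v_min:  # degenerate extent: everything falls in cell 0
            return 0
        # Clamp so the maximum coordinate lands in the last cell, not one past it
        return min(int((v - v_min) / (v_max - v_min) * n_cells), n_cells - 1)

    sums = defaultdict(lambda: [0.0, 0])  # cell -> [running sum, count]
    for x, y, value in points:
        key = (cell_index(x, x_min, x_max), cell_index(y, y_min, y_max))
        sums[key][0] += value
        sums[key][1] += 1
    return {key: total / count for key, (total, count) in sums.items()}
```

Averaging within cells reduces many nearby measurements to at most 10,000 values per glacier, which is consistent with the drop in file size reported above.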

Train model ensemble 🤖

To train the model, run:

python iceboost_train.py

This code trains an XGBoost and a CatBoost regression model. The trained models can be saved as .json and .cbm files, respectively. The trained models have been deposited on Zenodo.

The code also contains a module to perform inference on a glacier, by specifying its RGI code.
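
Glaciers are addressed by their RGI code. In RGI v6 these identifiers follow the pattern RGI60-RR.NNNNN, where RR is the two-digit region number (01 to 19). A small helper such as the hypothetical rgi_region below, which is not part of the ICEBOOST code, can be handy to validate identifiers or group a list of glaciers by region before running inference.

```python
import re
from collections import defaultdict

_RGI6_PATTERN = re.compile(r"^RGI60-(\d{2})\.(\d{5})$")


def rgi_region(rgi_id):
    """Return the region number (1 to 19) encoded in an RGI v6 identifier."""
    match = _RGI6_PATTERN.match(rgi_id)
    if match is None:
        raise ValueError(f"not an RGI v6 identifier: {rgi_id!r}")
    region = int(match.group(1))
    if not 1 <= region <= 19:
        raise ValueError(f"region out of range in {rgi_id!r}")
    return region


def group_by_region(rgi_ids):
    """Group identifiers into {region_number: [rgi_id, ...]}."""
    groups = defaultdict(list)
    for rgi_id in rgi_ids:
        groups[rgi_region(rgi_id)].append(rgi_id)
    return dict(groups)
```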

Model inference 🔮

If you don't want to train the model but just run it, download the trained models (.json and .cbm) from Zenodo, specify their names and location in config/config.yaml under model_input_dir/, model_filename_xgb, and model_filename_cat, and run:

python iceboost_deploy.py

The code loads the two trained models and runs inference on a single glacier, a list of glaciers, or all glaciers in a region. Under the hood, the code runs fetch_glacier_metadata.py, which generates the feature array on the fly.
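
Conceptually, deployment boils down to loading both models and combining their predictions on the feature array. The sketch below is an illustration under stated assumptions, not the code in iceboost_deploy.py. It assumes the .json file was saved from an xgboost XGBRegressor and the .cbm file from a catboost CatBoostRegressor, and it combines the two with a plain mean, which this README does not confirm. The function names load_models and ensemble_predict are hypothetical.

```python
from statistics import fmean


def load_models(xgb_path, cat_path):
    """Load the .json (XGBoost) and .cbm (CatBoost) models from disk."""
    # Imported here so the averaging helper below works without these packages
    import xgboost as xgb
    from catboost import CatBoostRegressor

    xgb_model = xgb.XGBRegressor()
    xgb_model.load_model(xgb_path)
    cat_model = CatBoostRegressor()
    cat_model.load_model(cat_path)
    return xgb_model, cat_model


def ensemble_predict(models, features):
    """Average the per-point predictions of several fitted regressors."""
    predictions = [model.predict(features) for model in models]
    # zip(*predictions) walks the models' outputs point by point
    return [fmean(float(p) for p in point) for point in zip(*predictions)]
```

The feature array must use the same columns, in the same order, as during training; that bookkeeping is left out here.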

In config/config.yaml you can also specify the number of points to generate, under the n_points_regression argument.

Acknowledgments

This work has received funding from the European Union’s Horizon 2020 research and innovation programme, under the Marie Skłodowska-Curie grant agreement No 101066651, project SKYNET. This work was also funded by the Climate Change AI Innovation Grants program, under the project ICENET.

Citation

If you found this code helpful, please consider citing:

@Article{iceboost2024,
    author = {Maffezzoli, N. and Rignot, E. and Barbante, C. and Petersen, T. and Vascon, S.},
    title = {A gradient-boosted tree framework to model the ice thickness of the World's glaciers (IceBoost v1)},
    journal = {EGUsphere},
    volume = {2024},
    year = {2024},
    pages = {1--27},
    url = {https://egusphere.copernicus.org/preprints/2024/egusphere-2024-2455/},
    doi = {10.5194/egusphere-2024-2455}
}