jlidw / SSIN

Code for the SIGMOD 2023 paper "SSIN: Self-Supervised Learning for Rainfall Spatial Interpolation".

SSIN

This repository contains the code for our paper "SSIN: Self-Supervised Learning for Rainfall Spatial Interpolation", which was accepted at SIGMOD 2023.

About Rainfall Spatial Interpolation

Spatial Interpolation vs. Time Series Imputation:

Spatial interpolation “predicts” values at arbitrary locations that have no historical observations, using only sparse station observations. This problem is fundamentally different from, and more challenging than, multivariate time-series imputation, which assumes that data at known locations is only partially missing over time.
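To make the distinction concrete, the two settings can be pictured as different observation masks over a (timestamps × locations) grid. The sketch below is only an illustration: the grid size, the missing rate, and the `target_locations` indices are hypothetical and not taken from the paper or this repository.

```python
import numpy as np

rng = np.random.default_rng(0)
n_timestamps, n_locations = 6, 5  # hypothetical toy grid

# Time-series imputation: every location has a history,
# but individual readings are missing here and there.
imputation_observed = rng.random((n_timestamps, n_locations)) > 0.3

# Spatial interpolation: the target locations have no readings at all,
# at any timestamp; only the remaining stations are observed.
target_locations = [1, 3]  # hypothetical ungauged locations
interpolation_observed = np.ones((n_timestamps, n_locations), dtype=bool)
interpolation_observed[:, target_locations] = False

print(imputation_observed.astype(int))
print(interpolation_observed.astype(int))
```

In the second mask, whole columns are empty, so a model cannot rely on a location's own past values and has to infer everything from the other stations.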

Rainfall vs. Other Meteorological Variables:

Rainfall is intermittent (accumulations are usually zero), which makes its spatial distribution more complex, whereas other meteorological variables (e.g., temperature and humidity) usually vary more smoothly in space.

Datasets

Two real-world hourly rain gauge datasets, HK and BW, were collected and used in this paper. In addition, we treat traffic spatial interpolation as another use case and run additional experiments on a commonly used real-world dataset, PEMS-BAY.

Processed Data

Download the processed datasets from Google Drive and place them in the data folder.

How to select rainy timestamps?

Since rainfall is intermittent, spatially interpolating all-zero data is meaningless, and too many all-zero timestamps may negatively affect model training. We therefore filter out timestamps with zero or tiny rainfall to form the final datasets (HK: 3855 valid timestamps; BW: 3640 valid timestamps). We follow the data selection process below:
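The exact selection thresholds are not stated in this README, so the following is only a rough sketch of the general idea of dropping timestamps with zero or tiny rain. The function name `select_rainy_timestamps` and both thresholds (`min_rain`, `min_wet_fraction`) are hypothetical and may differ from the criteria actually used to build the HK and BW datasets.

```python
import numpy as np

def select_rainy_timestamps(rain, min_rain=0.1, min_wet_fraction=0.1):
    """Return indices of timestamps with enough measurable rain.

    rain: array of shape (n_timestamps, n_gauges); NaN marks a missing reading.
    min_rain: hypothetical threshold below which a reading counts as dry.
    min_wet_fraction: hypothetical minimum share of valid gauges that must be wet.
    """
    rain = np.asarray(rain, dtype=float)
    valid = ~np.isnan(rain)
    # Treat missing readings as dry so they never count as wet.
    wet = np.where(valid, rain, 0.0) > min_rain
    n_valid = np.maximum(valid.sum(axis=1), 1)  # avoid division by zero
    wet_fraction = wet.sum(axis=1) / n_valid
    return np.flatnonzero(wet_fraction >= min_wet_fraction)

rain = np.array([
    [0.0, 0.0, 0.0, 0.0],      # all dry
    [0.5, 0.0, 0.0, 0.0],      # one wet gauge out of four
    [0.05, 0.05, 0.0, 0.0],    # only tiny rain
    [np.nan, 2.0, 3.0, 0.0],   # two wet gauges out of three valid ones
])
kept = select_rainy_timestamps(rain, min_rain=0.1, min_wet_fraction=0.25)
print(kept)
```

Only the kept indices would then be used for training and evaluation; the all-dry and tiny-rain rows are discarded.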

Raw Data

Baselines

In the baselines folder, you can find the implementation of IDW, OK, TIN, and TPS:

For GNN-based baselines, please refer to their original code: KCN and IGNNK.
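For intuition about the simplest of the classical baselines named above, inverse distance weighting (IDW) estimates the value at a query location as a weighted average of station values, with weights proportional to 1 / distance^power. The following is a minimal NumPy sketch of that textbook formula, not the code in the baselines folder; the function name `idw_interpolate`, the `power` default, and the use of planar Euclidean distance are assumptions made for illustration.

```python
import numpy as np

def idw_interpolate(station_xy, station_values, query_xy, power=2.0, eps=1e-12):
    """Textbook inverse distance weighting on planar coordinates.

    station_xy: (n_stations, 2) station coordinates.
    station_values: (n_stations,) observed values at one timestamp.
    query_xy: (n_queries, 2) locations to interpolate.
    """
    station_xy = np.asarray(station_xy, dtype=float)
    station_values = np.asarray(station_values, dtype=float)
    query_xy = np.asarray(query_xy, dtype=float)

    # Pairwise Euclidean distances, shape (n_queries, n_stations).
    diff = query_xy[:, None, :] - station_xy[None, :, :]
    dist = np.sqrt((diff ** 2).sum(axis=-1))

    # Clamp tiny distances so a query on top of a station does not divide by zero.
    weights = 1.0 / np.maximum(dist, eps) ** power
    weights /= weights.sum(axis=1, keepdims=True)
    return weights @ station_values

stations = [[0.0, 0.0], [2.0, 0.0]]
values = [0.0, 4.0]
queries = [[1.0, 0.0], [0.0, 0.0]]
estimates = idw_interpolate(stations, values, queries)
print(estimates)
```

A query halfway between two stations receives the mean of their values, while a query located at a station essentially reproduces that station's value. For real gauge coordinates given as latitude and longitude, a geodesic distance would be a more appropriate choice than the planar distance used here.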

Instructions

attn_tvm:

baselines:

dataset_collator:

networks:

postprocess:

preprocess:

utils:

Run

python main_train.py --dataset=hk
python main_train.py --dataset=bw
python main_train.py --dataset=bay

Citation

@article{li2023ssin,
  title={SSIN: Self-Supervised Learning for Rainfall Spatial Interpolation},
  author={Li, Jia and Shen, Yanyan and Chen, Lei and Ng, Charles Wang Wai},
  journal={Proceedings of the ACM on Management of Data},
  volume={1},
  number={2},
  pages={1--21},
  year={2023},
  publisher = {Association for Computing Machinery}
}