This repository contains the code for our paper "SSIN: Self-Supervised Learning for Rainfall Spatial Interpolation", which has been accepted by SIGMOD 2023.
Spatial interpolation aims to “predict” data at locations with no historical observations, based on sparse station observations. This problem is fundamentally different from, and more challenging than, multivariate time-series imputation, which assumes that data at certain locations is partially missing across time.
The intermittency of rainfall (accumulations are usually zero) leads to a more complex spatial distribution, whereas other meteorological variables (e.g., temperature and humidity) usually show smoother distributions.
Two real-world hourly rain gauge datasets, HK and BW, are collected and used in this paper. In addition, we take traffic spatial interpolation as another use case and run additional experiments on a commonly used real-world dataset, PEMS-BAY.
Download the processed datasets from Google Drive and place them in the data folder.
Since rainfall is intermittent, performing spatial interpolation on all-zero data is meaningless, and too many all-zero samples may negatively affect model training. We therefore perform data selection to filter out timestamps with zero or tiny rainfall, which yields the final datasets (HK: 3855 valid timestamps; BW: 3640 valid timestamps). We follow the data selection process below:
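The exact selection criteria are defined by the preprocessing scripts in this repository. As a rough illustration only, and not the procedure used in the paper, the idea of dropping timestamps with zero or tiny rain could be sketched as follows. The function name `select_valid_timestamps` and the thresholds `min_rain_mm` and `min_wet_fraction` are hypothetical and are not taken from the paper or the code.

```python
import numpy as np

def select_valid_timestamps(rain, min_rain_mm=0.1, min_wet_fraction=0.1):
    """Return indices of timestamps that carry enough rain to be useful.

    rain: array of shape (T, N) with T timestamps and N gauges; NaN = missing.
    Both thresholds are illustrative assumptions, not values from the paper.
    """
    rain = np.asarray(rain, dtype=float)
    observed = ~np.isnan(rain)
    # Treat missing readings as dry so they never count as "wet".
    wet = np.where(observed, rain, 0.0) >= min_rain_mm
    n_obs = observed.sum(axis=1)
    # Fraction of observed gauges that are wet; 0 when nothing was observed.
    wet_fraction = np.divide(
        wet.sum(axis=1), n_obs, out=np.zeros(rain.shape[0]), where=n_obs > 0
    )
    return np.flatnonzero(wet_fraction >= min_wet_fraction)

# Toy data: 4 timestamps, 4 gauges.
toy = np.array([
    [0.0, 0.0, 0.0, 0.0],        # all dry
    [0.5, 0.0, 0.0, 0.0],        # one wet gauge out of four
    [1.0, 2.0, 0.0, np.nan],     # two wet gauges out of three observed
    [np.nan] * 4,                # nothing observed
])
kept = select_valid_timestamps(toy, min_wet_fraction=0.5)
```

Raising `min_wet_fraction` keeps only timestamps where rain is spatially widespread; lowering it retains more scattered events.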
In the baselines folder, you can find the implementation of IDW, OK, TIN, and TPS:
For GNN-based baselines, please refer to their original code: KCN and IGNNK.
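For readers unfamiliar with the classical baselines, here is a minimal sketch of inverse distance weighting (IDW), the simplest of them. It is a generic textbook formulation, not the implementation shipped in the baselines folder; the function name `idw_interpolate`, the `power` default, and the use of planar Euclidean distance are assumptions made for illustration.

```python
import numpy as np

def idw_interpolate(known_xy, known_values, query_xy, power=2.0, eps=1e-12):
    """Inverse distance weighting on planar coordinates.

    known_xy:     (K, 2) station coordinates
    known_values: (K,)   observations at the stations
    query_xy:     (Q, 2) locations to interpolate
    """
    known_xy = np.asarray(known_xy, dtype=float)
    known_values = np.asarray(known_values, dtype=float)
    query_xy = np.asarray(query_xy, dtype=float)

    # Pairwise Euclidean distances, shape (Q, K).
    diff = query_xy[:, None, :] - known_xy[None, :, :]
    dist = np.sqrt((diff ** 2).sum(axis=-1))

    # Closer stations get larger weights; eps avoids division by zero
    # when a query point coincides with a station.
    weights = 1.0 / np.maximum(dist, eps) ** power
    weights /= weights.sum(axis=1, keepdims=True)
    return weights @ known_values

stations = [[0.0, 0.0], [2.0, 0.0]]
values = [0.0, 10.0]
estimates = idw_interpolate(stations, values, [[1.0, 0.0], [0.0, 0.0]])
```

A query point halfway between two stations receives the average of their values, and a query point that coincides with a station effectively reproduces that station's value.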
attn_tvm:
    lib: includes the generated TVM kernels (".so" files).
baselines:
dataset_collator:
    create_data.py: generates the masked sequences that are provided to Trainer.py for training and testing.
networks:
postprocess:
preprocess:
    dist_angle.py: for the HK/BW datasets, generates one matrix that stores the distance and azimuth between all location pairs.
    generate_traffic_adj_mx.py: for the PEMS-BAY dataset, generates the distance matrix and an additional adj_attn_mask (since the traffic data is not fully connected, the attention operation needs this extra mask).
    preprocessing.py: preprocesses the HK/BW datasets and generates the pkl data for training/testing.
    preprocess_pems_bay.py: preprocesses the PEMS-BAY dataset and generates the pkl data for training/testing.
utils:
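To make the role of the distance and azimuth matrix more concrete, the following is a minimal sketch of how pairwise great-circle distances and initial bearings can be computed from station latitudes and longitudes. It is not the code in dist_angle.py; the function name `distance_azimuth_matrices`, the haversine formula, the spherical Earth radius, and the output layout are all assumptions made for illustration.

```python
import numpy as np

EARTH_RADIUS_KM = 6371.0  # mean Earth radius, spherical approximation

def distance_azimuth_matrices(lat_deg, lon_deg):
    """Pairwise distance (km) and azimuth (degrees clockwise from north).

    Entry [i, j] describes the path from station i to station j.
    """
    lat = np.radians(np.asarray(lat_deg, dtype=float))
    lon = np.radians(np.asarray(lon_deg, dtype=float))
    lat1, lat2 = lat[:, None], lat[None, :]
    dlat = lat2 - lat1
    dlon = lon[None, :] - lon[:, None]

    # Haversine great-circle distance.
    a = np.sin(dlat / 2) ** 2 + np.cos(lat1) * np.cos(lat2) * np.sin(dlon / 2) ** 2
    dist = 2 * EARTH_RADIUS_KM * np.arcsin(np.sqrt(np.clip(a, 0.0, 1.0)))

    # Initial bearing from i to j, mapped into [0, 360).
    y = np.sin(dlon) * np.cos(lat2)
    x = np.cos(lat1) * np.sin(lat2) - np.sin(lat1) * np.cos(lat2) * np.cos(dlon)
    azimuth = (np.degrees(np.arctan2(y, x)) + 360.0) % 360.0
    return dist, azimuth

# Three toy stations: origin, 1 degree east, 1 degree north.
dist, azimuth = distance_azimuth_matrices([0.0, 0.0, 1.0], [0.0, 1.0, 0.0])
```

If a single array is preferred, the two results can be combined with `np.stack([dist, azimuth], axis=-1)`. Note that the distance matrix is symmetric, while the azimuth matrix is not, because the bearing from i to j differs from the bearing from j to i.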
python main_train.py --dataset=hk
python main_train.py --dataset=bw
python main_train.py --dataset=bay
@article{li2023ssin,
title={SSIN: Self-Supervised Learning for Rainfall Spatial Interpolation},
author={Li, Jia and Shen, Yanyan and Chen, Lei and Ng, Charles Wang Wai},
journal={Proceedings of the ACM on Management of Data},
volume={1},
number={2},
pages={1--21},
year={2023},
publisher = {Association for Computing Machinery}
}