
Sequence Modeling with Multiresolution Convolutional Memory (ICML 2023)
MIT License

MultiresConv

This repository contains the official PyTorch implementation of

Sequence Modeling with Multiresolution Convolutional Memory (ICML 2023)

by Jiaxin Shi, Ke Alexander Wang, Emily B. Fox

Paper: [abstract] [pdf]

TL;DR: We introduce a new SOTA convolutional sequence modeling layer that is simple to implement (15 lines of PyTorch code using standard convolution and linear operators) and requires at most O(N log N) time and memory.

The key component of the layer is a multiresolution convolution operation (MultiresConv, left in the figure) that mimics the computational structure of wavelet-based multiresolution analysis. We use it to build a memory ($\mathbf{z}_n$ in the figure) for long-context modeling which captures multiscale trends of the data. Our layer is simple (it's linear) and parameter efficient (it uses depthwise convolutions; filters are shared across timescales), making it easy to integrate with modern architectural components such as gated activations, residual blocks, and normalizations.
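To make the structure concrete, here is a minimal, hedged sketch of a multiresolution convolution in PyTorch. It is not the repository's implementation; the module name `MultiresConvSketch` and its arguments are illustrative. The idea follows the description above: a single shared pair of depthwise filters is applied at exponentially growing dilations, as in a wavelet transform, yielding detail signals at $\mathcal{O}(\log N)$ scales for $\mathcal{O}(N \log N)$ total work.

```python
import math
import torch
import torch.nn as nn
import torch.nn.functional as F

class MultiresConvSketch(nn.Module):
    """Illustrative sketch of a multiresolution convolution (not the official code).

    At depth j, a shared pair of depthwise filters (h0, h1) is applied with
    dilation 2**j and causal (left) padding, mimicking the tree structure of
    a wavelet-based multiresolution analysis: h1 emits the "detail" signal at
    scale j, while h0 produces the smoothed signal passed to the next depth.
    """

    def __init__(self, d_model, kernel_size=2, depth=None):
        super().__init__()
        self.kernel_size = kernel_size
        self.depth = depth  # if None, inferred from the sequence length
        # One filter pair shared across all scales (parameter efficient);
        # shape (d_model, 1, k) makes conv1d with groups=d_model depthwise.
        self.h0 = nn.Parameter(torch.randn(d_model, 1, kernel_size) / math.sqrt(kernel_size))
        self.h1 = nn.Parameter(torch.randn(d_model, 1, kernel_size) / math.sqrt(kernel_size))

    def forward(self, x):
        # x: (batch, d_model, seq_len)
        d = x.shape[1]
        depth = self.depth or max(1, math.ceil(math.log2(x.shape[-1])))
        outputs = []
        lo = x
        for j in range(depth):
            dilation = 2 ** j
            pad = dilation * (self.kernel_size - 1)
            lo_padded = F.pad(lo, (pad, 0))  # causal padding keeps length fixed
            hi = F.conv1d(lo_padded, self.h1, dilation=dilation, groups=d)  # detail at scale j
            lo = F.conv1d(lo_padded, self.h0, dilation=dilation, groups=d)  # smooth for scale j+1
            outputs.append(hi)
        outputs.append(lo)  # coarsest approximation
        return outputs
```

The multiscale outputs would then be linearly mixed into the memory $\mathbf{z}_n$; in practice one would also wrap this in the gating, residual, and normalization layers mentioned above.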

Setup

pip install -r requirements.txt

For the Long ListOps and PTB-XL experiments, please follow the comments in the dataloaders to download and prepare the datasets.

Training

We provide multi-GPU training code for all experiments. For example,

bash scripts/seq_cifar.sh

will run the sequential CIFAR-10 classification experiment with 2 GPUs using the settings in the paper. The main file for classification experiments is classification.py. The autoregressive generative modeling training and evaluation code is in autoregressive.py and autoregressive_eval.py.

Citation

If you find this code useful, please cite our work:

@inproceedings{shi2023sequence,
  title={Sequence Modeling with Multiresolution Convolutional Memory},
  author={Shi, Jiaxin and Wang, Ke Alexander and Fox, Emily B.},
  booktitle={International Conference on Machine Learning},
  year={2023}
}