machine-discovery / deer

Parallelizing non-linear sequential models over the sequence length
BSD 3-Clause "New" or "Revised" License

DEER

The official repository of the paper "Parallelizing non-linear sequential models over the sequence length".

Installation

pip install -U "jax[cuda12_pip]" -f https://storage.googleapis.com/jax-releases/jax_cuda_releases.html
pip install --upgrade -e .
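After installing, one quick way to confirm that JAX picked up the CUDA build is to ask it which backend and devices it sees. This check is a suggestion, not part of the repository; jax.default_backend() and jax.devices() are standard JAX calls.

```python
import jax

# "gpu" means JAX found the CUDA wheels installed above; "cpu" means it fell back to the CPU.
print("default backend:", jax.default_backend())
print("devices:", jax.devices())
```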

To replicate the experimental results, install the package with the replication extras (the quotes keep shells such as zsh from expanding the brackets):

pip install --upgrade -e ".[replication]"

Getting started

The best way to get started is to run the demo script:

python deer/demo.py

This runs a simple speed comparison between DEER and the sequential method. To see the demo script's options, run python deer/demo.py --help.
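As we understand the paper, DEER evaluates a recurrence h_i = f(h_{i-1}, x_i) by solving for all T states at once instead of stepping through them. Each iteration linearizes f around the current guess of every state, which leaves a linear recurrence h_i = J_i h_{i-1} + b_i that jax.lax.associative_scan can solve in parallel; repeating this is Newton's method on the whole sequence. The following is a minimal sketch of that idea, not the code in deer/. The tanh cell (in place of the demo's GRU), the helper names make_cell, sequential_eval, compose_affine and deer_eval, and the stopping rule are hypothetical choices for illustration.

```python
import jax
import jax.numpy as jnp


def make_cell(key, d_hidden, d_input):
    """Hypothetical toy cell h_i = tanh(W h_{i-1} + U x_i), standing in for the demo's GRU."""
    k_w, k_u = jax.random.split(key)
    W = jax.random.normal(k_w, (d_hidden, d_hidden))
    W = 0.9 * W / jnp.linalg.norm(W, ord=2)  # spectral norm 0.9 keeps the toy cell contractive
    U = jax.random.normal(k_u, (d_hidden, d_input))

    def cell(h, x):
        return jnp.tanh(W @ h + U @ x)

    return cell


def sequential_eval(cell, h0, xs):
    """Baseline: evaluate the recurrence one step at a time."""
    def step(h, x):
        h_new = cell(h, x)
        return h_new, h_new

    _, hs = jax.lax.scan(step, h0, xs)
    return hs


def compose_affine(left, right):
    """Compose the maps h -> A h + b; `left` acts first, then `right` (batched over axis 0)."""
    A_l, b_l = left
    A_r, b_r = right
    return A_r @ A_l, jnp.einsum("...ij,...j->...i", A_r, b_l) + b_r


def deer_eval(cell, h0, xs, max_iters=None, tol=1e-6):
    """Sketch of a DEER-style solve: Newton iterations over all T states at once."""
    T, d = xs.shape[0], h0.shape[0]
    max_iters = T if max_iters is None else max_iters
    cell_all = jax.vmap(cell)                        # f(h_{i-1}, x_i) for every i
    jac_all = jax.vmap(jax.jacfwd(cell, argnums=0))  # J_i = df/dh evaluated at h_{i-1}
    hs = jnp.zeros((T, d), dtype=h0.dtype)           # initial guess for every state
    for _ in range(max_iters):
        h_prev = jnp.concatenate([h0[None], hs[:-1]], axis=0)  # guessed h_{i-1}
        J = jac_all(h_prev, xs)                                 # shape (T, d, d)
        b = cell_all(h_prev, xs) - jnp.einsum("tij,tj->ti", J, h_prev)
        b = b.at[0].add(J[0] @ h0)                              # h_0 is known exactly
        # Linearized system h_i = J_i h_{i-1} + b_i, solved with a parallel prefix scan.
        _, hs_new = jax.lax.associative_scan(compose_affine, (J, b))
        done = bool(jnp.max(jnp.abs(hs_new - hs)) < tol)
        hs = hs_new
        if done:
            break
    return hs


key_cell, key_x = jax.random.split(jax.random.PRNGKey(0))
cell = make_cell(key_cell, d_hidden=4, d_input=2)
xs = jax.random.normal(key_x, (128, 2))
h0 = jnp.zeros(4)

hs_seq = sequential_eval(cell, h0, xs)
hs_deer = deer_eval(cell, h0, xs)
print("max abs deviation:", float(jnp.max(jnp.abs(hs_seq - hs_deer))))
```

In exact arithmetic, iteration k reproduces the first k states exactly, so the loop needs at most T iterations; for a contractive cell like this one it typically stops much earlier. In float32 the two results should agree to round-off level, comparable in spirit to the "Maximum absolute deviation" the demo reports.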

A typical output for the demo script using a V100 GPU is as follows (your output may vary):

$ python deer/demo.py
=========================================
Problem setup
-----------------------------------------
* Random seed: 0
* Cell: GRU
* Input size: 2
* Batch size: 16
* Sequence length: 10000
* Data type: float32 with eps = 1.192e-07
=========================================
You can change the problem setup by passing arguments to this script.
To see the list of arguments, run with --help.

Benchmarking sequential method: 0.22577 seconds
Benchmarking DEER: 0.00331 seconds
DEER GRU speed up over sequential GRU: 68.189x
Maximum absolute deviation: 2.384e-07 where output range: -9.216077e-01 to 7.263898e-01

File guide

On the deer/ directory:

The files to reproduce the experiments are in the experiments/ directory; each experiment directory has a README.md file with reproducibility instructions.

Speed comparison of training a GRU model with the sequential method (orange) vs. the DEER method (blue); 2 seconds of this animation correspond to about an hour of training time. [Animation: rnn_train]