This repo contains the code for our paper
BP-Transformer: Modeling Long-Range Context via Binary Partition
Zihao Ye, Qipeng Guo, Quan Gan, Xipeng Qiu, Zheng Zhang
The code is written in DGL with PyTorch as backend (check out the `develop` branch for a DGL 0.5 compatible version).

For multi-GPU training, please `export NCCL_LL_THRESHOLD=0` before running the scripts, because of a PyTorch bug mentioned here.
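For example, set the variable in the shell that will launch the training script:

```shell
# Workaround for the PyTorch/NCCL bug: disable NCCL's low-latency protocol
# before launching any multi-GPU training script in this repo.
export NCCL_LL_THRESHOLD=0
```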
The codebase has two dependencies: `graph_kernel` and `graph_builder`. The first provides efficient graph attention on GPU with a node-parallel strategy, written in CUDA; the second provides efficient graph construction, written in Cython. To install them:

```shell
cd graph_builder
python setup.py install
cd ..
cd graph_kernel
python setup.py install
cd ..
```
We support the following tasks with BPT as backbone:

- Text Classification: `text_classification.py`
- Language Modeling: `lm.py`
- Machine Translation: `mt.py`
- Natural Language Inference: `nli.py`

All experiment settings mentioned in our paper are available at `configs/`.

```shell
python *.py --config configs/*.yml --gpu [GPUs]
```
Note that this repo does not contain any data files. To get the datasets required for the experiments, run the corresponding `get_*.sh` script, and the dataset will be downloaded and preprocessed.
For machine translation, we have another script, `mt_infer.py`, for decoding:

```shell
python mt_infer.py --config configs/*.yml --gpu [GPU]
```

Before decoding, please make sure you have finished training with `mt.py` using the same config file.
NOTE: Currently we do not support CPU training/inference.
The following is a visualization of the sparse matrix of the underlying BPT graph when the sequence length is 8192 and k is 4.
```shell
python lm.py --config configs/enwik8-8192.yml --gpu 0,1,2,3,4,5,6,7
```
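As a rough illustration of the binary partition behind this graph (a minimal sketch, not the repo's Cython `graph_builder`): a length-n sequence is recursively split into spans, so for a power-of-two n the graph has 2n − 1 span nodes in total.

```python
def binary_partition(lo, hi):
    """Recursively split [lo, hi) into binary spans (the span nodes of a BPT graph)."""
    spans = [(lo, hi)]
    if hi - lo > 1:
        mid = (lo + hi) // 2
        spans += binary_partition(lo, mid)
        spans += binary_partition(mid, hi)
    return spans

# For a power-of-two length n, the partition yields 2n - 1 spans:
# n token leaves plus n - 1 internal span nodes.
print(len(binary_partition(0, 8192)))  # 16383
```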
```shell
python mt.py --config configs/iwslt-4-64.yml --gpu 0
python text_classification.py --config configs/imdb-4.yml --gpu 0
```
For sentence-level modeling, we show that BPT has a better inductive bias than the vanilla Transformer by attending to fine-grained features of neighbors and coarse-grained features of far-away tokens.
```shell
python mt.py --config configs/wmt-*.yml --gpu 0,1,2,3,4,5,6,7
python nli.py --config configs/snli.yml --gpu 0
python text_classification.py --config configs/sst5-2.yml --gpu 0
```