
VPipe

If you have any questions about VPipe, please contact sxzhao@cs.hku.hk for a quick response.

Overview

Repo architecture

runtime: contains our initial system and initial results

cpm: a GPT-2 workload on a Chinese dataset. Under active development to make vPipe support 3-D parallelism, the NCCL backend, and dynamic scaling.

Setup

For multi-node training, make sure the nv_peer_mem driver is installed to achieve optimal communication performance.
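
A minimal installation sketch, assuming the Mellanox nv_peer_memory sources (packaging details vary by distribution):

git clone https://github.com/Mellanox/nv_peer_memory.git
cd nv_peer_memory
# build the kernel module against the installed CUDA driver and OFED stack
./build_module.sh
# install the package this produces for your distribution, then load and verify
sudo modprobe nv_peer_mem
lsmod | grep nv_peer_mem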

BERT

  1. Set up the environment

Note that you should change the Docker base image to the NVIDIA PyTorch Docker release 20.01.

This may help you avoid an issue caused by PyTorch's variable version checking.

For the Dockerfile, refer to: https://github.com/NVIDIA/DeepLearningExamples/blob/24b8c9c7fdfd1fa5b80d5c342f96dd922feffd24/PyTorch/LanguageModeling/BERT/Dockerfile
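
A minimal sketch of building that environment; the image tag vpipe-bert is illustrative, not the project's official name:

git clone https://github.com/NVIDIA/DeepLearningExamples.git
cd DeepLearningExamples
git checkout 24b8c9c7fdfd1fa5b80d5c342f96dd922feffd24
cd PyTorch/LanguageModeling/BERT
# change the base image in the Dockerfile to the 20.01 release,
# i.e. FROM nvcr.io/nvidia/pytorch:20.01-py3, then build
docker build -t vpipe-bert .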

  2. Download and preprocess the dataset.

BERT pre-training uses the Wikipedia and BookCorpus datasets.

To download, verify, and extract the datasets, and to create the shards in .hdf5 format, see:

https://github.com/NVIDIA/DeepLearningExamples/blob/24b8c9c7fdfd1fa5b80d5c342f96dd922feffd24/PyTorch/LanguageModeling/BERT/Dockerfile
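
That repository ships a helper for this step; a sketch assuming its data/create_datasets_from_start.sh entry point (run it inside the container built above):

cd DeepLearningExamples/PyTorch/LanguageModeling/BERT
# downloads, verifies, and extracts the datasets, then writes the .hdf5 shards;
# output locations are configured inside the script
bash data/create_datasets_from_start.sh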

  3. Set up your machine and data locations in the config files (see the example in configs/bert_8vpipe.yml), as sketched below.
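
For example (the exact fields depend on the real schema in configs/bert_8vpipe.yml; the items below are illustrative only):

# edit the machine and data locations in the example config; at minimum:
#   - the hostnames/IPs of your workers and their GPU assignments
#   - the paths to the preprocessed .hdf5 shards and the output directory
vim configs/bert_8vpipe.yml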

Reproducing Experiments

BERT

cd runtime

VPipe's optimal configuration for 8 GPUs

python driver.py --config_file configs/bert_8vpipe.yml

PipeDream's optimal configuration for 8 GPUs

python driver.py --config_file configs/bert_8pipedream.yml

GPipe's optimal configuration for 8 GPUs

python driver.py --config_file configs/bert_8gpipe.yml
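
To run all three configurations back to back and keep per-run logs (the log file names are illustrative):

for cfg in bert_8vpipe bert_8pipedream bert_8gpipe; do
    python driver.py --config_file configs/$cfg.yml 2>&1 | tee $cfg.log
done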

Some Raw Results (For your reference)

Environment: 2 hosts, each with 4 RTX 2080 Ti GPUs

Time per epoch:

vPipe: 1.28 hours
GPipe: 1.72 hours
PipeDream: 2.14 hours