guanyonglai closed this 1 year ago
OK, the command is: python -m torch.distributed.launch --nproc_per_node 2 --tensor-model-parallel-size 1 --pipeline-model-parallel-size 2 --scatter-gather-tensors-in-pipeline --num-layers 24 --hidden-size 1024 --num-attention-heads 16 --seq-length 1024 --max-position-embeddings 1024 --micro-batch-size 4 --global-batch-size 512 --lr 0.00015 --train-iters 500000 --lr-decay-iters 320000 --lr-decay-style cosine --min-lr 0.00001 --lr-warmup-fraction 0.01 --data-path '/data/wikitext-103/wiki.train.tokens' --vocab-file gpt2-vocab.json --merge-file gpt2-merges.txt --split 949,50,1 --log-interval 1 --clip-grad 1.0 --fp16 --DDP-impl local --loss-scale 16384 --apply-query-key-layer-scaling --bias-gelu-fusion --bias-dropout-fusion --exit-interval 320000 --save './checkpoints_tmp' --save-interval 1 --load './checkpoints_tmp' --pipeline-no-flushes --checkpoint-activations --checkpoint-num-layers 1
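As a sanity check on the flags above, here is a minimal sketch of how the parallel sizes and batch sizes relate. It assumes a single node (so the world size equals `--nproc_per_node`) and the usual Megatron-LM-style convention that the data-parallel size is the world size divided by the product of the tensor- and pipeline-parallel sizes. The helper name `parallel_layout` is hypothetical and not part of the repository.

```python
def parallel_layout(world_size, tensor_parallel, pipeline_parallel,
                    micro_batch_size, global_batch_size):
    """Return (data_parallel_size, microbatches_per_iteration).

    Hypothetical helper for illustration only; it mirrors the usual
    Megatron-LM-style arithmetic, not code from the repository.
    """
    model_parallel = tensor_parallel * pipeline_parallel
    if world_size % model_parallel != 0:
        raise ValueError("world size must be divisible by TP size * PP size")
    data_parallel = world_size // model_parallel

    # Samples consumed by one microbatch step across all data-parallel replicas.
    samples_per_step = micro_batch_size * data_parallel
    if global_batch_size % samples_per_step != 0:
        raise ValueError("global batch size must be divisible by "
                         "micro batch size * data-parallel size")
    return data_parallel, global_batch_size // samples_per_step


# Values taken from the command above; world_size=2 assumes a single node.
data_parallel, num_microbatches = parallel_layout(
    world_size=2, tensor_parallel=1, pipeline_parallel=2,
    micro_batch_size=4, global_batch_size=512)
print(data_parallel, num_microbatches)  # 1 128
```

Under these assumptions, the two GPUs form one two-stage pipeline with no data-parallel replication, and each iteration processes 128 microbatches of 4 samples.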
Hi, I found the 2bw code in scripts\ very complex: it requires both Amazon's cloud servers and frequent calls to Docker. Is there any 2bw code that runs directly on a local GPU machine?