jzhang38 / EasyContext

Memory optimization and training recipes to extrapolate language models' context length to 1 million tokens, with minimal hardware.
Apache License 2.0

train speed is too slow #9

Open jkl375 opened 2 months ago

jkl375 commented 2 months ago

I found that when the context length is 512k, training is much slower than in your experimental results. One batch of 512k tokens takes 585.85 seconds, which works out to 512000 / 585.85 ≈ 873.94 tokens/s. I used 8x A100-80G with NVLink.

accelerate launch \
--config_file accelerate_configs/single_node.yaml \
train.py \
--batch-size 1 \
--gradient-accumulate-every 2  \
--output-dir  ./output/7B_0.5M_bs_1M_rope_250M_step_90_lr_2e-5 \
--seed 2027 \
--max-train-steps 90  \
--learning-rate 1e-5  \
--dataset PY007/slimpajama_llama_tokenized_upsample_4096_chunk_1M \
--model meta-llama/Llama-2-7b-hf  \
--seq-length 512000 \
--rope-theta 250000000 \
--parallel_mode zigzag_ring_attn

(screenshot attached)
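For reference, a rough sketch of the token accounting these flags imply (assuming single_node.yaml launches 8 ranks, one per A100, and that zigzag_ring_attn shards the sequence evenly across ranks; illustrative numbers only, nothing measured):

# Token accounting per optimizer step, under the assumptions above
seq_length = 512_000     # --seq-length
grad_accum = 2           # --gradient-accumulate-every
world_size = 8           # 8x A100-80G, single node

tokens_per_micro_batch = seq_length                    # 512,000 tokens per forward/backward
tokens_per_optimizer_step = seq_length * grad_accum    # 1,024,000 tokens per weight update
tokens_per_rank = seq_length // world_size             # 64,000 tokens held by each GPU

print(tokens_per_micro_batch, tokens_per_optimizer_step, tokens_per_rank)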

jzhang38 commented 2 months ago

gradient-accumulate-every is set to 2, so each optimizer step processes two 512k micro-batches. The throughput should be 512000 * 2 / 585.85 ≈ 1747.9 tokens/s.
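A quick sketch of the corrected arithmetic (585.85 s is the step time reported above; nothing re-measured here):

# Throughput per optimizer step: both accumulated micro-batches count
step_time_s = 585.85
seq_length = 512_000
grad_accum = 2

naive_tokens_per_s = seq_length / step_time_s                # ~873.9 tokens/s, undercounts by 2x
actual_tokens_per_s = seq_length * grad_accum / step_time_s  # ~1747.9 tokens/s

print(round(naive_tokens_per_s, 1), round(actual_tokens_per_s, 1))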

jkl375 commented 2 months ago

> gradient-accumulate-every is set to 2, so each optimizer step processes two 512k micro-batches. The throughput should be 512000 * 2 / 585.85 ≈ 1747.9 tokens/s.

I see, thanks.