microsoft / Megatron-DeepSpeed

Ongoing research training transformer language models at scale, including: BERT & GPT-2

Fine-tune llama2 with sequence parallelism #360

Open AnirudhVIyer opened 5 months ago

AnirudhVIyer commented 5 months ago

Hi, I am trying to fine-tune a llama2 model with sequence parallelism using Megatron-DS. Is there any documentation for this?

NamrataRShivagunde commented 5 months ago

+1

puppet101 commented 5 months ago

+2

stephankoe commented 5 months ago

Do you mean sequence parallelism as proposed in this work (tensor parallelism for non-matmul operations), or sequence parallelism as in DeepSpeed Ulysses (input data sharding along the sequence length dimension)? Have you already tried the --sequence-parallel option and related ones?
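
For reference, here is a minimal sketch of the kind of launch flags involved. This is not a tested recipe: paths and parallelism sizes are placeholders, and you should verify the flag names (especially the Ulysses one) against the arguments defined in your checkout of Megatron-DeepSpeed.

```bash
#!/bin/bash
# Minimal sketch, not a complete training script: data/tokenizer/model-size
# arguments are omitted, and paths below are placeholders.

deepspeed pretrain_gpt.py \
    --tensor-model-parallel-size 4 \
    --sequence-parallel \
    --seq-length 4096 \
    --micro-batch-size 1 \
    --load /path/to/llama2-megatron-checkpoint \
    --bf16 \
    --deepspeed \
    --deepspeed_config ds_config.json

# --sequence-parallel enables the Megatron-style variant, which only makes
# sense together with tensor parallelism (--tensor-model-parallel-size > 1).
# For the DeepSpeed-Ulysses-style variant, the analogous knob is a sequence-
# parallel group size (e.g. --ds-sequence-parallel-size N); check whether your
# branch exposes it before relying on it.
```

The two variants answer different questions: the Megatron-style flag reduces activation memory within a tensor-parallel group, while the Ulysses-style approach shards the sequence itself so that much longer contexts fit across GPUs.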