microsoft / Megatron-DeepSpeed

Ongoing research training transformer language models at scale, including: BERT & GPT-2

Fixed missing BookCorpus dataset in the sequence parallelism example. #407

Closed. costin-eseanu closed this 4 months ago.

costin-eseanu commented 4 months ago

Adapted the idea from https://huggingface.co/blog/megatron-training#data-preprocessing
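The data-preprocessing recipe in the linked blog post downloads the dataset with the Hugging Face `datasets` library and converts it to JSON Lines before tokenizing it with Megatron's `tools/preprocess_data.py`. A minimal sketch of that download-and-convert step for BookCorpus might look like the following; the dataset identifier and output filename are illustrative assumptions, not necessarily what this PR adds to the example.

```python
# Sketch only (not the exact script from this PR): fetch BookCorpus via the
# Hugging Face `datasets` library and dump it to JSON Lines, the input format
# consumed by Megatron-DeepSpeed's preprocessing tooling, following the
# approach described in the linked blog post.
from datasets import load_dataset

# Download the BookCorpus training split (large; adjust to a subset if needed).
train_data = load_dataset("bookcorpus", split="train")

# Write one JSON object per line with a "text" field; the resulting .jsonl can
# then be passed to tools/preprocess_data.py to build the binary/index files.
train_data.to_json("bookcorpus.jsonl", lines=True)
```

The resulting `bookcorpus.jsonl` would then be tokenized with `tools/preprocess_data.py` (as in the blog post) to produce the dataset files referenced by the sequence parallelism example.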