microsoft / Megatron-DeepSpeed

Ongoing research training transformer language models at scale, including: BERT & GPT-2

Fixed missing BookCorpus dataset in the sequence parallelism example. #407

Closed. costin-eseanu closed this 4 months ago.

costin-eseanu commented 4 months ago

Adapted the idea from https://huggingface.co/blog/megatron-training#data-preprocessing
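The data-preprocessing recipe in the linked blog post downloads the dataset with the Hugging Face `datasets` library and converts it to JSON Lines before tokenizing it with Megatron's `tools/preprocess_data.py`. A minimal sketch of that download-and-convert step for BookCorpus might look like the following; the dataset identifier and output filename are illustrative assumptions, not necessarily what this PR adds to the example.

```python
# Sketch only (not the exact script from this PR): fetch BookCorpus via the
# Hugging Face `datasets` library and dump it to JSON Lines, the input format
# consumed by Megatron-DeepSpeed's preprocessing tooling, following the
# approach described in the linked blog post.
from datasets import load_dataset

# Download the BookCorpus training split (large; adjust to a subset if needed).
train_data = load_dataset("bookcorpus", split="train")

# Write one JSON object per line with a "text" field; the resulting .jsonl can
# then be passed to tools/preprocess_data.py to build the binary/index files.
train_data.to_json("bookcorpus.jsonl", lines=True)
```

The resulting `bookcorpus.jsonl` would then be tokenized with `tools/preprocess_data.py` (as in the blog post) to produce the dataset files referenced by the sequence parallelism example.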