huggingface / nanotron

Minimalistic large language model 3D-parallelism training

Question concerning context parallelism. #126

Closed veritas9872 closed 5 months ago

veritas9872 commented 5 months ago

Hello! Megatron has recently announced "context parallelism", an extension of sequence parallelism that splits activations along the sequence dimension even further than the original sequence parallelism does.

Link 1: https://docs.nvidia.com/megatron-core/developer-guide/latest/api-guide/context_parallel.html
Link 2: https://oslo.eleuther.ai/CONCEPTS/parallel_context.html

Would it be possible to implement this in nanotron? I believe it would be very helpful for training large models and/or long sequences.

I think many users would be interested in this feature, as activations can take up a large portion of memory during training.
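
To illustrate the core idea, here is a minimal single-process sketch of what context parallelism does conceptually: queries are split into chunks along the sequence dimension, each chunk is processed as if it lived on its own rank, and attention still sees the full key/value tensors, which a real implementation would all-gather or exchange ring-style across ranks. The names (`context_parallel_attention`, `cp_size`) are purely illustrative and not part of the nanotron or Megatron API.

```python
import torch
import torch.nn.functional as F

def attention(q, k, v):
    # Standard scaled dot-product attention over the full sequence.
    scores = q @ k.transpose(-2, -1) / k.shape[-1] ** 0.5
    return F.softmax(scores, dim=-1) @ v

def context_parallel_attention(q, k, v, cp_size):
    # Split query activations along the sequence dimension; each chunk
    # plays the role of one context-parallel rank's local shard.
    q_chunks = q.chunk(cp_size, dim=1)
    # Stand-ins for the all-gather of K/V across ranks in a real setup.
    full_k, full_v = k, v
    outs = [attention(qc, full_k, full_v) for qc in q_chunks]
    # Concatenating the per-"rank" outputs recovers the full result,
    # since each softmax row depends only on its own query position.
    return torch.cat(outs, dim=1)

if __name__ == "__main__":
    torch.manual_seed(0)
    q = torch.randn(2, 8, 16)  # (batch, seq, hidden)
    k = torch.randn(2, 8, 16)
    v = torch.randn(2, 8, 16)
    ref = attention(q, k, v)
    cp = context_parallel_attention(q, k, v, cp_size=4)
    print(torch.allclose(ref, cp, atol=1e-6))  # True
```

The memory win comes from each rank holding only its sequence chunk of the activations; the communication cost of sharing K/V is what implementations like Megatron's optimize.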

xrsrke commented 5 months ago

> Would it be possible to implement this in nanotron?

Yes, we do plan to support sequence parallelism in the very near future. We're just currently busy with some other new features.