Closed: veritas9872 closed this issue 5 months ago
Hello! Megatron has recently announced "context parallelism", an extension of sequence parallelism that splits activations along the sequence dimension even more aggressively than the original sequence parallelism.
Link 1: https://docs.nvidia.com/megatron-core/developer-guide/latest/api-guide/context_parallel.html Link 2: https://oslo.eleuther.ai/CONCEPTS/parallel_context.html
Would it be possible to implement these in nanotron? I believe that they would be very helpful for training large models and/or long sequences.
I think many users would be interested in this feature, as activations can take up a large portion of memory during training.
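To make the memory argument concrete, here is a minimal pure-Python sketch of the core idea behind context parallelism: each rank stores only its own slice of the activations along the sequence dimension. This is an illustration only, not nanotron's or Megatron's actual API; the `shard_sequence` helper and its names are hypothetical.

```python
def shard_sequence(activations, cp_rank, cp_size):
    """Return the slice of `activations` (a list of per-token vectors)
    owned by context-parallel rank `cp_rank` out of `cp_size` ranks."""
    seq_len = len(activations)
    assert seq_len % cp_size == 0, "sequence length must divide cp_size evenly"
    chunk = seq_len // cp_size
    return activations[cp_rank * chunk : (cp_rank + 1) * chunk]

# Toy example: a sequence of 8 token activations, each a 4-dim vector.
seq = [[float(t)] * 4 for t in range(8)]

# With cp_size=4, each rank stores only 2 tokens' activations instead of 8,
# cutting per-rank activation memory by the context-parallel degree.
shards = [shard_sequence(seq, r, cp_size=4) for r in range(4)]
print([len(s) for s in shards])  # → [2, 2, 2, 2]
```

In a real implementation the sharded ranks must also exchange key/value activations for attention (e.g. via ring-style communication), which is where most of the engineering effort lies.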
Would it be possible to implement these in nanotron?
Yes. We do plan to support sequence parallelism in the very near future... We are just currently busy with some other new features.