gongel opened 10 months ago
Can you please share more details?
The NVIDIA Megatron team proposed "Tensor Parallelism". When training with tensor parallelism, every rank in the same tensor-parallel group must receive the same data.
Paper: https://arxiv.org/pdf/2205.05198.pdf
Repo: https://github.com/NVIDIA/Megatron-LM
But Streaming only supports DDP/FSDP.
Is there any plan to add this?
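For context, here is a minimal sketch of the rank bookkeeping this implies, assuming Megatron's default grouping where consecutive global ranks form one tensor-parallel group; the function name and signature are illustrative, not Streaming's API:

```python
def parallel_ranks(global_rank: int, world_size: int, tp_size: int):
    """Map a global rank to (data-parallel rank, tensor-parallel rank).

    Assumes consecutive ranks form one tensor-parallel group, as in
    Megatron's default initialization.
    """
    dp_rank = global_rank // tp_size   # ranks in the same TP group share this
    tp_rank = global_rank % tp_size
    dp_world_size = world_size // tp_size
    return dp_rank, tp_rank, dp_world_size

# Example: with world_size=8 and tp_size=2, ranks 0 and 1 both get
# dp_rank=0, so a streaming dataset should feed them identical samples.
```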
One easy workaround (which does not seem to work) could be:

```python
import os

os.environ["WORLD_SIZE"] = str(int(os.environ["WORLD_SIZE"]) // model_parallel_size)
os.environ["RANK"] = str(int(os.environ["RANK"]) // model_parallel_size)
```
I tried it, but the code seems to get stuck after calling something like:

```python
batch = next(batch_iterator)
```

where batch_iterator is a dataloader.
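One hedged alternative sketch, not Streaming's API: let only the first rank of each tensor-parallel group own the dataloader, and broadcast each batch to its peers, similar in spirit to Megatron's broadcast_data. The helper below is hypothetical and assumes every rank already knows the batch shape and dtype:

```python
import torch
import torch.distributed as dist

def next_tp_batch(batch_iterator, tp_group, tp_src_rank, shape, dtype, device):
    """Fetch a batch on the group's source rank and broadcast it to peers.

    tp_group / tp_src_rank would come from your Megatron initialization,
    e.g. parallel_state.get_tensor_model_parallel_group(). A real version
    would broadcast shape/dtype metadata first, as Megatron does.
    """
    if dist.get_rank() == tp_src_rank:
        batch = next(batch_iterator).to(device)
    else:
        batch = torch.empty(shape, dtype=dtype, device=device)
    dist.broadcast(batch, src=tp_src_rank, group=tp_group)
    return batch
```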
cc: @karan6181
@snarayan21 Looks like this is being addressed. Is that right?
I would like to know if there is any example of Megatron integration.
Is tensor parallelism / pipeline parallelism currently supported?