volcengine / veScale

A PyTorch Native LLM Training Framework
http://vescale.xyz
Apache License 2.0

[QUESTION] Using ndtimeline-tool to Monitor Megatron-GPT #51

Open zmtttt opened 2 weeks ago

zmtttt commented 2 weeks ago

I want to use the ndtimeline tool to monitor the computation and communication of each rank in Megatron-GPT. I have two concerns:

1. Initialization is required before calling init_ndtimeline. Would this conflict with Megatron's own initialize_megatron function? Both involve operations on process groups, so this could cause communication issues later on (see the sketch after this list).

2. The interfaces of Megatron-LM and veScale are different. How can I integrate the computational interfaces, such as major-metrics, tp-stream-metrics, dp-stream-metrics, pp-batch-stream-metrics, and pp-forward-stream-metrics? Has anyone successfully used the ndtimeline tool with Megatron-GPT before?
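
For concreteness, here is a minimal sketch of the ordering in question. It assumes init_ndtimeline is safe to call after torch.distributed and Megatron's process groups already exist and that it attaches to those groups rather than creating new ones; the import paths and the (omitted) arguments of init_ndtimeline are assumptions and must be checked against vescale/ndtimeline and your Megatron-LM version.

```python
# Hypothetical ordering sketch, not a verified integration.
# Assumptions: init_ndtimeline reuses the process groups Megatron creates
# instead of building its own; its real arguments (omitted here) are
# defined in vescale/ndtimeline. The initialize_megatron import path
# varies across Megatron-LM versions.
from megatron.initialize import initialize_megatron
from vescale.ndtimeline import init_ndtimeline  # assumed import path


def setup_training():
    # 1) Let Megatron initialize torch.distributed and its TP/PP/DP groups.
    initialize_megatron()
    # 2) Only then set up ndtimeline, so it observes the groups Megatron
    #    created instead of racing with initialize_megatron.
    init_ndtimeline()  # arguments omitted; see vescale/ndtimeline
```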

thanks!

zmtttt commented 1 week ago


(1) My progress: I modified the ndtimeline init, as well as p2p_communication.py and schedules.py in Megatron, but failed to get a correct timeline. (2) Why is that? I also wondered why instructions need to be registered in ndtimeline/pipedream_flush.py. I did not register instructions; instead I decorate the communication functions in megatron/core/pipeline_parallel/schedules.py, e.g. @ndtimer(SEND_BACKWARD) on send_backward(input_tensor_grads, tensor_shapes, config), and all interfaces use the same method (sketched below). @vocaltract
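
For reference, a minimal sketch of the decorator approach described above. It assumes ndtimer and the SEND_BACKWARD metric name are importable from veScale's ndtimeline package (the import paths are guesses), and the function body is a simplified stand-in for the real helper in megatron/core/pipeline_parallel/schedules.py.

```python
# Sketch of timing a Megatron-LM p2p helper with the ndtimeline decorator,
# as described in the comment above. Both ndtimeline import paths are
# assumptions; adjust them to wherever your veScale checkout exposes
# ndtimer and the SEND_BACKWARD metric name.
from vescale.ndtimeline import ndtimer, SEND_BACKWARD

from megatron.core.pipeline_parallel import p2p_communication


@ndtimer(SEND_BACKWARD)
def send_backward(input_tensor_grads, tensor_shapes, config):
    # Same signature as the Megatron-LM helper in schedules.py; the body is
    # a simplified stand-in. The decorator wraps the call so ndtimeline can
    # record the send's start/end on this rank.
    if not isinstance(input_tensor_grads, list):
        input_tensor_grads = [input_tensor_grads]
    for input_tensor_grad, tensor_shape in zip(input_tensor_grads, tensor_shapes):
        if tensor_shape is None:
            continue
        p2p_communication.send_backward(input_tensor_grad, config)

# recv_forward, send_forward, and recv_backward in the same file would be
# decorated the same way, each with its own metric name.
```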

[screenshot attached: wrong-megatron]