I found that the current training script does not work with the latest PyTorch DDP, so I modified the code and verified that the change works with it. I also investigated a memory leak that occurred when loading the pre-trained checkpoint and fixed it.
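For reviewers, here is a rough sketch of the kind of change involved. It is an illustration under assumptions, not the exact diff. It assumes the incompatibility is the usual one: `torchrun` passes the local rank through the `LOCAL_RANK` environment variable, while the older `torch.distributed.launch` passed a `--local_rank` argument. It also assumes the checkpoint memory growth comes from each rank deserializing tensors onto the GPU they were saved from, which `map_location="cpu"` avoids. The helper names `resolve_local_rank` and `load_checkpoint_on_cpu` are hypothetical.

```python
import argparse
import os


def resolve_local_rank(argv=None, env=None):
    """Return the local rank for this process (hypothetical helper).

    Prefers the LOCAL_RANK environment variable set by torchrun and falls
    back to the legacy --local_rank / --local-rank command-line argument.
    """
    env = os.environ if env is None else env
    parser = argparse.ArgumentParser(add_help=False)
    parser.add_argument("--local_rank", "--local-rank", type=int, default=0)
    # parse_known_args ignores the script's other arguments.
    args, _unknown = parser.parse_known_args(argv)
    return int(env.get("LOCAL_RANK", args.local_rank))


def load_checkpoint_on_cpu(path):
    """Deserialize a checkpoint into host memory (hypothetical helper).

    Assumption: loading to CPU first keeps every rank from allocating the
    saved tensors on the original device; the caller then copies the
    weights into the model on its own device.
    """
    import torch  # imported lazily so resolve_local_rank needs no torch

    return torch.load(path, map_location="cpu")
```

With this shape, the script would call `resolve_local_rank()` once at startup and use the result for the device selection and for `device_ids` when wrapping the model in `DistributedDataParallel`.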