@deepakn94 Do you have any suggestions on how to fix the hang issue when using two data-parallel groups with 4 GPUs and 3 GPUs?
I checked the code, and it seems that all data from the 4 GPUs is sent to only one of the 3 GPUs (I guess this is because `self.tensor_tags` can only store one tag per input/output node), e.g., here
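To illustrate what I mean, here is a rough sketch of what I suspect is happening (this is not the actual code; the names and the round-robin fix are hypothetical):

```python
# Sketch of the suspected problem with an uneven 4-GPU -> 3-GPU configuration:
# a mapping keyed only on the tensor name collapses all transfers onto one
# destination rank. Names below are hypothetical, not taken from the repo.

tensor_tags = {"out": 0}          # single tag stored for the "out" tensor
dest_ranks  = {"out": 4}          # so every sender targets the same rank (4)

senders   = [0, 1, 2, 3]          # 4 GPUs in the first data-parallel group
receivers = [4, 5, 6]             # 3 GPUs in the second data-parallel group

print("current:", {s: dest_ranks["out"] for s in senders})
# current: {0: 4, 1: 4, 2: 4, 3: 4} -> ranks 5 and 6 never receive anything,
#                                      so they block on recv and the run hangs.

# What I imagine a fix would need: a per-(tensor, sender) mapping so the 4
# senders are spread over the 3 receivers (e.g. round-robin).
dest_ranks_fixed = {("out", s): receivers[i % len(receivers)]
                    for i, s in enumerate(senders)}
print("fixed:  ", {s: dest_ranks_fixed[("out", s)] for s in senders})
# fixed:   {0: 4, 1: 5, 2: 6, 3: 4} -> every receiver gets at least one sender.
```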
I also noticed a comment that says "TODO: don't current support uneven configurations." here