In your FGT/FGT/networks/network.py module, I couldn't find a call to .all_reduce() to aggregate the gradients computed on the different GPUs. The training step only does the following:
loss = m_loss_valid + m_loss_masked + gen_loss
self.optimizer.zero_grad()
loss.backward()
self.optimizer.step()
Should this code be rewritten to import torch.distributed as dist and call dist.all_reduce() on the gradients after loss.backward(), roughly as sketched below, so that the gradients are aggregated across GPUs? If not, will each GPU be isolated and calculate its gradient update alone?
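For example, something like the following sketch of the explicit aggregation I mean (the loop over self.model.parameters() and the attribute names are placeholders I made up to illustrate the idea, not code from the repo):

import torch.distributed as dist

# illustration only: `self.model` is a placeholder for the actual network attribute
loss = m_loss_valid + m_loss_masked + gen_loss
self.optimizer.zero_grad()
loss.backward()
world_size = dist.get_world_size()
for p in self.model.parameters():
    if p.grad is not None:
        dist.all_reduce(p.grad, op=dist.ReduceOp.SUM)   # sum gradients from all ranks
        p.grad /= world_size                            # average to match single-GPU scale
self.optimizer.step()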
To the best of my knowledge, it is not necessary to call all_reduce() explicitly: when the model is wrapped in torch.nn.parallel.DistributedDataParallel, PyTorch registers gradient hooks that all-reduce (average) the gradients across processes during loss.backward(), so every GPU applies the same synchronized gradients in optimizer.step().
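For illustration, here is a minimal, generic sketch of that behaviour (launched with torchrun so the LOCAL_RANK environment variable is set; the toy model, optimizer, and dummy batch are placeholders, not FGT's actual setup):

import os
import torch
import torch.distributed as dist
from torch.nn.parallel import DistributedDataParallel as DDP

dist.init_process_group(backend="nccl")              # one process per GPU
local_rank = int(os.environ["LOCAL_RANK"])
torch.cuda.set_device(local_rank)

model = torch.nn.Linear(16, 1).cuda(local_rank)      # toy stand-in for the real network
model = DDP(model, device_ids=[local_rank])          # registers the gradient-sync hooks
optimizer = torch.optim.Adam(model.parameters(), lr=1e-4)

x = torch.randn(8, 16, device=local_rank)            # dummy batch, different on each rank
loss = model(x).mean()
optimizer.zero_grad()
loss.backward()        # DDP all-reduces (averages) .grad across ranks during backward
optimizer.step()       # every rank then applies identical gradients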