hitachinsk / FGT

[ECCV 2022] Flow-Guided Transformer for Video Inpainting
https://hitachinsk.github.io/publication/2022-10-01-Flow-Guided-Transformer-for-Video-Inpainting
MIT License

About DDP gradient aggregation on different GPUs #22

Closed: hwpengTristin closed this issue 1 year ago

hwpengTristin commented 1 year ago

In your FGT/FGT/networks/network.py module (see 'code mark 1' and 'code mark 2' below), I couldn't find an .all_reduce() call to aggregate gradients across different GPUs.

==============code mark 1===============
        dis_loss = (dis_real_loss + dis_fake_loss) / 2
        self.dist_optim.zero_grad()
        dis_loss.backward()
        self.dist_optim.step()
==============code mark 2===============
        loss = m_loss_valid + m_loss_masked + gen_loss
        self.optimizer.zero_grad()
        loss.backward()
        self.optimizer.step()

Should the code be rewritten in the following form (see 'rewritten 1' and 'rewritten 2' below) to aggregate gradients across the GPUs? If not, will each GPU be isolated and compute its gradient updates alone?

==============rewritten 1===============
        dis_loss = (dis_real_loss + dis_fake_loss) / 2
        self.dist_optim.zero_grad()
        dis_loss.backward()
        dis_loss = reduce_value(dis_loss, average=True)
        self.dist_optim.step()
==============rewritten 2===============
        loss = m_loss_valid + m_loss_masked + gen_loss
        self.optimizer.zero_grad()
        loss.backward()
        loss = reduce_value(loss, average=True)
        self.optimizer.step()
==============reduce_value helper function===============

import torch.distributed as dist
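
For completeness, a common implementation of a reduce_value helper with the call signature used in 'rewritten 1' and 'rewritten 2' (an assumption, not code taken from this repository) looks like:

    import torch
    import torch.distributed as dist

    def reduce_value(value, average=True):
        # Number of participating processes (one per GPU under DDP).
        world_size = dist.get_world_size()
        if world_size < 2:              # single process: nothing to reduce
            return value
        with torch.no_grad():
            dist.all_reduce(value)      # sums 'value' across all ranks, in place
            if average:
                value /= world_size     # turn the sum into a mean
        return value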

hitachinsk commented 1 year ago

To the best of my knowledge, it is not necessary to call the all_reduce function explicitly, because the aggregation of gradients is handled by PyTorch automatically.
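
For context, a minimal single-node sketch of this pattern (the toy model, the NCCL backend, and the optimizer settings are illustrative assumptions, not FGT's actual training code): once a model is wrapped in torch.nn.parallel.DistributedDataParallel, loss.backward() all-reduces and averages the parameter gradients across processes, so each rank's optimizer step applies the same aggregated update.

    import torch
    import torch.distributed as dist
    from torch.nn.parallel import DistributedDataParallel as DDP

    # Illustrative setup; in practice each process is launched with its own rank
    # (e.g. via torchrun) and the process group is initialized once per process.
    dist.init_process_group(backend='nccl')
    rank = dist.get_rank()          # single-node sketch: global rank == GPU index
    torch.cuda.set_device(rank)

    model = torch.nn.Linear(10, 1).to(rank)   # placeholder model, not FGT's generator
    model = DDP(model, device_ids=[rank])
    optimizer = torch.optim.Adam(model.parameters(), lr=1e-4)

    x = torch.randn(8, 10, device=rank)
    target = torch.randn(8, 1, device=rank)

    loss = torch.nn.functional.mse_loss(model(x), target)
    optimizer.zero_grad()
    loss.backward()    # DDP hooks all-reduce (average) the gradients across ranks here
    optimizer.step()   # every rank applies the same averaged update

Because the gradient synchronization happens inside backward(), a reduce_value call placed after backward() would only average the scalar loss value (e.g. for logging) and would not change the gradients or the parameter updates.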

hwpengTristin commented 1 year ago

Noted, thank you very much!