kazuto1011 / deeplab-pytorch

PyTorch re-implementation of DeepLab v2 on COCO-Stuff / PASCAL VOC datasets

iter_loss #84

Closed wuzuowuyou closed 4 years ago

wuzuowuyou commented 4 years ago

First, thank you for your meticulous work!

iter_loss = 0
            for logit in logits:
                # Resize labels for {100%, 75%, 50%, Max} logits
                _, _, H, W = logit.shape

                # print("path_img=",path_img)
                labels_ = resize_labels(labels, size=(H, W))
                iter_loss += criterion(logit, labels_.to(device))

            # Propagate backward (just compute gradients)
            iter_loss /= CONFIG.SOLVER.ITER_SIZE
            iter_loss.backward()

Why iter_loss /= CONFIG.SOLVER.ITER_SIZE instead of iter_loss /= logits.size()?

kazuto1011 commented 4 years ago

This line is not there to average over the multiple logits; it makes the accumulated gradients invariant to the number of accumulation iterations, ITER_SIZE. The block accumulates 1/N-scaled gradients N times and then updates the parameters with the resulting N/N (full-magnitude) gradient. This is equivalent to computing the raw gradient once over the whole effective batch and updating the parameters immediately. It is a common trick (gradient accumulation) to save memory.
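
To illustrate the idea, here is a minimal sketch of the gradient-accumulation pattern described above. The toy model, data, and ITER_SIZE value are placeholders for this example only, not the repository's actual training code:

    import torch
    import torch.nn as nn

    ITER_SIZE = 4                                  # analogous to CONFIG.SOLVER.ITER_SIZE
    model = nn.Linear(8, 3)                        # toy model for illustration
    criterion = nn.CrossEntropyLoss()
    optimizer = torch.optim.SGD(model.parameters(), lr=0.1)

    optimizer.zero_grad()
    for _ in range(ITER_SIZE):
        x = torch.randn(2, 8)                      # one small mini-batch
        y = torch.randint(0, 3, (2,))
        loss = criterion(model(x), y) / ITER_SIZE  # scale the loss by 1/N ...
        loss.backward()                            # ... so gradients accumulate to an average
    optimizer.step()                               # single update: N x (1/N) = full-magnitude step

Dividing by ITER_SIZE keeps the magnitude of the final update independent of how many mini-batches are accumulated before optimizer.step(), which is exactly the invariance described above.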