ylsung / VL_adapter

PyTorch code for "VL-Adapter: Parameter-Efficient Transfer Learning for Vision-and-Language Tasks" (CVPR2022)

A question about zero-grad settings in VL-adapter's multitask.py file. #19

Open · y2sman opened this issue 6 months ago

y2sman commented 6 months ago

Thanks for your brilliant work. I have a question about the following part of the training loop in `multitask.py`:

                batch['log_train_accuracy'] = self.args.log_train_accuracy

                # self.optim.zero_grad()
                if self.args.fp16 and _use_native_amp:
                    with autocast():
                        if self.args.distributed:
                            results = self.model.module.train_step(batch)
                        else:
                            results = self.model.train_step(batch)
                else:
                    if self.args.distributed:
                        results = self.model.module.train_step(batch)
                    else:
                        results = self.model.train_step(batch)

                loss = results['loss']

Looking at this code, `self.optim.zero_grad()` is commented out, so the gradients are never zeroed before backpropagation.

Is there a reason why training still works correctly this way?
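
For context, this is the ordering I would normally expect, where the gradients are cleared once per step before calling `backward()`. This is only a minimal PyTorch sketch with placeholder `model`, `optimizer`, and data, not the code from this repository:

    import torch

    # Placeholder model, optimizer, and loss; used only to illustrate the loop ordering.
    model = torch.nn.Linear(10, 2)
    optimizer = torch.optim.AdamW(model.parameters(), lr=1e-4)
    loss_fn = torch.nn.CrossEntropyLoss()

    # A few random batches standing in for the real dataloader.
    batches = [(torch.randn(4, 10), torch.randint(0, 2, (4,))) for _ in range(3)]

    for x, y in batches:
        optimizer.zero_grad()       # clear gradients left over from the previous step
        loss = loss_fn(model(x), y) # forward pass
        loss.backward()             # accumulate fresh gradients for this batch only
        optimizer.step()            # update parameters

Without the `zero_grad()` call (or an equivalent call somewhere else in the loop), I would expect the gradients from previous batches to keep accumulating across steps.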