ylsung / VL_adapter

PyTorch code for "VL-Adapter: Parameter-Efficient Transfer Learning for Vision-and-Language Tasks" (CVPR2022)

A question about zero-grad settings in VL-adapter's multitask.py file. #19

Open · y2sman opened this issue 6 months ago

y2sman commented 6 months ago

Thanks for your brilliant work. I have a question about the following part of the training loop in `multitask.py`:

                batch['log_train_accuracy'] = self.args.log_train_accuracy

                # self.optim.zero_grad()
                if self.args.fp16 and _use_native_amp:
                    with autocast():
                        if self.args.distributed:
                            results = self.model.module.train_step(batch)
                        else:
                            results = self.model.train_step(batch)
                else:
                    if self.args.distributed:
                        results = self.model.module.train_step(batch)
                    else:
                        results = self.model.train_step(batch)

                loss = results['loss']

Looking at this code, `self.optim.zero_grad()` is commented out, so the gradients are never zeroed before backpropagation.

Is there a reason why training still works correctly this way?
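
For context, this is the ordering I would normally expect, where the gradients are cleared once per step before calling `backward()`. This is only a minimal PyTorch sketch with placeholder `model`, `optimizer`, and data, not the code from this repository:

    import torch

    # Placeholder model, optimizer, and loss; used only to illustrate the loop ordering.
    model = torch.nn.Linear(10, 2)
    optimizer = torch.optim.AdamW(model.parameters(), lr=1e-4)
    loss_fn = torch.nn.CrossEntropyLoss()

    # A few random batches standing in for the real dataloader.
    batches = [(torch.randn(4, 10), torch.randint(0, 2, (4,))) for _ in range(3)]

    for x, y in batches:
        optimizer.zero_grad()       # clear gradients left over from the previous step
        loss = loss_fn(model(x), y) # forward pass
        loss.backward()             # accumulate fresh gradients for this batch only
        optimizer.step()            # update parameters

Without the `zero_grad()` call (or an equivalent call somewhere else in the loop), I would expect the gradients from previous batches to keep accumulating across steps.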