Gradients haven't been cleared since the last optimizer step. In order to obtain privacy guarantees you must call optimizer.zero_grad()on each step

System Info

transformers version == 4.42.4
dp-transformers version == 1.0.1
accelerate version == 0.29.3
llm == Distil-gpt2
torch version == 2.0.0
The model is runned under ddp with 2 GPUs

Error in line 2330 in the function self.optimizer.step() Gradients haven't been cleared since the last optimizer step. In order to obtain privacy guarantees you must call optimizer.zero_grad()on each step

Explanation Hello, I am trying to train a Tabula (https://github.com/zhao-zilong/Tabula) model using differential privacy. I rewrote my Tabula trainer to use the OpacusDPTrainer, but I encountered an error stating that optimizer.step() requires a call to optimizer.zero_grad() at each step to obtain privacy guarantees. I tried to resolve the problem by reimplementing the _inner_training_loop function from the transformers trainer and forcing self.optimizer.zero_grad() at each step, but I still get the same error.

Code Function start at line 1939, the modification is added to line 2032 ` def _inner_training_loop(

    self, batch_size=None, args=None, resume_from_checkpoint=None, trial=None, ignore_keys_for_eval=None

):

    self.accelerator.free_memory()

    self._train_batch_size = batch_size

    if self.args.auto_find_batch_size:

    ....

    step = -1

        for step, inputs in enumerate(epoch_iterator):

            total_batched_samples += 1

            # My function 

            self.optimizer.zero_grad()

            if self.args.include_num_input_tokens_seen:

                main_input_name = getattr(self.model, "main_input_name", "input_ids")

                if main_input_name not in inputs:

                    logger.warning(

                        "Tried to track the number of tokens seen, however the current model is "

                        "not configured properly to know what item is the input. To fix this, add "

                        "a `main_input_name` attribute to the model class you are using."

                    )

                else:`

microsoft / dp-transformers

Gradients haven't been cleared since the last optimizer step. In order to obtain privacy guarantees you must call optimizer.zero_grad()on each step #47