microsoft / dp-transformers

Differentially-private transformers using HuggingFace and Opacus

Gradients haven't been cleared since the last optimizer step. In order to obtain privacy guarantees you must call optimizer.zero_grad() on each step #47

Open bahriwissal opened 3 months ago

bahriwissal commented 3 months ago

System Info

Error at line 2330, in the call to self.optimizer.step(): "Gradients haven't been cleared since the last optimizer step. In order to obtain privacy guarantees you must call optimizer.zero_grad() on each step."

Explanation

Hello, I am trying to train a Tabula (https://github.com/zhao-zilong/Tabula) model using differential privacy. I rewrote my Tabula trainer to use the OpacusDPTrainer, but I encountered an error stating that optimizer.step() requires a call to optimizer.zero_grad() at each step to obtain privacy guarantees. I tried to resolve the problem by reimplementing the _inner_training_loop function from the transformers trainer and forcing self.optimizer.zero_grad() at each step, but I still get the same error.
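For context, here is a minimal sketch of the setup I am aiming for (toy data; the class and argument names follow the repo's README and examples as far as I understand them, so they may need small adjustments):

```python
import datasets
import transformers
import dp_transformers

# Toy stand-in for the tabular data; in practice these would be Tabula-encoded rows.
tokenizer = transformers.AutoTokenizer.from_pretrained("distilgpt2")
tokenizer.pad_token = tokenizer.eos_token
raw = datasets.Dataset.from_dict({"text": ["age 34, income 52000", "age 51, income 61000"]})
train_dataset = raw.map(lambda batch: tokenizer(batch["text"]), batched=True, remove_columns=["text"])

model = transformers.AutoModelForCausalLM.from_pretrained("distilgpt2")

train_args = dp_transformers.TrainingArguments(
    output_dir="out",
    per_device_train_batch_size=8,
    gradient_accumulation_steps=2,  # accumulation substeps are handled by DPCallback
    num_train_epochs=1,
)
privacy_args = dp_transformers.PrivacyArguments(
    target_epsilon=8.0,
    per_sample_max_grad_norm=1.0,
)

trainer = dp_transformers.dp_utils.OpacusDPTrainer(
    model=model,
    args=train_args,
    train_dataset=train_dataset,
    privacy_args=privacy_args,
    data_collator=dp_transformers.DataCollatorForPrivateCausalLanguageModeling(tokenizer),
)
trainer.train()  # the error above is raised during this call, on self.optimizer.step()
```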

Code

The function starts at line 1939 of the transformers trainer; my modification is added at line 2032:

```python
def _inner_training_loop(
    self, batch_size=None, args=None, resume_from_checkpoint=None, trial=None, ignore_keys_for_eval=None
):
    self.accelerator.free_memory()
    self._train_batch_size = batch_size
    if self.args.auto_find_batch_size:
        ...

    step = -1
    for step, inputs in enumerate(epoch_iterator):
        total_batched_samples += 1

        # My modification: force zero_grad() on every step
        self.optimizer.zero_grad()

        if self.args.include_num_input_tokens_seen:
            main_input_name = getattr(self.model, "main_input_name", "input_ids")
            if main_input_name not in inputs:
                logger.warning(
                    "Tried to track the number of tokens seen, however the current model is "
                    "not configured properly to know what item is the input. To fix this, add "
                    "a `main_input_name` attribute to the model class you are using."
                )
            else:
                ...
```
huseyinatahaninan commented 3 months ago

Hi @bahriwissal, our OpacusDPTrainer uses DPCallback (https://github.com/microsoft/dp-transformers/blob/f9fae445b1d3bb28355dbaac6720c007abb974ce/src/dp_transformers/dp_utils.py#L188C28-L188C38), which handles the DP-related operations on substeps and steps. Indeed, self.optimizer.zero_grad() should be called for Opacus after each step, so we already do that in DPCallback here: https://github.com/microsoft/dp-transformers/blob/f9fae445b1d3bb28355dbaac6720c007abb974ce/src/dp_transformers/dp_utils.py#L78
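For reference, the pattern looks roughly like this (a simplified sketch, not the exact DPCallback code; the real callback also takes care of gradient-accumulation substeps and the privacy accounting, and the class name here is only illustrative):

```python
from transformers import TrainerCallback

class ZeroGradAfterStepCallback(TrainerCallback):
    # Illustrative only; in dp-transformers this logic lives inside DPCallback.
    def on_step_end(self, args, state, control, optimizer=None, **kwargs):
        # The Trainer's CallbackHandler passes its (Opacus-wrapped) optimizer as a kwarg,
        # so clearing it here satisfies Opacus' "gradients haven't been cleared" check.
        if optimizer is not None:
            optimizer.zero_grad()
        return control
```

Because the Trainer hands the callback its own optimizer, the zero_grad() lands on the same Opacus-wrapped optimizer instance that performs the step, which is what the privacy check requires.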

I hope this helps, let us know if you have any further questions.