huggingface/transformers


Input_embeddings grad is None #29674

Closed: Newbyl closed this issue 5 months ago

Newbyl commented 6 months ago

System Info

Who can help?

@ArthurZucker @muell

Information

Tasks

Reproduction

Hi, in the code snippet below I reused the "training_step" function from the Hugging Face "trainer.py". I want to compute the gradient only for one new token that I added to the vocabulary (at index 32000), so I simply zero out the gradients of all the other rows. However, when I try to read the gradients of the input embeddings, they are None. I access the gradients after the backward call, "requires_grad" is set to True, and the input_embeddings() tensor is a leaf, so I don't understand why I can't get these gradients.

I'm using a subset of the LLaVA dataset for my tests.

Code

def training_step(self, model: nn.Module, inputs: Dict[str, Union[torch.Tensor, Any]]) -> torch.Tensor:
    model.train()
    inputs = self._prepare_inputs(inputs)

    with self.compute_loss_context_manager():
        loss = self.compute_loss(model, inputs)

    if self.args.n_gpu > 1:
        loss = loss.mean()  # mean() to average on multi-gpu parallel training

    self.accelerator.backward(loss)

    # FIXME: param.grad is None here for the input embeddings
    for param in model.get_model().get_input_embeddings().parameters():
        # Zero out every embedding row except the newly added token at index 32000.
        mask = torch.arange(param.grad.shape[0]) != 32000
        param.grad[mask, :] = 0

    return loss.detach() / self.args.gradient_accumulation_steps
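
For reference, a gradient hook registered on the embedding weight is another way to restrict updates to a single row: it runs during backward, so there is no need to touch param.grad afterwards. A minimal sketch, assuming the new token sits at index 32000 (illustrative names, not the LLaVA code):

# Sketch: zero every embedding-gradient row except the new token's, via a hook.
NEW_TOKEN_ID = 32000  # assumed index of the newly added token

embedding = model.get_model().get_input_embeddings()

def keep_only_new_token(grad):
    # Build a mask that keeps only the new token's row of the gradient.
    mask = torch.zeros_like(grad)
    mask[NEW_TOKEN_ID] = 1.0
    return grad * mask

embedding.weight.register_hook(keep_only_new_token)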

Stack trace

Traceback (most recent call last):
  File ".../LLaVA/llava/train/train_mem.py", line 4, in <module>
    train(attn_implementation="flash_attention_2")
  File ".../LLaVA/llava/train/train.py", line 1049, in train
    trainer.train()
  File ".../site-packages/transformers/trainer.py", line 1539, in train
    return inner_training_loop(
  File ".../transformers/trainer.py", line 1869, in _inner_training_loop
    tr_loss_step = self.training_step(model, inputs)
  File ".../LLaVA/llava/train/llava_trainer.py", line 268, in training_step
    mask = torch.arange(param.grad.shape[0]) != 32000
AttributeError: 'NoneType' object has no attribute 'shape'

Expected behavior

Get the gradients instead of having them be None.

ArthurZucker commented 5 months ago

If the gradient is not required for the other embeddings, then they won't have a grad, no? Can't you just do something like last = model.get_model().get_input_embeddings().parameters()[32000] to make sure you don't iterate over frozen parameters?
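
For illustration, a guarded version of that loop (a sketch, not a confirmed fix) would simply skip parameters whose gradient was never populated:

for param in model.get_model().get_input_embeddings().parameters():
    if not param.requires_grad or param.grad is None:
        continue  # frozen, or not part of the backward graph
    mask = torch.arange(param.grad.shape[0], device=param.grad.device) != 32000
    param.grad[mask, :] = 0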

Newbyl commented 5 months ago

The whole LM is frozen except the "input_embeddings" layer, which I unfroze, so normally the gradients should be computed for all the embeddings; in the code I provided I zero out the other gradients so that only the embedding at index 32000 gets updated. I'm using the LLaVA codebase here, and I unfreeze the "input_embeddings" layer with for p in model.get_model().get_input_embeddings().parameters(): p.requires_grad = True, just before line 946. I'm also using the "pretrain.sh" script, where I only swapped the "vicuna" language model for the Mistral one: --model_name_or_path liuhaotian/llava-v1.6-mistral-7b.
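
For completeness, the unfreezing pattern described above plus a quick post-backward sanity check might look like this (a sketch following the LLaVA codebase naming, where get_model() returns the underlying language model):

# Unfreeze only the input embeddings; everything else stays frozen.
for p in model.get_model().get_input_embeddings().parameters():
    p.requires_grad = True

# After accelerator.backward(loss), check whether a gradient actually arrived.
for name, p in model.get_model().get_input_embeddings().named_parameters():
    print(name, p.requires_grad, p.is_leaf, p.grad is not None)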

ArthurZucker commented 5 months ago

Mmmm, could you isolate the bug to the Trainer? With an external library involved we can't rule out that the problem comes from there 😞 Could you try with the transformers port of LlavaNext?
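
A minimal isolation script along those lines could look as follows; this is only a sketch, assuming the llava-hf/llava-v1.6-mistral-7b-hf checkpoint and a text-only forward pass, and it is not a verified reproduction:

import torch
from transformers import AutoTokenizer, LlavaNextForConditionalGeneration

model_id = "llava-hf/llava-v1.6-mistral-7b-hf"  # assumed checkpoint name
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = LlavaNextForConditionalGeneration.from_pretrained(model_id, torch_dtype=torch.bfloat16)

# Freeze everything, then unfreeze only the input embeddings.
for p in model.parameters():
    p.requires_grad = False
for p in model.get_input_embeddings().parameters():
    p.requires_grad = True

# Text-only forward/backward to check whether the embedding gradient is populated.
inputs = tokenizer("hello world", return_tensors="pt")
out = model(**inputs, labels=inputs["input_ids"])
out.loss.backward()

print(model.get_input_embeddings().weight.grad is None)  # expect False if grads flow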

ArthurZucker commented 5 months ago

fyi @NielsRogge