If the gradient is not required for the other embeddings, then they won't have a grad, no? Couldn't you just do something like `last = model.get_model().get_input_embeddings().weight[32000]`
to make sure you don't iterate over frozen parameters?
The whole LM is frozen except the input_embeddings layer that I unfroze, so normally the gradients should be computed for all the embeddings; in the code that I provided I zero out the other gradients so that no embedding except the one at index 32000 gets updated. I use the LLaVA codebase here, and I unfreeze the input_embeddings layer with `for p in model.get_model().get_input_embeddings().parameters(): p.requires_grad = True`
here, just before line 946. I'm also using the pretrain.sh script, where I just swapped the Vicuna language model for the Mistral one: `--model_name_or_path liuhaotian/llava-v1.6-mistral-7b`.
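For reference, a minimal sketch of that setup (assuming the LLaVA-style model API quoted above; the hook-based masking is one way to implement the "zero out other gradients" idea, not the exact code from the issue):

```python
import torch

# Unfreeze only the input embeddings (the rest of the LM stays frozen).
embed = model.get_model().get_input_embeddings()
for p in embed.parameters():
    p.requires_grad = True

# Illustration of the "zero out the other gradients" idea: mask every row
# of the embedding gradient except the newly added token at index 32000.
def keep_only_new_token(grad):
    mask = torch.zeros_like(grad)
    mask[32000] = 1.0
    return grad * mask

embed.weight.register_hook(keep_only_new_token)
```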
Mmmm, could you isolate the bug to the trainer? Using an external library does not ensure that it is not coming from there 😞
Could you try with the transformers port of LlavaNext?
fyi @NielsRogge
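A minimal sketch of what trying the transformers port might look like (the llava-hf checkpoint name is an assumption, and the LlavaNext classes require a transformers release newer than the 4.37.2 reported below):

```python
from transformers import LlavaNextForConditionalGeneration, LlavaNextProcessor

model_id = "llava-hf/llava-v1.6-mistral-7b-hf"  # assumed transformers-format checkpoint
processor = LlavaNextProcessor.from_pretrained(model_id)
model = LlavaNextForConditionalGeneration.from_pretrained(model_id)

# Same experiment as before: unfreeze only the input embeddings.
for p in model.get_input_embeddings().parameters():
    p.requires_grad = True
```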
System Info
transformers version: 4.37.2

Who can help?
@ArthurZucker @muell

Information

Tasks
An officially supported task in the examples folder (such as GLUE/SQuAD, ...)

Reproduction
Hi, in the provided code snippet I reused the code from the Hugging Face trainer.py for the training_step function. I want to compute the gradient only for one new token that I added to the vocab (at index 32000), so I just zero out the others. But when I try to get the gradients of the input_embeddings, they give me None. I'm reading the gradients after the backward call, requires_grad is True, and the input_embeddings() tensor is a leaf, so I don't understand why I'm not able to get these gradients.
I'm using a subset of the LLaVA dataset to do my tests.
Code
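The original snippet was not preserved in this thread; below is a minimal sketch of the setup it describes, reusing the shape of Trainer.training_step (as of transformers 4.37) with the index-32000 masking. The class name and masking details are assumptions, not the poster's actual code.

```python
from transformers import Trainer

class SingleTokenTrainer(Trainer):
    def training_step(self, model, inputs):
        model.train()
        inputs = self._prepare_inputs(inputs)
        loss = self.compute_loss(model, inputs)
        self.accelerator.backward(loss)

        # Keep only the gradient of the newly added token (index 32000).
        weight = model.get_model().get_input_embeddings().weight
        if weight.grad is not None:  # in the issue, this is unexpectedly None
            keep = weight.grad[32000].clone()
            weight.grad.zero_()
            weight.grad[32000] = keep

        return loss.detach()
```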
Stack traces
Expected behavior
Get the gradients and not have them be None.
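In other words, a hypothetical check like the following should pass after backward, using the same accessor as above:

```python
loss.backward()
grad = model.get_model().get_input_embeddings().weight.grad
assert grad is not None  # currently fails: grad is None
print(grad[32000].abs().sum())  # only the new token's row should be non-zero
```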