Closed ByeongjunCho closed 2 months ago
cc @ArthurZucker
Oops, this might be because of `tie_word_embeddings` — did you try disabling it?
Hi, thanks for the reply. Sorry, I saw it late. I had not considered `tie_word_embeddings` before, and I confirmed it works when I set `tie_word_embeddings=False`:
```python
import torch.nn as nn
from transformers import AutoModelForCausalLM

model = AutoModelForCausalLM.from_pretrained(
    "beomi/gemma-ko-2b", device_map="cpu", tie_word_embeddings=False
)
for n, p in model.named_parameters():
    if "lm_head" in n:
        print(n)  # printed nothing

# replace the output embeddings with a fresh, untied lm_head
lm_head = nn.Linear(model.config.hidden_size, model.config.vocab_size, bias=False)
# model.tie_word_embedding = False
model.set_output_embeddings(lm_head)
for n, p in model.named_parameters():
    if "lm_head" in n:
        print(n)  # "lm_head.weight" was printed

model.save_pretrained("abcd")

# load the saved model
newmodel = AutoModelForCausalLM.from_pretrained("abcd", tie_word_embeddings=False)
for n, p in newmodel.named_parameters():
    if "lm_head" in n:
        print(n)  # printed nothing
```
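A quick way to check whether two weights are actually tied is to compare their data pointers: tied parameters are one underlying tensor under two names. A minimal sketch with plain `nn` modules standing in for `model.model.embed_tokens` and `model.lm_head` (toy shapes, not the real Gemma checkpoint):

```python
import torch.nn as nn

# Toy stand-ins for embed_tokens and lm_head (vocab=8, hidden=4)
embed = nn.Embedding(8, 4)
lm_head = nn.Linear(4, 8, bias=False)

# This assignment is essentially what tie_word_embeddings=True does:
# both module attributes now point at the SAME Parameter.
lm_head.weight = embed.weight

tied = lm_head.weight.data_ptr() == embed.weight.data_ptr()
print(tied)  # True: one storage, two names
```

The same `data_ptr()` comparison works on the real model's `embed_tokens.weight` and `lm_head.weight` to confirm whether a loaded checkpoint re-tied them.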
Example of `embed_tokens`'s weights:

```python
print(newmodel.model.embed_tokens.weight)
```

```
Parameter containing: tensor([[ 5.3516e-01, -3.4668e-02, 9.5215e-02, ..., 3.9307e-02, 2.6562e-01, 7.2266e-02], [ 1.5137e-01, -1.6797e-01, -1.4160e-01, ..., -2.3926e-02, -3.3188e-04, -3.5400e-02], [ 1.2256e-01, 2.2705e-02, -3.4424e-02, ..., -3.6774e-03, 3.5095e-03, -1.1292e-02], ..., [ 2.8711e-01, -1.0986e-02, 6.4453e-02, ..., -5.7861e-02, 3.3447e-02, -4.0283e-02], [ 3.4375e-01, -6.6406e-02, 8.5449e-02, ..., -9.1553e-03, 8.0078e-02, 7.7820e-03], [ 5.3906e-01, -3.4424e-02, 9.2285e-02, ..., 3.7842e-02, 2.6562e-01, 7.1777e-02]], requires_grad=True)
```
Example of `lm_head`'s weights:

```python
print(newmodel.lm_head.weight)
```

```
Parameter containing: tensor([[-0.0036, 0.0196, -0.0146, ..., 0.0157, -0.0012, -0.0003], [-0.0045, -0.0172, 0.0092, ..., -0.0177, 0.0095, 0.0153], [ 0.0201, -0.0134, -0.0142, ..., 0.0105, 0.0198, 0.0213], ..., [ 0.0158, -0.0090, -0.0069, ..., 0.0092, 0.0077, 0.0061], [-0.0003, 0.0172, 0.0207, ..., -0.0197, 0.0111, 0.0053], [-0.0136, -0.0008, -0.0101, ..., 0.0188, -0.0108, 0.0218]], requires_grad=True)
```
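For reference, the untying step itself can be sketched with plain modules: give the head its own `Parameter` cloned from the embedding, after which the two matrices can diverge during training (toy shapes, hypothetical stand-ins for `embed_tokens`/`lm_head`):

```python
import torch
import torch.nn as nn

embed = nn.Embedding(8, 4)
lm_head = nn.Linear(4, 8, bias=False)
lm_head.weight = embed.weight  # tied: one shared tensor

# Untie: a fresh Parameter holding an independent copy of the values
lm_head.weight = nn.Parameter(embed.weight.detach().clone())

with torch.no_grad():
    lm_head.weight += 1.0  # update the head only

print(torch.equal(lm_head.weight, embed.weight))  # False: now independent
```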
It's solved, thanks for the kind reply. I'll close this issue.
Hi, I've met a similar problem. I want to finetune the lm_head of Gemma2-2B-Instruct: I set `requires_grad=False` for embed_tokens and `requires_grad=True` for lm_head, but I find that `requires_grad` of embed_tokens ends up True as well. I load with

```python
model = AutoModelForCausalLM.from_pretrained(
    args.prune_model_path, trust_remote_code=True, device_map=device_map, tie_word_embedding=False
)
```

but it says
So I wonder which version of transformers I need. Here is my code; my transformers version is 4.44.0:
```python
for name, param in model.named_parameters():
    param.requires_grad = False
    print(f"Layer: {name}, requires_grad: {param.requires_grad}")
for param in model.model.norm.parameters():
    param.requires_grad = True
for param in model.lm_head.parameters():
    param.requires_grad = True
for param in model.lm_head.parameters():
    print(param.requires_grad)
for name, param in model.named_parameters():
    print(f"Layer: {name}, requires_grad: {param.requires_grad}")
```
It's `tie_word_embeddings`, not `tie_word_embedding`.
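The behavior observed above follows from tying: `requires_grad` is a property of the single shared tensor, not of each attribute name, so "unfreezing" the head flips the flag back on for the embedding too. A sketch with toy modules (not the actual Gemma2 checkpoint):

```python
import torch.nn as nn

embed = nn.Embedding(8, 4)
head = nn.Linear(4, 8, bias=False)
head.weight = embed.weight           # tied, as with tie_word_embeddings=True

embed.weight.requires_grad = False   # try to freeze only the embedding
head.weight.requires_grad = True     # ...then unfreeze the head

print(embed.weight.requires_grad)    # True: same tensor, so the flag is shared
```

Once the weights are genuinely untied (e.g. by loading with `tie_word_embeddings=False` spelled correctly), the two flags can be set independently.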
It works, sorry for my stupid question.
Absolutely no worries
Hello everyone. In gemma-7b, the embedding layer's weights appear to be shared with lm_head, so I'm trying to train my own lm_head that updates independently.
First, I load the model.
Then I init my own lm_head layer and set it in the model.
After saving and reloading as a test, lm_head's weights are saved, but lm_head does not appear in the parameters, and the embedding layer was changed too.
Also, the weights of the embedding layer follow lm_head:
```
Parameter containing: tensor([[ 5.3516e-01, -3.4668e-02, 9.5215e-02, ..., 3.9307e-02, 2.6562e-01, 7.2266e-02], [ 1.5137e-01, -1.6797e-01, -1.4160e-01, ..., -2.3926e-02, -3.3188e-04, -3.5400e-02], [ 1.2256e-01, 2.2705e-02, -3.4424e-02, ..., -3.6774e-03, 3.5095e-03, -1.1292e-02], ...,
Parameter containing: tensor([[-0.0118, 0.0007, 0.0060, ..., 0.0198, 0.0037, -0.0206], [-0.0001, 0.0184, 0.0028, ..., -0.0083, -0.0113, 0.0092], [ 0.0047, -0.0103, -0.0029, ..., 0.0095, -0.0171, 0.0146], ...,
Parameter containing: tensor([[-0.0118, 0.0007, 0.0060, ..., 0.0198, 0.0037, -0.0206], [-0.0001, 0.0184, 0.0028, ..., -0.0083, -0.0113, 0.0092], [ 0.0047, -0.0103, -0.0029, ..., 0.0095, -0.0171, 0.0146], ...,
Parameter containing: tensor([[-0.0118, 0.0007, 0.0060, ..., 0.0198, 0.0037, -0.0206], [-0.0001, 0.0184, 0.0028, ..., -0.0083, -0.0113, 0.0092], [ 0.0047, -0.0103, -0.0029, ..., 0.0095, -0.0171, 0.0146], ...,
```
Version: 4.40.1, OS: Ubuntu
I'm wondering whether this is a bug or just my mistake. In this issue I loaded 'gemma-ko-2b', a Korean fine-tuned version of gemma-2b, but google/gemma-2b shows the same problem.
Thank you.