huggingface / transformers

🤗 Transformers: State-of-the-art Machine Learning for Pytorch, TensorFlow, and JAX.
https://huggingface.co/transformers
Apache License 2.0

Gemma-7b: I set my own lm_head, but it is not saved, and the pretrained embedding layer's weights get changed too. #31467

Closed ByeongjunCho closed 2 months ago

ByeongjunCho commented 3 months ago

Hello everyone. In gemma-7b, the embedding layer's weights appear to be shared with lm_head. So I am trying to train my own lm_head that is updated independently.

First, load the model:

import torch.nn as nn
from transformers import AutoModelForCausalLM

model = AutoModelForCausalLM.from_pretrained("beomi/gemma-ko-2b", device_map='cpu')

for n, p in model.named_parameters():
    if 'lm_head' in n:
        print(n) # printed nothing
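
The empty output is consistent with weight tying: when tie_word_embeddings is enabled, lm_head.weight and embed_tokens.weight are the same Parameter object, and named_parameters() skips duplicates by default. A quick diagnostic sketch (not from the original report):

# Confirm that the output embedding is tied to the input embedding.
print(model.config.tie_word_embeddings)                         # True when the checkpoint ties them
print(model.lm_head.weight is model.model.embed_tokens.weight)  # True while tied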

Initialize my own lm_head layer and set it on the model:

lm_head = nn.Linear(model.config.hidden_size, model.config.vocab_size, bias=False)
model.set_output_embeddings(lm_head) # definition is here : https://github.com/huggingface/transformers/blob/9ba9369a2557e53a01378199a9839ec6e82d8bc7/src/transformers/models/gemma/modeling_gemma.py#L1056

for n, p in model.named_parameters():
    if 'lm_head' in n:
        print(n) # "lm_head.weight" was printed

Save and reload to test. lm_head's weight is saved, but lm_head does not appear among the parameters after reloading, and the embedding layer was changed as well:

model.save_pretrained("abcd")
# load model
newmodel = AutoModelForCausalLM.from_pretrained("abcd")
for n, p in newmodel.named_parameters():
    if 'lm_head' in n:
        print(n) # printed nothing

Also, the embedding layer's weights now follow the lm_head:

print(model.model.embed_tokens.weight)

Parameter containing:
tensor([[ 5.3516e-01, -3.4668e-02, 9.5215e-02, ..., 3.9307e-02, 2.6562e-01, 7.2266e-02],
        [ 1.5137e-01, -1.6797e-01, -1.4160e-01, ..., -2.3926e-02, -3.3188e-04, -3.5400e-02],
        [ 1.2256e-01, 2.2705e-02, -3.4424e-02, ..., -3.6774e-03, 3.5095e-03, -1.1292e-02],
        ...,

print(model.lm_head.weight)

Parameter containing:
tensor([[-0.0118, 0.0007, 0.0060, ..., 0.0198, 0.0037, -0.0206],
        [-0.0001, 0.0184, 0.0028, ..., -0.0083, -0.0113, 0.0092],
        [ 0.0047, -0.0103, -0.0029, ..., 0.0095, -0.0171, 0.0146],
        ...,

print(newmodel.model.embed_tokens.weight)

Parameter containing:
tensor([[-0.0118, 0.0007, 0.0060, ..., 0.0198, 0.0037, -0.0206],
        [-0.0001, 0.0184, 0.0028, ..., -0.0083, -0.0113, 0.0092],
        [ 0.0047, -0.0103, -0.0029, ..., 0.0095, -0.0171, 0.0146],
        ...,

print(newmodel.lm_head.weight)

Parameter containing:
tensor([[-0.0118, 0.0007, 0.0060, ..., 0.0198, 0.0037, -0.0206],
        [-0.0001, 0.0184, 0.0028, ..., -0.0083, -0.0113, 0.0092],
        [ 0.0047, -0.0103, -0.0029, ..., 0.0095, -0.0171, 0.0146],
        ...,

Version: 4.40.1, OS: Ubuntu

I am wondering whether this is a bug or just my mistake. In this issue I loaded 'gemma-ko-2b', a Korean fine-tuned version of gemma-2b, but google/gemma-2b shows the same problem.

Thank you.

amyeroberts commented 3 months ago

cc @ArthurZucker

ArthurZucker commented 2 months ago

Oops, this might be because of tie_word_embeddings; did you try disabling it?
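
For reference, two common ways to disable it, sketched here (the checkpoint name is taken from the report above):

from transformers import AutoConfig, AutoModelForCausalLM

# Option 1: override the config value directly in the from_pretrained call.
model = AutoModelForCausalLM.from_pretrained("beomi/gemma-ko-2b", tie_word_embeddings=False)

# Option 2: edit the config first, then load the model with it.
config = AutoConfig.from_pretrained("beomi/gemma-ko-2b")
config.tie_word_embeddings = False
model = AutoModelForCausalLM.from_pretrained("beomi/gemma-ko-2b", config=config)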

ByeongjunCho commented 2 months ago

Hi, thanks for the reply.

Sorry, I only saw your reply late.

I had not considered tie_word_embeddings before.

I checked, and it works when I set tie_word_embeddings=False.

model = AutoModelForCausalLM.from_pretrained("beomi/gemma-ko-2b", device_map='cpu', tie_word_embeddings=False)

for n, p in model.named_parameters():
    if 'lm_head' in n:
        print(n) # printed nothing
lm_head = nn.Linear(model.config.hidden_size, model.config.vocab_size, bias=False)
# model.tie_word_embedding = False
model.set_output_embeddings(lm_head)

for n, p in model.named_parameters():
    if 'lm_head' in n:
        print(n) # "lm_head.weight" was printed

model.save_pretrained("abcd")
# load model
newmodel = AutoModelForCausalLM.from_pretrained("abcd", tie_word_embeddings=False)
for n, p in newmodel.named_parameters():
    if 'lm_head' in n:
        print(n) # printed nothing

Example of embed_tokens' weights:

print(newmodel.model.embed_tokens.weight)

Parameter containing:
tensor([[ 5.3516e-01, -3.4668e-02, 9.5215e-02, ..., 3.9307e-02, 2.6562e-01, 7.2266e-02],
        [ 1.5137e-01, -1.6797e-01, -1.4160e-01, ..., -2.3926e-02, -3.3188e-04, -3.5400e-02],
        [ 1.2256e-01, 2.2705e-02, -3.4424e-02, ..., -3.6774e-03, 3.5095e-03, -1.1292e-02],
        ...,
        [ 2.8711e-01, -1.0986e-02, 6.4453e-02, ..., -5.7861e-02, 3.3447e-02, -4.0283e-02],
        [ 3.4375e-01, -6.6406e-02, 8.5449e-02, ..., -9.1553e-03, 8.0078e-02, 7.7820e-03],
        [ 5.3906e-01, -3.4424e-02, 9.2285e-02, ..., 3.7842e-02, 2.6562e-01, 7.1777e-02]], requires_grad=True)

Example of lm_head's weights:

print(newmodel.lm_head.weight)

Parameter containing:
tensor([[-0.0036, 0.0196, -0.0146, ..., 0.0157, -0.0012, -0.0003],
        [-0.0045, -0.0172, 0.0092, ..., -0.0177, 0.0095, 0.0153],
        [ 0.0201, -0.0134, -0.0142, ..., 0.0105, 0.0198, 0.0213],
        ...,
        [ 0.0158, -0.0090, -0.0069, ..., 0.0092, 0.0077, 0.0061],
        [-0.0003, 0.0172, 0.0207, ..., -0.0197, 0.0111, 0.0053],
        [-0.0136, -0.0008, -0.0101, ..., 0.0188, -0.0108, 0.0218]], requires_grad=True)
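
A short check (a sketch, not from the original report) that the untied head survives the save/load round-trip and no longer shares storage with the input embeddings:

import torch

# The reloaded head should match the custom head that was saved...
print(torch.equal(model.lm_head.weight, newmodel.lm_head.weight))      # expected True
# ...and should not be the same Parameter as the input embeddings anymore.
print(newmodel.lm_head.weight is newmodel.model.embed_tokens.weight)   # expected False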

It's solved. Thank you for the kind reply.

I'm closing this issue.

yaolu-zjut commented 1 month ago

Hi, I'm running into a similar problem. I want to fine-tune the lm_head of Gemma2-2B-Instruct, so I set requires_grad=False for embed_tokens and requires_grad=True for lm_head, but I find that requires_grad of embed_tokens also becomes True. I tried model = AutoModelForCausalLM.from_pretrained(args.prune_model_path, trust_remote_code=True, device_map=device_map, tie_word_embedding=False), but I get the result shown in the attached screenshot, so I wonder which version of transformers I need. Here is my code; my transformers version is 4.44.0:

# Freeze every parameter first.
for name, param in model.named_parameters():
    param.requires_grad = False
    print(f"Layer: {name}, requires_grad: {param.requires_grad}")

# Unfreeze the final norm.
for param in model.model.norm.parameters():
    param.requires_grad = True

# Unfreeze the lm_head.
for param in model.lm_head.parameters():
    param.requires_grad = True

for param in model.lm_head.parameters():
    print(param.requires_grad)

# Check which parameters are now trainable.
for name, param in model.named_parameters():
    print(f"Layer: {name}, requires_grad: {param.requires_grad}")
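
For context: while the weights are tied, lm_head.weight and model.embed_tokens.weight are the same Parameter object, so setting requires_grad on one necessarily changes the other, which matches the behaviour described above. A minimal illustration reusing the model variable from the snippet (a sketch, not code from the thread):

# With tied weights both attributes refer to one shared Parameter,
# so requires_grad cannot differ between them.
print(model.lm_head.weight is model.model.embed_tokens.weight)  # True while tied

model.lm_head.weight.requires_grad = True
print(model.model.embed_tokens.weight.requires_grad)            # also True, same tensor
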
ArthurZucker commented 1 month ago

It's tie_word_embeddings, not tie_word_embedding.

yaolu-zjut commented 1 month ago

It's tie_word_embeddings, not tie_word_embedding.

It works, sorry for my stupid question.
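
Putting the pieces together, the corrected load call plus the freezing pattern from the snippet above would look roughly like this (a sketch of the resolution, reusing args.prune_model_path and device_map from the code above):

model = AutoModelForCausalLM.from_pretrained(
    args.prune_model_path,
    trust_remote_code=True,
    device_map=device_map,
    tie_word_embeddings=False,  # correct kwarg name; unties lm_head from embed_tokens
)

# Freeze everything, then unfreeze only the parts to fine-tune.
for param in model.parameters():
    param.requires_grad = False
for param in model.model.norm.parameters():
    param.requires_grad = True
for param in model.lm_head.parameters():
    param.requires_grad = True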

ArthurZucker commented 1 month ago

Absolutely no worries