neelsjain / NEFTune

Official repository of NEFTune: Noisy Embeddings Improve Instruction Finetuning

{RecursionError}maximum recursion depth exceeded while calling a Python object #5

Closed NormXU closed 1 year ago

NormXU commented 1 year ago

Environment

transformers==4.34.0

I tried the patch from the README:

import torch
from torch.nn import functional as F

def NEFTune(model, noise_alpha=5):
    def noised_embed(orig_embed, noise_alpha):
        def new_func(x):
            # during training, we add noise to the embedding
            # during generation, we don't add noise to the embedding
            if model.training:
                embed_init = orig_embed(x)
                dims = torch.tensor(embed_init.size(1) * embed_init.size(2))
                mag_norm = noise_alpha/torch.sqrt(dims)
                return embed_init + torch.zeros_like(embed_init).uniform_(-mag_norm, mag_norm)
            else:
                return orig_embed(x)
        return new_func
    ##### NOTE: this is for a LLaMA model ##### 
    ##### For a different model, you need to change the attribute path to the embedding #####
    model.base_model.model.model.embed_tokens.forward = noised_embed(model.base_model.model.model.embed_tokens, noise_alpha)
    return model

But accessing model.base_model.model.model.embed_tokens fails, showing the following error message:

{AttributeError}'LlamaModel' object has no attribute 'model'

Therefore, I tried to edit the patch as below:

model.base_model.embed_tokens.forward = noised_embed(model.base_model.embed_tokens, noise_alpha)

But this will cause another issue:

{RecursionError}maximum recursion depth exceeded while calling a Python object

Could you give me some advice about it? Thank you very much.

neelsjain commented 1 year ago

I would try print(model) and find the model's path to the embedding layer.
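For example, something like this on the model object you already have (a rough sketch; the exact attribute path depends on whether the model is wrapped by PEFT, but get_input_embeddings() should work either way):

# print the module tree and look for the Embedding layer's attribute path
print(model)

# transformers models also expose the input embedding directly,
# without hard-coding the attribute path
embed_layer = model.get_input_embeddings()
print(embed_layer)  # e.g. Embedding(32000, 4096)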

ChrisMii commented 1 year ago

replace

model.base_model.embed_tokens.forward = noised_embed(model.base_model.embed_tokens, noise_alpha)

with

model.base_model.embed_tokens.forward = noised_embed(model.base_model.embed_tokens.forward, noise_alpha)

This works for me. If that fails, you can try using functools.partial instead of a closure, which also works for me.
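For reference, a rough sketch of what I mean by the partial variant (same logic as the patch above, with the original bound forward captured before the attribute is overwritten; this is not the official implementation, and the attribute path is the one from your setup, so it may differ):

from functools import partial

import torch

def noised_embed(orig_forward, noise_alpha, model, x):
    # orig_forward is the embedding layer's original bound forward,
    # captured before the attribute is overwritten, so calling it
    # does not go back through the patched forward (no recursion)
    embed_init = orig_forward(x)
    if model.training:
        dims = embed_init.size(1) * embed_init.size(2)
        mag_norm = noise_alpha / dims ** 0.5
        return embed_init + torch.zeros_like(embed_init).uniform_(-mag_norm, mag_norm)
    return embed_init

embed = model.base_model.embed_tokens
embed.forward = partial(noised_embed, embed.forward, 5, model)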

hhy150 commented 10 months ago

replace

model.base_model.embed_tokens.forward = noised_embed(model.base_model.embed_tokens, noise_alpha)

with

model.base_model.embed_tokens.forward = noised_embed(model.base_model.embed_tokens.forward, noise_alpha)

This works for me. If that fails, you can try using functools.partial instead of a closure, which also works for me.

I also encountered the same problem, and your method works. I want to ask why this happens. Here is my guess; I don't know if it's correct: every time execution reaches embed_init = orig_embed(x), the forward of orig_embed has already been replaced with new_func, so the call dispatches through _call_impl back into new_func, and this line ends up looping infinitely.
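A toy sketch with a plain nn.Embedding (nothing to do with transformers) that seems to confirm this, if my understanding is right:

import torch
from torch import nn

def wrap_module(module):
    # captures the module itself; after patching, module(x) dispatches
    # through __call__ to the patched forward again -> infinite recursion
    def wrapped(x):
        return module(x)
    return wrapped

def wrap_forward(orig_forward):
    # captures the original bound forward before patching, so the call
    # goes straight to the real implementation -> no recursion
    def wrapped(x):
        return orig_forward(x)
    return wrapped

good = nn.Embedding(10, 4)
good.forward = wrap_forward(good.forward)
print(good(torch.tensor([1, 2, 3])).shape)  # torch.Size([3, 4])

bad = nn.Embedding(10, 4)
bad.forward = wrap_module(bad)
# bad(torch.tensor([1]))  # raises RecursionError, as in this issue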