voidism / DoLa

Official implementation for the paper "DoLa: Decoding by Contrasting Layers Improves Factuality in Large Language Models"
https://arxiv.org/abs/2309.03883

Should the model.norm layer be applied to hidden_states[early_exit_layer]? #9

Open githubhyz opened 11 months ago

githubhyz commented 11 months ago

https://github.com/voidism/DoLa/blob/dc88907406f9744f748f3c779f2353efd5bdc824/transformers-4.28.1/src/transformers/models/llama/modeling_llama.py#L703

https://github.com/voidism/DoLa/blob/dc88907406f9744f748f3c779f2353efd5bdc824/transformers-4.28.1/src/transformers/models/llama/modeling_llama.py#L703

I think you should apply the model.norm layer to hidden_states[early_exit_layer], because only the last hidden state has model.norm applied to it. See https://github.com/voidism/DoLa/blob/dc88907406f9744f748f3c779f2353efd5bdc824/transformers-4.28.1/src/transformers/models/llama/modeling_llama.py#L594
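
For reference, a minimal sketch of the suggested change, assuming a standard HF LLaMA forward pass with `output_hidden_states=True`. The helper function `early_exit_logits` and its signature are hypothetical, for illustration only; the attribute names (`model.model.norm`, `model.lm_head`) follow the Hugging Face LLaMA modeling code:

```python
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

# Hypothetical helper illustrating the suggestion: apply the final RMSNorm
# (model.model.norm) to an early-exit hidden state before the LM head,
# mirroring what LlamaModel.forward does for the last layer only.
def early_exit_logits(model, hidden_states, early_exit_layer):
    # hidden_states comes from a forward pass with output_hidden_states=True;
    # index 0 is the embedding output and index i is the output of layer i.
    # hidden_states[early_exit_layer] is a raw residual-stream activation
    # that has NOT been passed through model.norm.
    h = hidden_states[early_exit_layer]
    h = model.model.norm(h)   # the proposed fix: normalize before projecting
    return model.lm_head(h)   # early-exit logits over the vocabulary

# Example usage (model name is an assumption, any LLaMA checkpoint works):
# model = AutoModelForCausalLM.from_pretrained("huggyllama/llama-7b")
# tok = AutoTokenizer.from_pretrained("huggyllama/llama-7b")
# out = model(**tok("The capital of France is", return_tensors="pt"),
#             output_hidden_states=True)
# logits_16 = early_exit_logits(model, out.hidden_states, 16)
```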

pphuc25 commented 10 months ago

This makes sense, but when I apply your suggestion, accuracy on the GSM8K dataset goes down. I have no idea why.