githubhyz opened 11 months ago
https://github.com/voidism/DoLa/blob/dc88907406f9744f748f3c779f2353efd5bdc824/transformers-4.28.1/src/transformers/models/llama/modeling_llama.py#L703
I think you should apply the `model.norm` layer to `hidden_states[early_exit_layer]`, because only the last hidden state has `model.norm` applied. See https://github.com/voidism/DoLa/blob/dc88907406f9744f748f3c779f2353efd5bdc824/transformers-4.28.1/src/transformers/models/llama/modeling_llama.py#L594
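
For reference, a minimal sketch of the suggested change using the standard `transformers` API rather than the repo's bundled `modeling_llama.py`; the checkpoint name and `early_exit_layer` value below are placeholders, not taken from the DoLa code:

```python
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_name = "huggyllama/llama-7b"  # assumption: any LLaMA causal-LM checkpoint
tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForCausalLM.from_pretrained(model_name)

early_exit_layer = 16  # hypothetical premature-layer index

inputs = tokenizer("The capital of France is", return_tensors="pt")
with torch.no_grad():
    outputs = model(**inputs, output_hidden_states=True)

# hidden_states[0] is the embedding output; hidden_states[i] is layer i's output.
early_hidden = outputs.hidden_states[early_exit_layer]

# The suggested fix: pass the early-exit hidden state through the final
# RMSNorm (model.model.norm) before the LM head, mirroring what
# LlamaModel.forward does to the last layer's hidden state only.
early_logits = model.lm_head(model.model.norm(early_hidden))
```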
This makes sense, but when I apply your suggestion, accuracy goes down on the GSM8K dataset. I have no idea why.