stanfordnlp / pyreft

ReFT: Representation Finetuning for Language Models
https://arxiv.org/abs/2404.03592
Apache License 2.0

[P1] If we set output_original_output to True in intervenable.generate, can we get the model performance without intervention? #111

Closed mrsempress closed 1 week ago

mrsempress commented 2 weeks ago

In examples/loreft/compute_metrics.py, line 211 reads ori_response, steered_response = intervenable.generate(**generation_args). If I set generation_args["output_original_output"] = True and treat ori_response as the un-intervened counterpart of steered_response, can I use it to measure the model's performance without intervention?
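For intuition, the return contract being asked about can be illustrated with a toy stand-in (this is an assumption based on the linked pyvene source, not pyvene itself; fake_generate is a hypothetical stub, and the real generate returns token sequences, not strings):

```python
# Toy stand-in for a (base_outputs, counterfactual_outputs) generate contract:
# the first slot is only populated when output_original_output=True.
def fake_generate(prompt, output_original_output=False):
    steered = prompt + " [steered]"
    original = prompt + " [original]" if output_original_output else None
    return original, steered

# Without the flag, the original slot is empty:
ori, steered = fake_generate("Q: 2+2?")
assert ori is None

# With the flag, ori holds the un-intervened generation:
ori, steered = fake_generate("Q: 2+2?", output_original_output=True)
assert ori == "Q: 2+2? [original]"
```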

I tested with the command

python examples/loreft/train.py -task gsm8k \
-model yahma/llama-7b-hf \
-seed 42 -l all -r 4 -p f7+l7 -e 12 -lr 9e-4 \
-type NodireftIntervention \
-gradient_accumulation_steps 4 \
-batch_size 8 \
-eval_batch_size 4 \
--dropout 0.05 \
--test_split validation \
--use_normalized_template \
--greedy_decoding \
--warmup_ratio 0.00 \
--weight_decay 0.06

and the accuracy is 5.0, while the reported accuracy of llama-7b on GSM8K is 11.0. I also tested llama2-7b, which gives 26.3 versus a reported 14.6 on GSM8K; llama3-8b gives 23.7, but I could not find a public GSM8K number for llama3-8b. Is something wrong, or do I misunderstand the meaning of output_original_output?

frankaging commented 1 week ago

@mrsempress Sorry about the late reply, but yes, I think that will be the original (un-intervened) outputs.

The code to handle this flag is in pyvene as in: https://github.com/stanfordnlp/pyvene/blob/main/pyvene/models/intervenable_base.py#L1623

But also feel free to use model.generate(), where model is the original Hugging Face model rather than the ReFT model.
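Either way, the only difference between the steered and original paths is whether the representation edit is applied to the hidden state. A minimal NumPy sketch of a LoReFT-style edit (following the h + R^T(Wh + b - Rh) form from the ReFT paper; the dimensions and random parameters here are illustrative, and a trained R would have orthonormal rows):

```python
import numpy as np

rng = np.random.default_rng(0)
d, r = 8, 2                        # hidden size and low rank (illustrative)
R = rng.standard_normal((r, d))    # low-rank projection
W = rng.standard_normal((r, d))    # learned linear map
b = rng.standard_normal(r)         # learned bias

def intervene(h):
    # LoReFT edit: move h's projection onto R's rowspace toward W h + b.
    return h + R.T @ (W @ h + b - R @ h)

h = rng.standard_normal(d)
steered = intervene(h)   # hidden state on the intervened path
original = h             # un-intervened path: the state is left untouched
assert steered.shape == original.shape
assert not np.allclose(steered, original)
```

So "original output" simply means generation from the untouched hidden states, which is why decoding with the plain base model gives the same baseline.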