stanfordnlp / pyreft

ReFT: Representation Finetuning for Language Models
https://arxiv.org/abs/2404.03592
Apache License 2.0

[P1] For left_padding in compute_metrics.py #110

Open mrsempress opened 2 weeks ago

mrsempress commented 2 weeks ago

When training with llama-7b on the math tasks, I found that the sizes of `left_padding` and `intervention_locations` did not match. This is because `tokenizer.bos_token_id = 0` for llama-7b, and 0 appears at multiple positions in the input. If we use the formula from the project, `left_padding = (inputs["input_ids"] == tokenizer.bos_token_id).nonzero(as_tuple=True)[1]`, then the size of `left_padding` is (N,), where N is the number of entries in `inputs["input_ids"]` equal to 0, rather than the desired size (batch_size,). Therefore, I have changed it to the following code:

Mask=(inputs ["input_ids"]==tokenizer. bos_token_id)
Indications=torch. top (mask. int()), k=1, dim=-1).indices
Left_pdding=torch. where (mask. any (dim=-1), indices. reshape (mask. shape [: -1]), -1)
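For reference, here is a minimal toy reproduction of the shape mismatch (a sketch, not code from the repo; the tensor values are made up for illustration):

```python
import torch

bos_token_id = 0
input_ids = torch.tensor([
    [0, 0, 0, 5, 6, 7],     # three tokens equal to bos_token_id
    [0, 8, 9, 10, 11, 12],  # one token equal to bos_token_id
])

# Formula currently in compute_metrics.py: one entry per matching token,
# so the result has shape (N,) = (4,) here instead of (batch_size,) = (2,).
left_padding_old = (input_ids == bos_token_id).nonzero(as_tuple=True)[1]

# Proposed fix: one position per row (or -1 if a row has no match),
# giving shape (batch_size,).
mask = input_ids == bos_token_id
indices = torch.topk(mask.int(), k=1, dim=-1).indices
left_padding_new = torch.where(mask.any(dim=-1), indices.reshape(mask.shape[:-1]), -1)

print(left_padding_old.shape, left_padding_new.shape)  # torch.Size([4]) torch.Size([2])
```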

I hope the author can verify whether my error is caused by some other issue or whether I have understood the cause correctly, and whether the revised code is correct.

mrsempress commented 2 weeks ago

The command I use is:

```bash
CUDA_VISIBLE_DEVICES=6 python examples/loreft/train.py -task gsm8k -model models/Llama/Llama/llama-7b-hf/ -seed 42 -l all -r 4 -p f7+l7 -e 12 -lr 9e-4 -type NodireftIntervention -gradient_accumulation_steps 4 -batch_size 8 -eval_batch_size 4 --dropout 0.05 --test_split validation --use_normalized_template --greedy_decoding --warmup_ratio 0.00 --weight_decay 0.06 --save_model
```

frankaging commented 1 week ago

@mrsempress Thanks for your question. Could you elaborate on this?

> When training with llama-7b on the math tasks, I found that the sizes of `left_padding` and `intervention_locations` did not match.

`intervention_locations` is determined by `-p f7+l7` (first 7 and last 7 prompt tokens), which does not need to match the size of `left_padding`, IIUC.
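For illustration, a rough sketch of what `f7+l7` selects per prompt (a hypothetical helper, not the repo's actual implementation; the function name and defaults are made up):

```python
# Intervene on the first 7 and last 7 token positions of each prompt.
def f7_l7_locations(prompt_len: int, num_prefix: int = 7, num_suffix: int = 7):
    first = list(range(num_prefix))
    last = list(range(prompt_len - num_suffix, prompt_len))
    return first + last

# A 20-token prompt yields 14 positions: [0..6] + [13..19].
# The number of positions is set by -p, not by batch_size, whereas
# left_padding is one offset per example, i.e. shape (batch_size,).
print(f7_l7_locations(20))
```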