Open mrsempress opened 2 weeks ago
The command I use is:
CUDA_VISIBLE_DEVICES=6 python examples/loreft/train.py -task gsm8k -model models/Llama/Llama/llama-7b-hf/ -seed 42 -l all -r 4 -p f7+l7 -e 12 -lr 9e-4 -type NodireftIntervention -gradient_accumulation_steps 4 -batch_size 8 -eval_batch_size 4 --dropout 0.05 --test_split validation --use_normalized_template --greedy_decoding --warmup_ratio 0.00 --weight_decay 0.06 --save_model
@mrsempress Thanks for your question. Could you elaborate on this?
When I train with llama-7b on the math task, I find that the sizes of `left_padding` and `intervention_locations` do not match.
`intervention_locations` is determined by `-p f7+l7` (first 7 and last 7 prompt tokens), which does not need to match the size of `left_padding`, IIUC.
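For context, a minimal sketch of what a first-7 + last-7 scheme means for a single unpadded prompt; the prompt length and variable names below are illustrative assumptions, not the repository's actual code:

```python
# Illustrative sketch only: how an "f7+l7" style setting could translate into
# per-prompt intervention positions. prompt_length is an assumed value.
first_n, last_n = 7, 7
prompt_length = 32  # hypothetical tokenized prompt length

first_positions = list(range(first_n))                               # [0, 1, ..., 6]
last_positions = list(range(prompt_length - last_n, prompt_length))  # [25, 26, ..., 31]
intervention_locations = first_positions + last_positions

print(len(intervention_locations))  # 14 positions for this example
```

Under that reading, each example gets 14 intervention positions regardless of how much padding is later added to the batch, which is why its size is not expected to equal that of `left_padding`.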
When I train with llama-7b on the math task, I find that the sizes of `left_padding` and `intervention_locations` do not match. This is because llama-7b's `tokenizer.bos_token_id` is 0, and the input contains multiple positions whose value is 0. If we use the formula from the project, `left_padding = (inputs["input_ids"] == tokenizer.bos_token_id).nonzero(as_tuple=True)[1]`, then the size of `left_padding` is (N), where N is the number of entries in `inputs["input_ids"]` equal to 0, rather than the desired size (batch_size). Therefore, I have changed it to the following code:
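Purely as an illustration of the shape issue and of one possible per-example alternative (not necessarily the actual revision), here is a sketch that assumes left padding and a standard `attention_mask`; the tensors are made up:

```python
import torch

# Toy batch, padded on the left, where pad_token_id == bos_token_id == 0.
# Row 0 has two pad tokens before the bos token; row 1 starts directly with bos.
# All ids are made up for illustration.
input_ids = torch.tensor([
    [0, 0, 0, 153, 278, 920],     # pad, pad, bos, ...
    [0, 412, 391, 278, 920, 13],  # bos, ...
])
bos_token_id = 0

# Formula from the report: every 0 matches, so the result has 4 entries
# instead of the desired batch_size = 2.
left_padding_old = (input_ids == bos_token_id).nonzero(as_tuple=True)[1]
print(left_padding_old.shape)  # torch.Size([4])

# One possible alternative: count padding tokens per row via the attention mask.
# With left padding, this count equals the index of the bos token in each row.
attention_mask = torch.tensor([
    [0, 0, 1, 1, 1, 1],
    [1, 1, 1, 1, 1, 1],
])
left_padding_new = (attention_mask == 0).sum(dim=-1)
print(left_padding_new)        # tensor([2, 0]), shape (batch_size,)
```

Whether a per-example count like this is the behavior the project actually intends is part of the question to the authors.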
I hope the author can verify whether my error is caused by something else or whether my understanding of the cause is correct, and whether the revised code is correct.