stanfordnlp / pyreft

ReFT: Representation Finetuning for Language Models
https://arxiv.org/abs/2404.03592
Apache License 2.0

[P1] Loreft example gsm8k train gives: RuntimeError: output with shape [64, 1, 7] doesn't match the broadcast shape [64, 0, 7] #88

Closed jaymefosa closed 1 month ago

jaymefosa commented 1 month ago

Per the title,

{'train_runtime': 22837.2028, 'train_samples_per_second': 3.769, 'train_steps_per_second': 0.942, 'train_loss': 0.4090520425184212, 'epoch': 12.0}                                                                                                                                                               
100%|████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████| 21516/21516 [6:20:37<00:00,  1.06s/it]
{'n_params': 2097408}
  0%|                                                                                                                                                                                                                                                                                    | 0/300 [00:00<?, ?it/s]
Traceback (most recent call last):
  File "pyreft/examples/loreft/train.py", line 479, in <module>
    main()
  File "pyreft/examples/loreft/train.py", line 475, in main
    finetune(**vars(args), args=args)
  File "pyreft/examples/loreft/train.py", line 399, in finetune
    generations, stats = compute_metrics(
  File "pyreft/examples/loreft/compute_metrics.py", line 179, in compute_metrics
    intervention_locations += left_padding
RuntimeError: output with shape [64, 1, 7] doesn't match the broadcast shape [64, 0, 7]

With launch command:

python train.py -task gsm8k -model /home/jayme/projects/LLM/models/Meta-Llama-3-8B-Instruct-function-calling-json-mode -seed 42 -l all -r 4 -p f7+l7 -e 12 -lr 9e-4 -type NodireftIntervention -gradient_accumulation_steps 4 -batch_size 1 -eval_batch_size 1 --dropout 0.05 --test_split validation --use_normalized_template --greedy_decoding --warmup_ratio 0.00 --weight_decay 0.06

frankaging commented 1 month ago

hey @jaymefosa thanks for your question!

the common reason for this error is that the script does not see a BOS token in the tokenized input; we need the BOS token to set the intervention locations internally.
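for reference, the shape in your traceback is what PyTorch raises when an in-place add broadcasts against a tensor with a zero-sized dimension, which is presumably what happens when no BOS position is found (illustrative repro only, not the actual code in compute_metrics.py):

import torch

# shapes taken from the error message above
intervention_locations = torch.zeros(64, 1, 7, dtype=torch.long)
left_padding = torch.zeros(64, 0, 7, dtype=torch.long)  # empty: no BOS position found

# RuntimeError: output with shape [64, 1, 7] doesn't match the broadcast shape [64, 0, 7]
intervention_locations += left_padding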

to verify that the tokenizer is the culprit, could you print out the raw tokenized sequence for "Hello World" with your tokenizer and check whether it starts with the BOS token? thanks!
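a minimal way to check (sketch, assuming a standard Hugging Face tokenizer; the path is a placeholder for the directory you pass to -model):

from transformers import AutoTokenizer

# placeholder path; use the same directory passed to -model in your launch command
tokenizer = AutoTokenizer.from_pretrained("/path/to/Meta-Llama-3-8B-Instruct-function-calling-json-mode")

encoded = tokenizer("Hello World", return_tensors="pt")
print("Tokenized output:", encoded)
print("bos_token_id:", tokenizer.bos_token_id)

# if input_ids does not start with bos_token_id, no BOS token is being prepended
print("starts with BOS:", encoded["input_ids"][0, 0].item() == tokenizer.bos_token_id)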

jaymefosa commented 1 month ago

@frankaging thanks for the fast reply! the output was:

Tokenized output: {'input_ids': tensor([[9906, 4435]]), 'attention_mask': tensor([[1, 1]])}

frankaging commented 1 month ago

@jaymefosa thanks! yes -- it seems like you need to add a BOS token upfront (either by concatenating tokenizer.bos_token_id to the tokenized ids or by prepending tokenizer.bos_token to the raw string) and try again.
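for illustration, a minimal sketch of the two options (reusing the tokenizer loaded in the check above; variable names are placeholders, not code from the repo):

# option 1: prepend the BOS string to the raw text before tokenizing
text = tokenizer.bos_token + "Hello World"
input_ids = tokenizer(text, add_special_tokens=False)["input_ids"]

# option 2: prepend the BOS id to already-tokenized ids
input_ids = [tokenizer.bos_token_id] + tokenizer("Hello World", add_special_tokens=False)["input_ids"]

# either way, the sequence should now start with the BOS token
assert input_ids[0] == tokenizer.bos_token_id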