Closed jaymefosa closed 1 month ago
hey @jaymefosa thanks for your question!
the common reason for this error is that the tokenizer output does not include a BOS token; we need the BOS token to set the intervention location internally.
to verify this, could you print the raw tokenized sequence for "Hello World" with your tokenizer and check? thanks!
@frankaging thanks for the fast reply, the output was:

```
Tokenized output: {'input_ids': tensor([[9906, 4435]]), 'attention_mask': tensor([[1, 1]])}
```
@jaymefosa thanks! yes -- it seems like you need to add a BOS token up front (by concatenating `tokenizer.bos_token_id` to the input ids, or prepending `tokenizer.bos_token` to the raw string) and then try again.
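a minimal sketch of that fix on the id level, assuming a Hugging Face-style tokenizer (the `bos_token_id` name follows the transformers convention; the helper below is hypothetical, not part of the library):

```python
def ensure_bos(input_ids, bos_token_id):
    """Prepend the BOS token id if the sequence does not already start with it."""
    if not input_ids or input_ids[0] != bos_token_id:
        return [bos_token_id] + input_ids
    return input_ids

# Llama-3's BOS token <|begin_of_text|> has id 128000; the other ids
# match the tokenized "Hello World" output shown above.
fixed = ensure_bos([9906, 4435], 128000)
print(fixed)  # [128000, 9906, 4435]
```

alternatively, calling the tokenizer with `add_special_tokens=True` (the transformers default) on a tokenizer whose config actually sets a BOS token achieves the same thing on the string side.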
Per the title, the error occurs with this launch command:

```
python train.py -task gsm8k -model /home/jayme/projects/LLM/models/Meta-Llama-3-8B-Instruct-function-calling-json-mode -seed 42 -l all -r 4 -p f7+l7 -e 12 -lr 9e-4 -type NodireftIntervention -gradient_accumulation_steps 4 -batch_size 1 -eval_batch_size 1 --dropout 0.05 --test_split validation --use_normalized_template --greedy_decoding --warmup_ratio 0.00 --weight_decay 0.06
```