If the dataset formatting function contains a choices key, which contains a pair of string choices, then finetuning will use the existing LM head rather than training one anew.
The logits at the first token id of each of these choices will be returned from the model for binary classification.
This should make the fine-tuning maximally natural, hopefully aiding weak to strong generalization.
If the dataset formatting function contains a
choices
key, which contains a pair of string choices, then finetuning will use the existing LM head rather than training one anew.The logits at the first token id of each of these choices will be returned from the model for binary classification.
This should make the fine-tuning maximally natural, hopefully aiding weak to strong generalization.