Closed austinmw closed 8 months ago
Yes, it is currently applied to the whole prompt; there is no "masking" of the prompt.
Okay, thanks! One follow-up question if you have time: is it most common nowadays to fine-tune on the whole prompt? A few articles I've read mentioned only applying the loss function to the answer portion.
> is it most common nowadays to fine-tune on the whole prompt?
As far as I understand, there is no clear consensus yet. A lot of research papers, especially on multi-turn conversational prompts, mask the "user" part.
You probably need to give it a try.
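For reference, "masking the user part" usually means setting the label ids for the prompt tokens to the loss ignore index, so the cross-entropy loss is computed only over answer tokens. A minimal sketch, assuming a HuggingFace-style causal-LM setup where PyTorch's `CrossEntropyLoss` ignores label `-100` (the function name and token ids below are made up for illustration):

```python
def build_masked_labels(prompt_ids, answer_ids, ignore_index=-100):
    """Concatenate prompt and answer token ids; mask the prompt
    portion so the loss is computed only on the answer tokens."""
    input_ids = list(prompt_ids) + list(answer_ids)
    labels = [ignore_index] * len(prompt_ids) + list(answer_ids)
    return input_ids, labels

# Toy example with made-up token ids
prompt_ids = [101, 2054, 2003]   # e.g. the instruction/context tokens
answer_ids = [1996, 3437, 102]   # e.g. the answer tokens
input_ids, labels = build_masked_labels(prompt_ids, answer_ids)
print(labels)  # [-100, -100, -100, 1996, 3437, 102]
```

Fine-tuning on the whole prompt is the same as passing `labels = input_ids`; the masked variant only differs in which positions contribute to the loss.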
Thanks!
Hi, probably a dumb question, but in your Mistral fine-tuning notebook example, is the next-token prediction objective applied to the entire instruction+context+answer prompt rather than only to the portion that corresponds to the answer?
It seems like the former, because the whole prompt is created at once and I don't see any information being given to the model about where the question and context portions are, which would be needed to mask them out of the loss calculation. I want to make sure I understand the training objective here. Thanks!