philschmid / llm-sagemaker-sample


[Question] SFT task #6

Closed (austinmw closed 8 months ago)

austinmw commented 8 months ago

Hi, probably a dumb question, but in your Mistral fine-tuning notebook example, is the next-token prediction objective applied to the entire instruction + context + answer prompt, rather than only to the portion that corresponds to the answer?

It seems like the former, because the whole prompt is created at once and I don't see any information being given to the model about where the question and context portions are, which would be needed to mask them out of the loss calculation. I just want to make sure I understand the training objective here. Thanks!

philschmid commented 8 months ago

Yes, it is currently applied to the whole prompt; there is no masking of the prompt.
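
For context, here is a minimal sketch of the two options, assuming a Hugging Face transformers-style setup (the model id, prompt format, and variable names are illustrative, not taken from the notebook). CrossEntropyLoss ignores label positions set to -100, so "masking the prompt" just means setting those positions to -100:

```python
# Sketch only: full-sequence loss vs. answer-only loss.
import torch
from transformers import AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("mistralai/Mistral-7B-v0.1")  # illustrative

prompt = "### Question:\nWhat is Amazon SageMaker?\n\n### Answer:\n"
answer = "A managed AWS service for training and deploying ML models."

prompt_ids = tokenizer(prompt, add_special_tokens=False)["input_ids"]
answer_ids = tokenizer(answer, add_special_tokens=False)["input_ids"]

input_ids = torch.tensor([prompt_ids + answer_ids])

# Option 1: loss over the whole sequence (what the notebook currently does).
labels_full = input_ids.clone()

# Option 2: loss only over the answer. -100 is the ignore index of
# CrossEntropyLoss, so the prompt tokens contribute nothing to the loss.
labels_masked = input_ids.clone()
labels_masked[:, : len(prompt_ids)] = -100
```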

austinmw commented 8 months ago

Okay, thanks! One follow-up question if you have time: is it most common nowadays to fine-tune on the whole prompt? A few articles I've read mention only applying the loss to the answer portion.

philschmid commented 8 months ago

"is it most common nowadays to fine-tune on the whole prompt?"

As far as I understand, there is no clear consensus yet. A lot of research papers, especially on multi-turn conversational prompts, mask out the "user" part.

You probably need to give it a try.
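
One concrete way to try the masked variant, assuming the notebook uses trl's SFTTrainer (the response template and model id below are illustrative, and the exact arguments depend on the trl version): trl ships a DataCollatorForCompletionOnlyLM that sets the labels of everything up to the response template to -100, so only the answer tokens contribute to the loss.

```python
# Sketch only: answer-only loss via trl's completion-only collator.
from transformers import AutoTokenizer
from trl import DataCollatorForCompletionOnlyLM

tokenizer = AutoTokenizer.from_pretrained("mistralai/Mistral-7B-v0.1")  # illustrative

# The template must match the prompt format used when building the dataset.
collator = DataCollatorForCompletionOnlyLM(
    response_template="### Answer:",
    tokenizer=tokenizer,
)

# Then pass `data_collator=collator` to SFTTrainer; note that example
# packing has to be disabled for this collator to work.
```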

austinmw commented 8 months ago

Thanks!