williambarberjr opened 3 months ago
Oh yep, I also saw OpenPipe's experiments! Training on completions only is in TRL, but it fails to work on multi-turn conversations, so a first step is to support this. In theory I have some code to do that, but I'll need to test it more.
In axolotl, there's a config parameter you can set:

```yaml
train_on_inputs: false
```
It changes the way the loss is calculated when training a LoRA: it ignores the loss on input tokens and only trains the model on the completion-token loss. This lets the model concentrate entirely on learning to produce the output, at the cost of not learning to reproduce the input (which is the behavior I want). If I understand correctly, the Hugging Face trainer doesn't allow combining this training strategy with sample packing. Kyle Corbitt from OpenPipe (a fine-tuning startup) shared an image benchmarking the difference it makes when fine-tuning for various tasks. I'd love to see this feature added to Unsloth, as I'm convinced it would help me train significantly better models.
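As a minimal sketch of what this masking amounts to (assuming the usual Hugging Face convention that tokens labeled `-100` are skipped by the cross-entropy loss; the function here is illustrative, not Unsloth's or axolotl's actual API):

```python
# Sketch: build labels that ignore prompt tokens, so loss is computed
# only on the completion. Tokens labeled -100 are skipped by PyTorch's
# cross-entropy loss, which is the convention Hugging Face trainers use.
IGNORE_INDEX = -100

def mask_prompt_labels(input_ids, prompt_len):
    """Copy input_ids into labels, masking the first prompt_len tokens."""
    labels = list(input_ids)
    for i in range(min(prompt_len, len(labels))):
        labels[i] = IGNORE_INDEX
    return labels

# Example: 4 prompt tokens followed by 3 completion tokens.
input_ids = [101, 2054, 2003, 102, 7592, 2088, 103]
labels = mask_prompt_labels(input_ids, prompt_len=4)
print(labels)  # [-100, -100, -100, -100, 7592, 2088, 103]
```

With `train_on_inputs: true` the labels would simply equal `input_ids`, so gradient updates would also push the model toward reproducing the prompt.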
Hamel Husain's blog post about how to combine custom chat templates with this setting is probably relevant for thinking through how to implement this, as it explains how you set up a `jsonl` input file to define what's input and what's output when the chat template varies across models or across your desired inference setup after training: https://hamel.dev/notes/llm/finetuning/09_template_free.html
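A rough sketch of how such a file could drive multi-turn loss masking: each segment carries its own text plus a flag saying whether the model should be trained on it, so assistant turns are trained while user turns are masked. The `{"segments": [{"label": bool, "text": str}]}` schema follows axolotl's template-free format as I understand it from Hamel's post; the tokenizer below is a toy stand-in for a real one.

```python
# Sketch: turn one template-free jsonl record into input_ids + labels
# for multi-turn completion-only training. Spans with label=false are
# masked with -100; spans with label=true keep their token ids.
IGNORE_INDEX = -100

def toy_tokenize(text):
    """Stand-in tokenizer: one fake token id per whitespace-split word."""
    return [hash(w) % 30000 for w in text.split()]

def encode_segments(record):
    input_ids, labels = [], []
    for seg in record["segments"]:
        ids = toy_tokenize(seg["text"])
        input_ids.extend(ids)
        # Train on this span only if its label flag is true.
        labels.extend(ids if seg["label"] else [IGNORE_INDEX] * len(ids))
    return input_ids, labels

record = {
    "segments": [
        {"label": False, "text": "User: What is 2+2?"},  # masked
        {"label": True,  "text": "Assistant: 4"},        # trained
        {"label": False, "text": "User: And 3+3?"},      # masked
        {"label": True,  "text": "Assistant: 6"},        # trained
    ]
}
input_ids, labels = encode_segments(record)
assert len(input_ids) == len(labels)
```

Because the masking decision lives in the data rather than in a chat template, the same logic works regardless of which model's template you target, which is the point Hamel's post makes.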