TRL SFTTrainer Examples

Satrat commented 8 months ago

Simplified the SparseML Trainer to a barebones class definition; everything is now handled by SessionManagerMixIn. Removed a bunch of old code for loading recipes, since this is handled by SparseAutoModelForCausalLM.
Added a new SFTTrainer class that adds our mix-in to TRL's SFTTrainer. The only new code here is support for passing a tokenized dataset in to SFTTrainer.
Added examples of using SFTTrainer for sparse finetuning, both with our dataset preprocessing and with TRL's dataset preprocessing.

See examples in integrations/huggingface-transformers/tutorials/text-generation/trl_mixin
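
For readers unfamiliar with the mix-in pattern described above, here is a rough, self-contained sketch of the idea. It does not use the real trl or SparseML classes: `BaseSFTTrainer`, `SessionMixIn`, `SparseSFTTrainer`, the `recipe` argument, and the `_prepare_dataset` hook are all hypothetical stand-ins chosen for illustration, and the actual implementation in this PR may differ.

```python
# Hypothetical stand-ins only: none of these are the real trl / SparseML classes.


class BaseSFTTrainer:
    """Stand-in for trl's SFTTrainer: tokenizes raw text in its constructor."""

    def __init__(self, train_dataset, dataset_text_field="text"):
        self.train_dataset = self._prepare_dataset(train_dataset, dataset_text_field)

    def _prepare_dataset(self, dataset, text_field):
        # Toy "tokenizer": split on whitespace and map each word to its length.
        return [
            {"input_ids": [len(word) for word in row[text_field].split()]}
            for row in dataset
        ]

    def train(self):
        return f"trained on {len(self.train_dataset)} samples"


class SessionMixIn:
    """Stand-in for SessionManagerMixIn: wraps train() with recipe handling."""

    def __init__(self, *args, recipe=None, **kwargs):
        self.recipe = recipe
        self.events = []
        # Cooperative init: forwards the remaining arguments to the base trainer.
        super().__init__(*args, **kwargs)

    def train(self):
        self.events.append(f"initialize:{self.recipe}")
        result = super().train()
        self.events.append("finalize")
        return result


class SparseSFTTrainer(SessionMixIn, BaseSFTTrainer):
    """Mix-in first, base trainer second, so the mix-in's train() runs first."""

    def _prepare_dataset(self, dataset, text_field):
        # The only extra logic: pass an already-tokenized dataset through untouched.
        if dataset and "input_ids" in dataset[0]:
            return list(dataset)
        return super()._prepare_dataset(dataset, text_field)


# Pre-tokenized data skips the base class's preprocessing.
trainer = SparseSFTTrainer(train_dataset=[{"input_ids": [1, 2, 3]}], recipe="recipe.yaml")
print(trainer.train())
print(trainer.events)
```

Listing the mix-in before the base class means its `train()` comes first in the method resolution order, so the session logic can wrap the unmodified base training loop without the base class being edited.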

robertgshaw2-neuralmagic commented 8 months ago

Thanks Sara - this looks really nice

Are there any other features we should flex? I am thinking we might want to look at:

FSDP

Distillation

Satrat commented 8 months ago

Sure, I'll test both of these scenarios, but if getting FSDP working ends up taking more than minor tweaking, I'm going to leave that for another ticket :)

Edit: both worked with some minor tweaks!

neuralmagic / sparseml