Satrat closed this 7 months ago.
Thanks Sara - this looks really nice
Are there any other features we should flex? I am thinking we might want to look at:
- FSDP
- Distillation
Sure, I'll test both of these scenarios, but if getting FSDP working takes more than minor tweaking, I'm going to leave that for another ticket :)
Edit: both worked with some minor tweaks!
`Trainer` is now a barebones class definition; everything is handled by `SessionManagerMixIn`. Removed a bunch of old code for loading recipes, since this is handled by `SparseAutoModelForCausalLM`.
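Below is a minimal sketch of the mix-in pattern described above. It uses hypothetical stand-in classes (`BaseTrainer`, `SessionMixIn`) in place of the real `SessionManagerMixIn` and Hugging Face `Trainer`, whose internals may differ. The mix-in wraps a base-class hook and forwards to it via `super()`, which is why the combined `Trainer` needs no body of its own.

```python
# Minimal sketch of the mix-in pattern. Every class here is a hypothetical
# stand-in, not the real SparseML or transformers API.


class BaseTrainer:
    """Stand-in for an upstream trainer class."""

    def __init__(self, model, **kwargs):
        self.model = model
        self.events = []

    def compute_loss(self, batch):
        self.events.append("base.compute_loss")
        return sum(batch)


class SessionMixIn:
    """Hypothetical mix-in that wraps trainer hooks with recipe/session logic."""

    def __init__(self, *args, recipe=None, **kwargs):
        # Consume the mix-in's own kwargs and forward the rest up the MRO.
        super().__init__(*args, **kwargs)
        self.recipe = recipe

    def compute_loss(self, batch):
        self.events.append("session.pre_loss")  # e.g. apply recipe modifiers
        loss = super().compute_loss(batch)      # delegate to the base trainer
        self.events.append("session.post_loss")
        return loss


class Trainer(SessionMixIn, BaseTrainer):
    """Barebones definition: all behavior comes from the mix-in and the base."""


trainer = Trainer(model="tiny-model", recipe="recipe.yaml")
loss = trainer.compute_loss([1, 2, 3])
print(loss, trainer.recipe, trainer.events)
```

Listing the mix-in first in the bases puts it ahead of the base trainer in the method resolution order, so its hooks run around the base implementation. The same mix-in can be combined with a different base class in exactly the same way.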
New `SFTTrainer` class, which adds our mix-in to TRL's `SFTTrainer`. The only new code here adds support for passing a tokenized dataset to `SFTTrainer`.
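One way the tokenized-dataset support could work, sketched under assumptions: a dataset counts as pre-tokenized when every example already has an `input_ids` field, and in that case the text preprocessing step is skipped. The helper names (`is_tokenized`, `prepare_dataset`, `toy_tokenize`) are hypothetical, plain lists of dicts stand in for a `datasets.Dataset`, and TRL's actual check may differ.

```python
# Hypothetical sketch: decide whether a dataset still needs text preprocessing.
# Plain lists of dicts stand in for a Hugging Face `datasets.Dataset`.
from typing import Any, Callable, Dict, List

Example = Dict[str, Any]


def is_tokenized(dataset: List[Example]) -> bool:
    """Assume the dataset is pre-tokenized if every example has input_ids."""
    return len(dataset) > 0 and all("input_ids" in ex for ex in dataset)


def prepare_dataset(
    dataset: List[Example],
    tokenize: Callable[[str], List[int]],
    text_field: str = "text",
) -> List[Example]:
    """Skip preprocessing for tokenized data; otherwise tokenize the text field."""
    if is_tokenized(dataset):
        return dataset  # already tokenized: pass through unchanged
    return [{"input_ids": tokenize(ex[text_field])} for ex in dataset]


def toy_tokenize(text: str) -> List[int]:
    # Toy tokenizer: one "id" per whitespace-separated word (its length).
    return [len(word) for word in text.split()]


raw = [{"text": "sparse fine tuning"}]
pre = [{"input_ids": [101, 7, 102], "attention_mask": [1, 1, 1]}]

print(prepare_dataset(raw, toy_tokenize))    # [{'input_ids': [6, 4, 6]}]
print(prepare_dataset(pre, toy_tokenize) is pre)  # True
```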
Examples using `SFTTrainer` for sparse finetuning, both with our dataset preprocessing and with TRL's dataset preprocessing.
Asana ticket: https://app.asana.com/0/1201735099598270/1206486351032763/f
Testing
See the examples in `integrations/huggingface-transformers/tutorials/text-generation/trl_mixin`.