snassimr closed this 7 months ago
Hi @snassimr,
If you mean using SFTTrainer directly for ORPO, that wouldn't be feasible: ORPO optimizes the SFT loss and an odds-ratio term over preference pairs (chosen vs. rejected responses) at the same time, which SFTTrainer is not built for.
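To make the point above concrete, here is a minimal sketch of the per-example ORPO objective in plain Python. It is an illustration, not the code from this repository or from TRL: the function names and the default `lam=0.1` are assumptions chosen for the example, and each input is assumed to be the length-normalized (average per-token) log-probability of a response, which must be negative.

```python
import math


def log_odds(avg_logp: float) -> float:
    """Return log(p / (1 - p)) for p = exp(avg_logp).

    avg_logp is assumed to be the average per-token log-probability
    of a response, so it must be strictly negative (p < 1).
    """
    return avg_logp - math.log1p(-math.exp(avg_logp))


def orpo_loss(chosen_logp: float, rejected_logp: float, lam: float = 0.1) -> float:
    """Sketch of the ORPO loss for one preference pair.

    lam is a hypothetical default weight for the odds-ratio term.
    """
    # SFT part: negative log-likelihood of the chosen response only.
    sft_term = -chosen_logp
    # Preference part: -log(sigmoid(log odds ratio)), which needs BOTH responses.
    log_odds_ratio = log_odds(chosen_logp) - log_odds(rejected_logp)
    or_term = math.log1p(math.exp(-log_odds_ratio))
    return sft_term + lam * or_term


# The loss is lower when the chosen response is the more likely one.
print(orpo_loss(chosen_logp=-0.5, rejected_logp=-2.0))
print(orpo_loss(chosen_logp=-2.0, rejected_logp=-0.5))
```

The second term needs the rejected response's log-probability in the same step, which is why a trainer that only sees single responses cannot compute it.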
Regarding QLoRA, we plan to integrate PEFT methods in the near future, but note that QLoRA may not work well with ZeRO Stage 3 or FSDP (related reddit). If you need an implementation right away, feel free to try the TRL ORPO implementation.
Hi,
Is it possible to train ORPO with SFTTrainer and QLoRA?
Thanks