[Question] ORPO + SFTTrainer + QLora

Hi @snassimr,

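If you mean using SFTTrainer directly for ORPO, that wouldn't be feasible: ORPO trains on preference pairs (a chosen and a rejected response for each prompt) and adds an odds-ratio term to the SFT loss in the same step, whereas SFTTrainer only computes the plain SFT loss.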
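
To make the difference concrete, here is a minimal sketch of the per-example ORPO objective using only the Python standard library. It is not the code from this repository. It assumes the inputs are the length-averaged log-probabilities of the chosen and rejected responses, and the names `log_odds`, `orpo_terms`, and the weight `lam=0.1` are illustrative choices.

```python
import math


def log_odds(avg_logp: float) -> float:
    """Return log(p / (1 - p)) for p = exp(avg_logp), with avg_logp < 0."""
    return avg_logp - math.log1p(-math.exp(avg_logp))


def orpo_terms(chosen_avg_logp: float, rejected_avg_logp: float, lam: float = 0.1):
    """Return (sft_term, odds_ratio_term, total) for one preference pair.

    sft_term is the usual negative log-likelihood of the chosen response.
    odds_ratio_term is -log(sigmoid(log odds ratio)), which needs the
    rejected response as well, so it cannot be computed from SFT-style data.
    """
    sft_term = -chosen_avg_logp
    log_odds_ratio = log_odds(chosen_avg_logp) - log_odds(rejected_avg_logp)
    # -log(sigmoid(x)) == log(1 + exp(-x))
    odds_ratio_term = math.log1p(math.exp(-log_odds_ratio))
    total = sft_term + lam * odds_ratio_term
    return sft_term, odds_ratio_term, total


# Toy pair: the model assigns p = 0.6 to the chosen and p = 0.2 to the rejected response.
sft, ratio_term, total = orpo_terms(math.log(0.6), math.log(0.2))
print(f"sft={sft:.4f} odds_ratio_term={ratio_term:.4f} total={total:.4f}")
```

The second term is the part a plain SFT trainer has no way to produce, since its batches carry only one response per prompt.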

Regarding QLoRA, we plan to integrate PEFT methods in the near future, but note that QLoRA may not work well with ZeRO Stage 3 or FSDP (related reddit). If you need an implementation right away, feel free to try the TRL ORPO implementation.
