hiyouga / LLaMA-Factory

Unified Efficient Fine-Tuning of 100+ LLMs (ACL 2024)
https://arxiv.org/abs/2403.13372
Apache License 2.0

Could joint training be added as a feature? #4921

Closed: zhengjie-zhou closed this issue 2 months ago

zhengjie-zhou commented 2 months ago

System Info

no

Reproduction

no

Expected behavior

The project currently integrates training methods such as DPO, PPO, KTO, and SFT. Could support be added for combining them, e.g. $L = \alpha L_{\text{SFT}} + \beta L_{\text{DPO}}$, where $\alpha$ and $\beta$ are hyperparameters?
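For illustration, a minimal sketch of the proposed weighted objective. The function name, argument names, and the simplified DPO term are hypothetical, not existing LLaMA-Factory APIs:

import torch.nn.functional as F

def combined_loss(sft_logits, sft_labels,
                  chosen_logps, rejected_logps,
                  ref_chosen_logps, ref_rejected_logps,
                  alpha=1.0, beta=1.0, dpo_beta=0.1):
    # Standard SFT term: token-level cross-entropy.
    l_sft = F.cross_entropy(
        sft_logits.view(-1, sft_logits.size(-1)), sft_labels.view(-1)
    )
    # Simplified DPO term on policy/reference sequence log-probabilities.
    margins = dpo_beta * (
        (chosen_logps - rejected_logps)
        - (ref_chosen_logps - ref_rejected_logps)
    )
    l_dpo = -F.logsigmoid(margins).mean()
    # The proposed joint objective: L = alpha * L_SFT + beta * L_DPO.
    return alpha * l_sft + beta * l_dpo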

Others

No response

hiyouga commented 2 months ago

See the pref_ftx argument.

zhengjie-zhou commented 2 months ago

See the pref_ftx argument.

pref_ftx: float = field(
    default=0.0,
    metadata={"help": "The supervised fine-tuning loss coefficient in DPO training."},
)
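Judging from the help text above, pref_ftx is a coefficient that mixes a supervised fine-tuning term into the DPO objective. Conceptually it amounts to something like the following; this is a hypothetical sketch of the described behavior, not the trainer's actual code:

def dpo_with_ftx(dpo_loss, sft_loss, pref_ftx=0.0):
    # pref_ftx = 0.0 recovers plain DPO; pref_ftx > 0.0 adds a
    # weighted SFT term, matching the help text quoted above.
    return dpo_loss + pref_ftx * sft_loss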
Then if I want to train jointly with DPO and KTO, how should I adjust things? @hiyouga
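What the question appears to be asking for, as a purely hypothetical sketch; LLaMA-Factory exposes no such option, and all names below are made up:

def joint_dpo_kto_loss(dpo_loss, kto_loss, w_dpo=1.0, w_kto=1.0):
    # Hypothetical weighted combination of the two preference losses;
    # w_dpo and w_kto would be new hyperparameters.
    return w_dpo * dpo_loss + w_kto * kto_loss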