Open · 1485840691 opened this pull request 1 week ago
Looks like a great change! Thanks @1485840691 for the PR
The docs for this PR live here. All of your documentation changes will be reflected on that endpoint. The docs are available until 30 days after the last update.
Make sure you run `make precommit`.
Done, the precommit check passes. Please help review. Thanks!
This PR addresses an earlier code-improvement suggestion: in RewardTrainer, we can borrow the idea from DPOTrainer of concatenating the chosen and rejected tokens into one batch, so that a single model forward() call replaces two. The drawback of this concatenated forward pass is higher GPU memory usage, so the PR adds a flag to turn the optimization on or off.
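A minimal sketch of the idea, written in plain Python so it runs without PyTorch. All names here (`ToyRewardModel`, `pad_batch`, `rewards_separate`, `rewards_concatenated`, and the `use_concatenated_forward` switch) are hypothetical illustrations, not the PR's code or TRL's API. The sketch pads chosen and rejected sequences to one common length, stacks them into a single batch, makes one model call, and splits the rewards back into two halves. The pairwise loss -log(sigmoid(r_chosen - r_rejected)) is assumed here as the reward-modeling objective.

```python
import math
from typing import Callable, List, Sequence, Tuple

Batch = List[List[int]]
Mask = List[List[int]]
# Hypothetical model signature: (input_ids, attention_mask) -> one scalar reward per row.
RewardModel = Callable[[Batch, Mask], List[float]]


class ToyRewardModel:
    """Stand-in reward model: 0.1 * sum of unmasked token ids; counts forward calls."""

    def __init__(self) -> None:
        self.calls = 0

    def __call__(self, ids: Batch, mask: Mask) -> List[float]:
        self.calls += 1
        return [0.1 * sum(t for t, m in zip(row, row_mask) if m)
                for row, row_mask in zip(ids, mask)]


def pad_batch(seqs: Sequence[Sequence[int]], length: int, pad_id: int) -> Tuple[Batch, Mask]:
    """Right-pad every sequence to `length` and build the matching attention mask."""
    ids = [list(s) + [pad_id] * (length - len(s)) for s in seqs]
    mask = [[1] * len(s) + [0] * (length - len(s)) for s in seqs]
    return ids, mask


def rewards_separate(model: RewardModel, chosen, rejected, pad_id: int = 0):
    """Baseline: two forward calls, one for the chosen half and one for the rejected half."""
    c_ids, c_mask = pad_batch(chosen, max(map(len, chosen)), pad_id)
    r_ids, r_mask = pad_batch(rejected, max(map(len, rejected)), pad_id)
    return model(c_ids, c_mask), model(r_ids, r_mask)


def rewards_concatenated(model: RewardModel, chosen, rejected, pad_id: int = 0):
    """DPO-style: pad both halves to one length, stack them, and call the model once."""
    both = list(chosen) + list(rejected)
    ids, mask = pad_batch(both, max(len(s) for s in both), pad_id)
    rewards = model(ids, mask)  # 2x the rows in one call -> higher peak memory
    n = len(chosen)
    return rewards[:n], rewards[n:]


def compute_rewards(model: RewardModel, chosen, rejected, use_concatenated_forward: bool):
    """Hypothetical on/off switch between the two strategies."""
    fn = rewards_concatenated if use_concatenated_forward else rewards_separate
    return fn(model, chosen, rejected)


def pairwise_loss(r_chosen: List[float], r_rejected: List[float]) -> float:
    """Mean of -log(sigmoid(c - r)), computed as log1p(exp(-(c - r)))."""
    losses = [math.log1p(math.exp(-(c - r))) for c, r in zip(r_chosen, r_rejected)]
    return sum(losses) / len(losses)


chosen = [[5, 6, 7], [8, 9]]
rejected = [[1, 2], [3, 4, 5, 6]]

m_sep, m_cat = ToyRewardModel(), ToyRewardModel()
sep = compute_rewards(m_sep, chosen, rejected, use_concatenated_forward=False)
cat = compute_rewards(m_cat, chosen, rejected, use_concatenated_forward=True)
print(m_sep.calls, m_cat.calls)  # prints "2 1"
```

Because padding is masked out, both paths produce the same rewards and loss; only the number of forward calls differs. The stacked batch has twice as many rows and is padded to the longest sequence across both halves, which is why the single call can raise peak activation memory, and that trade-off is what an on/off flag lets users choose.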