ArcherShirou opened 1 day ago
Thanks for reporting; this should have been fixed by #2261. Can you confirm?
Thank you for your response. After updating the code and testing it, everything runs smoothly now. For the 14B and 72B models, quantization is necessary even when using the 0.5B reward model. However, if I switch to the 70B or 72B reward model, I still hit out-of-memory (OOM) errors midway, even with quantization and LoRA applied. Do you have any good solutions for this?
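One common mitigation worth trying is to load the large reward model in 4-bit with bitsandbytes and let `device_map="auto"` shard it across the available GPUs. This is only a sketch, not a confirmed fix for this issue; the model path below is an illustrative placeholder, and the actual load is commented out because a 70B checkpoint will not fit everywhere:

```python
# Sketch: load a large reward model in 4-bit to cut memory pressure.
# Assumes transformers + bitsandbytes are installed; the model path is
# an invented placeholder, not taken from the original report.
import torch
from transformers import AutoModelForSequenceClassification, BitsAndBytesConfig

bnb_config = BitsAndBytesConfig(
    load_in_4bit=True,                      # 4-bit weights (~0.5 byte/param)
    bnb_4bit_quant_type="nf4",              # NormalFloat4 quantization
    bnb_4bit_use_double_quant=True,         # also quantize the quant constants
    bnb_4bit_compute_dtype=torch.bfloat16,  # do matmuls in bf16
)

# reward_model = AutoModelForSequenceClassification.from_pretrained(
#     "path/to/70B-reward-model",           # illustrative placeholder
#     num_labels=1,
#     quantization_config=bnb_config,
#     device_map="auto",                    # shard layers across GPUs
# )
```

If the OOM happens mid-run rather than at load time, it may instead come from generation-time activations, in which case lowering the generation batch size or max completion length is the knob to try first.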
System Info
Information
Tasks
examples folder

Reproduction
I encountered a troubling issue while running the XPO program: the first 500 steps ran smoothly, but then an error occurred mid-training, as shown below:
Prior to this, the LogCompletionsCallback was running normally and produced the following records:
I use the [trl-lib/ultrafeedback-prompt](https://huggingface.co/datasets/trl-lib/ultrafeedback-prompt) prompt-only dataset, like this:
Could you please advise on how to resolve this bug? Thanks.
More Info
My script is:
and I revised the official xpo.py as follows:
Expected behavior
NO