Open mst272 opened 4 months ago
And I tried the same thing with the DPOV2Trainer and got the same error. But the script runs when I do not use Unsloth.
Hmm, it's probably because these trainers need generation steps. I'll have to take a look.
Hey, just wondering whether this issue has been fixed for RLOOTrainer? I also want to use RLOOTrainer with an Unsloth model.
Sorry I never got to it :(
+1, I am trying to implement GRPO for the TRL library. The model fails in the generation phase with the following error:
RuntimeError: Unsloth: You must call `FastLanguageModel.for_inference(model)` before doing inference for Unsloth models.
Is there a way to run this without calling `for_inference`?
@saisurbehera No, sorry, not currently. You can add a hook that enables it before generation and disables it afterwards.
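For anyone wanting to try the enable/disable approach, here is a minimal sketch of one way to structure it, not an official Unsloth or TRL API. `inference_mode` is a hypothetical helper name; it only takes two callables and runs them around a block. The commented usage assumes `FastLanguageModel.for_inference` and `FastLanguageModel.for_training` as the on/off switches, and that `model` and `inputs` already exist. Where exactly to call it inside a given trainer's generation step is left open, since that depends on the trainer.

```python
from contextlib import contextmanager
from typing import Callable, Iterator


@contextmanager
def inference_mode(
    enable: Callable[[], None],
    disable: Callable[[], None],
) -> Iterator[None]:
    """Call `enable` on entry and `disable` on exit, even if the body raises."""
    enable()
    try:
        yield
    finally:
        # Always switch back so a failed generation does not leave the
        # model stuck in inference mode for the next training step.
        disable()


# Hypothetical usage with Unsloth (assumes `model` and `inputs` are defined;
# not executed here because it needs unsloth and a loaded model):
#
# from unsloth import FastLanguageModel
#
# with inference_mode(
#     lambda: FastLanguageModel.for_inference(model),
#     lambda: FastLanguageModel.for_training(model),
# ):
#     output_ids = model.generate(**inputs, max_new_tokens=64)
```

Passing the switches in as callables keeps the helper independent of any particular library, so it can be exercised with stubs before wiring it into a trainer.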
Will that have latency concerns?
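One way to answer the latency question for a specific setup is to time the switch directly. The sketch below is a generic timing helper, not part of Unsloth or TRL; `mean_seconds` is a hypothetical name, and the commented usage assumes `FastLanguageModel.for_inference` / `for_training` are the calls being toggled and that `model` is already loaded.

```python
import time
from typing import Callable


def mean_seconds(fn: Callable[[], None], repeats: int = 10) -> float:
    """Return the mean wall-clock time in seconds of calling `fn` `repeats` times."""
    if repeats < 1:
        raise ValueError("repeats must be at least 1")
    start = time.perf_counter()
    for _ in range(repeats):
        fn()
    return (time.perf_counter() - start) / repeats


# Hypothetical usage (assumes `model` is an Unsloth model; not executed here):
#
# from unsloth import FastLanguageModel
#
# def toggle() -> None:
#     FastLanguageModel.for_inference(model)
#     FastLanguageModel.for_training(model)
#
# print(f"mean toggle cost: {mean_seconds(toggle):.6f} s")
```

Comparing that number against the time of a single generation step should show whether the toggle is significant for a given model and batch size.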
When I used RLOOTrainer from the TRL library for RLHF, I loaded both the policy model and the ref_policy model through Unsloth, and it reported the error above. Is this not supported?