Open ljb121002 opened 1 month ago
System Info
https://github.com/openreasoner/openr/blob/main/train/mat/models/ms_prm.py#L32
An additional space at the linked line prevents the special reward token from being recognized.
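To illustrate the kind of failure described, here is a minimal sketch using a toy vocabulary rather than the actual code in `ms_prm.py`. The token name `<reward>`, the ids, and the helper functions are all hypothetical; the only assumption carried over from the report is that the special token is resolved by exact string match, so a stray leading space changes the lookup result.

```python
# Hypothetical toy vocabulary; the token name and ids are illustrative only
# and are not taken from the openr repository.
VOCAB = {"<reward>": 7, "step": 1, "answer": 2}
UNK_ID = 0  # id returned for any string that is not in the vocabulary


def token_id(token: str) -> int:
    """Resolve a token string to an id by exact match."""
    return VOCAB.get(token, UNK_ID)


def find_reward_positions(token_ids, reward_token):
    """Return the indices in token_ids that hold the reward token's id."""
    reward_id = token_id(reward_token)
    return [i for i, t in enumerate(token_ids) if t == reward_id]


sequence = [token_id(t) for t in ["step", "<reward>", "answer", "<reward>"]]

clean = find_reward_positions(sequence, "<reward>")    # exact token string
padded = find_reward_positions(sequence, " <reward>")  # same string with a leading space

print(clean)   # [1, 3]
print(padded)  # []
```

In this sketch the padded string resolves to the unknown id, so no reward positions are found. Stripping the extra space before the lookup restores the match.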
Who can help?
No response
Information
Tasks
Reproduction
Run the code by following the official instructions.
Expected behavior
The reward token should be recognized. Currently it is not.