Seems to have a bug in the evaluate function

mingkaid / rl-prompt

Accompanying repo for the RLPrompt paper

MIT License

286 stars 52 forks

Open A11en0 opened 7 months ago

A11en0 commented 7 months ago

There seems to be a bug in the evaluate function, as shown in the following:

Since it only calculates the metric on the last batch of the evaluation set, it could perhaps be changed to scores = scores.mean().item()
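To make the difference concrete, here is a minimal sketch in plain Python. It is not the repository's code: the function names and the score values are made up for illustration, and it only assumes that evaluation loops over batches and produces one score per example.

```python
from statistics import mean


def evaluate_last_batch_only(batches):
    """Pattern described in this issue: the per-batch variable is
    overwritten on every iteration, so only the last batch is averaged."""
    score = []
    for batch_scores in batches:
        score = batch_scores  # previous batches are discarded here
    return mean(score)


def evaluate_all_batches(batches):
    """Accumulate every per-example score, then average over the whole set."""
    all_scores = []
    for batch_scores in batches:
        all_scores.extend(batch_scores)
    return mean(all_scores)


# Hypothetical per-example scores for three evaluation batches.
batches = [[1.0, 1.0], [1.0, 0.0], [0.0, 0.0]]

print(evaluate_last_batch_only(batches))  # 0.0 (last batch only)
print(evaluate_all_batches(batches))      # 0.5 (whole evaluation set)
```

With these made-up numbers the last-batch average is 0.0 while the average over the whole evaluation set is 0.5, so the reported metric can be far from the true one.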

rianrajagede commented 1 month ago

After weeks of experiments, I also just noticed this bug. I should have read the Issues page earlier :(