mingkaid / rl-prompt

Accompanying repo for the RLPrompt paper
MIT License
286 stars, 52 forks

Seems to have a bug in the evaluate function #42

Open A11en0 opened 7 months ago

A11en0 commented 7 months ago

There seems to be a bug in the evaluate function, as shown in the following:

[screenshot of the evaluate function]

Since it only calculates the metric for the last batch of the evaluation set, it should probably be changed to `scores = scores.mean().item()`.
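The screenshot is not reproduced here, so the following is only a minimal, framework-free sketch of the pattern being described, not the actual rl-prompt code. The names `evaluate_buggy`, `evaluate_fixed` and `is_even_scores` are hypothetical. The first function reassigns the score variable inside the batch loop, so only the last batch survives. The second accumulates per-example scores over the whole evaluation set and averages once at the end.

```python
from typing import Callable, Iterable, List, Sequence

# Hypothetical per-example scorer: returns one score (e.g. 1.0 = correct) per example.
Scorer = Callable[[Sequence[int]], List[float]]


def evaluate_buggy(batches: Iterable[Sequence[int]], score_batch: Scorer) -> float:
    """Reported pattern: `scores` is overwritten on every iteration,
    so the returned metric reflects only the last batch."""
    scores: List[float] = []
    for batch in batches:
        scores = score_batch(batch)  # discards results from earlier batches
    return sum(scores) / len(scores)


def evaluate_fixed(batches: Iterable[Sequence[int]], score_batch: Scorer) -> float:
    """Collect per-example scores from every batch, then average once.
    With PyTorch tensors (an assumption about the real code), the analogue
    would be torch.cat(per_batch_scores).mean().item()."""
    all_scores: List[float] = []
    for batch in batches:
        all_scores.extend(score_batch(batch))
    if not all_scores:
        raise ValueError("evaluation set is empty")
    return sum(all_scores) / len(all_scores)


def is_even_scores(batch: Sequence[int]) -> List[float]:
    # Toy scorer for illustration: an example "passes" if it is even.
    return [1.0 if x % 2 == 0 else 0.0 for x in batch]


batches = [[0, 2, 4, 6], [1, 3, 5, 8], [9, 11]]
print(evaluate_buggy(batches, is_even_scores))  # 0.0 (last batch only)
print(evaluate_fixed(batches, is_even_scores))  # 0.5 (5 of all 10 examples)
```

Averaging per-example scores, rather than averaging per-batch means, also avoids over-weighting a smaller final batch.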

rianrajagede commented 1 month ago

After weeks of experiments, I only just noticed this bug too. I should have read the Issues page earlier :(