Issue in evaluation code.

vojtsek / to-llm-bot


Issue in evaluation code. #1

Closed Namo-Bang closed 10 months ago

Namo-Bang commented 10 months ago

Hi! I found something odd in the evaluation section of your run.py code.

The mweval module computes JGA (joint goal accuracy) from the total dialogue state, i.e. the state accumulated over all turns so far. At line 364 of your code, however, the evaluation is run with only the turn-level state, so JGA is heavily underestimated.

After fixing this line, the JGA for ChatGPT rises above 50.
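
To illustrate why scoring turn-level states against total states lowers JGA, here is a minimal sketch. It is not the code from run.py or mweval: the slot names, the `accumulate` helper, and the `joint_goal_accuracy` function are all hypothetical, and JGA is simplified to an exact match per turn.

```python
from typing import Dict, List

# Hypothetical representation: slot name -> value, e.g. "hotel-area" -> "north".
State = Dict[str, str]


def accumulate(turn_states: List[State]) -> List[State]:
    """Convert per-turn updates into the total state known after each turn."""
    total: State = {}
    totals: List[State] = []
    for update in turn_states:
        total = {**total, **update}  # later turns overwrite earlier values
        totals.append(total)
    return totals


def joint_goal_accuracy(predicted: List[State], gold: List[State]) -> float:
    """Fraction of turns whose predicted state equals the gold state exactly."""
    matches = sum(p == g for p, g in zip(predicted, gold))
    return matches / len(gold)


# Toy dialogue in which every per-turn update is predicted correctly.
turn_predictions = [
    {"hotel-area": "north"},
    {"hotel-stars": "4"},
    {"hotel-parking": "yes"},
]
gold_total_states = accumulate(turn_predictions)

# Passing turn-level states: only the first turn can match the gold total state.
jga_turn_only = joint_goal_accuracy(turn_predictions, gold_total_states)

# Passing accumulated states: every turn matches.
jga_total = joint_goal_accuracy(accumulate(turn_predictions), gold_total_states)

print(jga_turn_only, jga_total)
```

In this toy case the predictions are perfect, yet the turn-level variant only scores the first turn as correct, because every later gold state also contains the slots from earlier turns.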

vojtsek commented 10 months ago

Hi @Namo-Bang, thank you for reporting this! You're right. We accidentally passed the wrong variable to the evaluation. I will prepare a fix and update the results.

Best, Vojta