Hi!
I found something odd in the evaluation section of your run.py.
The mweval module computes JGA against the total state, but at line 364 of your code the evaluation is run with only the turn state.
As a result, JGA is heavily underestimated.
After fixing this line, the JGA of ChatGPT is over 50.
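
To illustrate why this matters, here is a minimal sketch of the effect. It is not the actual run.py or mweval code: the helper names (`accumulate`, `joint_goal_accuracy`) and the toy dialogue states are hypothetical, and I am assuming the total state is simply the per-turn slot updates merged over the dialogue.

```python
def accumulate(turn_states):
    """Merge per-turn slot updates into the total state at each turn."""
    total, history = {}, []
    for turn in turn_states:
        for domain, slots in turn.items():
            total.setdefault(domain, {}).update(slots)
        # Copy one level deep so later turns do not mutate earlier snapshots.
        history.append({d: dict(s) for d, s in total.items()})
    return history


def joint_goal_accuracy(predicted, gold):
    """Fraction of turns whose predicted state matches the gold state exactly."""
    matches = sum(p == g for p, g in zip(predicted, gold))
    return matches / len(gold)


# Hypothetical predictions: each entry holds only the slots from that turn.
turn_states = [
    {"hotel": {"area": "north"}},
    {"hotel": {"stars": "4"}},
]
# Hypothetical gold annotations: each entry holds the total state so far.
gold_total = [
    {"hotel": {"area": "north"}},
    {"hotel": {"area": "north", "stars": "4"}},
]

# Comparing turn states against total gold states penalizes every turn
# that does not repeat all earlier slots.
jga_turn = joint_goal_accuracy(turn_states, gold_total)
# Comparing accumulated states against total gold states is like-for-like.
jga_total = joint_goal_accuracy(accumulate(turn_states), gold_total)
```

In this toy case the second turn is predicted correctly but only counts as a match once earlier slots are carried forward, which is the kind of gap I think is lowering the reported JGA.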
Hi @Namo-Bang, thank you for reporting this! You're right. We accidentally passed the wrong variable to the evaluation. I will prepare a fix and update the results.