ronaldosvieira / gym-locm

OpenAI Gym environments for Legends of Code and Magic, a collectible card game designed for AI research
MIT License

Cannot fully reproduce the Coac vs Chad winrate as reported in CEC2020 using NativeAgent #11

Open wanhesong opened 2 years ago

wanhesong commented 2 years ago

The Coac vs Chad win rate was reported to be 57% in CEC 2020, but I obtained a win rate of roughly 80% using locm-runner and NativeAgent.

The evaluation code is:

locm-runner \
    --p1-path "/path/to/Strategy-Card-Game-AI-Competition/contest-2020-07-CEC/Coac/main" \
    --p2-path "/path/to/Strategy-Card-Game-AI-Competition/contest-2020-07-CEC/Chad/agent/target/release/agent" \
    --games 100

where I had commented out Coac's cerr << statements (e.g., here and other similar lines), because the self._process.read_nonblocking call in NativeAgent appeared to read both stdout and stderr (a known issue).
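For reference, read_nonblocking looks like a pexpect call, and pexpect runs the child process in a pseudo-terminal, where stdout and stderr arrive interleaved on the same stream. A minimal sketch (my assumption about how NativeAgent spawns the binary; the path is a placeholder) of a workaround that drops stderr without editing Coac's source:

    import pexpect

    # Sketch only, not the actual NativeAgent code: wrap the agent command so
    # its stderr is discarded before it reaches the pty, instead of commenting
    # out Coac's cerr << lines.
    agent_cmd = "/path/to/Coac/main"  # placeholder path
    child = pexpect.spawn(
        "/bin/sh",
        ["-c", f"exec {agent_cmd} 2>/dev/null"],  # drop stderr, keep stdout
        encoding="utf-8",
    )
    # From here on, child.readline() / child.read_nonblocking() should only
    # ever see the agent's stdout.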

And here are the printed results:

...
2022-05-30 22:43:51.392527 Episode 97: 79.38% 20.62%
2022-05-30 22:43:57.315334 Episode 98: 78.57% 21.43%
2022-05-30 22:44:03.639829 Episode 99: 78.79% 21.21%
2022-05-30 22:44:12.195598 Episode 100: 79.00% 21.00%
79.00% 21.00%
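The two percentages on each line are the cumulative win shares of player 1 (Coac) and player 2 (Chad) after that many episodes (e.g., 78/99 = 78.79% at episode 99). A stand-in sketch of that bookkeeping, not the actual locm-runner code (the match function below is a placeholder):

    import random

    def play_one_match() -> bool:
        """Placeholder for one Coac vs. Chad game; True means player 1 won."""
        return random.random() < 0.79

    p1_wins = 0
    for episode in range(1, 101):
        p1_wins += play_one_match()
        p1_pct = 100 * p1_wins / episode
        print(f"Episode {episode}: {p1_pct:.2f}% {100 - p1_pct:.2f}%")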

See also the original discussion here.

ronaldosvieira commented 1 year ago

I tried to run the consistency checks with Coac vs. Chad matches in the original Java engine, but they only work if both agents are deterministic. Sadly, Chad is not deterministic (MCTS has a random component), and I couldn't find an easy way to set a seed for its RNG (I don't know Rust :P).

Running 200 games of Coac vs. Chad using the competition's run.sh script, Coac achieved a win rate of 68%, which is significantly higher than the reported 57%. This may be due to hardware differences (?) between my computer and the machines used in the competition, since Marasbot from earlier editions also achieved different win rates on my computer. However, when using locm-runner to execute the same matches, Coac ended up with a win rate of 78%, as reported by OP, which suggests there may also be something wrong with the NativeAgent class and/or the engine (although the engine seems to be correct, given the other consistency checks I've run).
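As a rough back-of-the-envelope check (my own, not from the competition results): if Coac's true win rate against Chad were the reported 57%, observing 68% over 200 games (about 136 wins) would be quite unlikely, so the gap is probably not just match-to-match noise:

    # Assumption: 68% of 200 games = 136 wins; the reported 57% is treated as
    # the hypothesized true win rate.
    from scipy.stats import binomtest

    result = binomtest(k=136, n=200, p=0.57)
    print(result.pvalue)  # roughly 0.002, i.e. unlikely to be chance alone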

For now, I'll leave this issue hanging. I'll come back if I think of other ideas to debug this match-up.