pensquid / yobmef

Weird toy chess engine written *very much from scratch* in Rust.

Automated strength testing #9

Open UlisseMini opened 3 years ago

UlisseMini commented 3 years ago

Ideally, changes would trigger an automatic evaluation of the engine that reports its new Elo, similar to fishtest.
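For reference, a minimal sketch of how an Elo difference could be derived from a match result, assuming the standard logistic Elo model. The function names `match_score` and `elo_from_score` are hypothetical and not part of yobmef or fishtest.

```rust
/// Fraction of points scored: a win counts 1, a draw 0.5, a loss 0.
/// Returns None when no games were played.
pub fn match_score(wins: u32, draws: u32, losses: u32) -> Option<f64> {
    let games = wins + draws + losses;
    if games == 0 {
        return None;
    }
    Some((wins as f64 + 0.5 * draws as f64) / games as f64)
}

/// Elo difference implied by a score under the logistic model:
/// score = 1 / (1 + 10^(-elo / 400)), solved for elo.
/// Returns None for scores of 0 or 1, where the difference is unbounded.
pub fn elo_from_score(score: f64) -> Option<f64> {
    if score <= 0.0 || score >= 1.0 {
        return None;
    }
    Some(-400.0 * (1.0 / score - 1.0).log10())
}
```

A 50% score maps to a difference of zero, and the estimate is only as good as the number of games behind it, which is why a statistical stopping rule is usually layered on top.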

The smart way seems to be running an engine tournament with cutechess-cli. This could be against other engines at various skill levels, older versions of itself, or both. I prefer older versions of itself.

As for where to run it, we could use GitHub Actions, my VPS, or dynamically spawn and shut down VPS instances (could exploit cloud free credits).

Thoughts @kognise? I'm thinking about getting this done tomorrow.

UlisseMini commented 3 years ago

https://github.com/lucasart/c-chess-cli seems good, working on this now.

UlisseMini commented 3 years ago

Added a helper script around c-chess-cli in scripts/yobtest.sh. For some reason the LLR is staying at zero even 420 games in. Either I'm doing something wrong, or you need an insane number of games for c-chess-cli to give you P < 0.05.
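As a sanity check on the LLR, here is a rough sketch of one common normal approximation for an SPRT log-likelihood ratio over win/draw/loss counts. This is an assumption about how such a value can be computed, not a copy of what c-chess-cli does internally, and the names `expected_score`, `llr`, and `sprt_bounds` are hypothetical.

```rust
/// Expected score for a given Elo advantage under the logistic model.
pub fn expected_score(elo: f64) -> f64 {
    1.0 / (1.0 + 10f64.powf(-elo / 400.0))
}

/// Approximate log-likelihood ratio of H1 (elo = elo1) against H0 (elo = elo0),
/// using the sample mean and variance of the per-game score.
/// Returns None when there are no games or the results have zero variance.
pub fn llr(wins: u32, draws: u32, losses: u32, elo0: f64, elo1: f64) -> Option<f64> {
    let n = (wins + draws + losses) as f64;
    if n == 0.0 {
        return None;
    }
    let (w, d, l) = (wins as f64 / n, draws as f64 / n, losses as f64 / n);
    let mean = w + 0.5 * d;
    // Variance of a single game's score (1, 0.5, or 0) around the mean.
    let var = w * (1.0 - mean).powi(2) + d * (0.5 - mean).powi(2) + l * mean.powi(2);
    if var <= 0.0 {
        return None;
    }
    let (s0, s1) = (expected_score(elo0), expected_score(elo1));
    Some(n * (s1 - s0) * (2.0 * mean - s0 - s1) / (2.0 * var))
}

/// SPRT stopping bounds (lower, upper): accept H0 when the LLR drops below
/// the lower bound, accept H1 when it rises above the upper bound.
pub fn sprt_bounds(alpha: f64, beta: f64) -> (f64, f64) {
    ((beta / (1.0 - alpha)).ln(), ((1.0 - beta) / alpha).ln())
}
```

Under this approximation the LLR is exactly zero only when `elo0` equals `elo1` or when the observed score sits at the midpoint of the two expected scores, so a value pinned at zero for hundreds of games would more likely point to a configuration problem than to a lack of games. Note also that the test stops when the LLR crosses one of the bounds from `sprt_bounds`, rather than when a p-value falls below a threshold.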

UlisseMini commented 3 years ago

Blocked by #10 (can't play fast games to compute Elo).