MiniZero is a zero-knowledge learning framework that supports AlphaZero, MuZero, Gumbel AlphaZero, and Gumbel MuZero algorithms.
This is the official repository of the IEEE Transactions on Games (ToG) paper "MiniZero: Comparative Analysis of AlphaZero and MuZero on Go, Othello, and Atari Games".
If you use MiniZero in your research, please consider citing our paper as follows:
@article{wu2024minizero,
  title={MiniZero: Comparative Analysis of AlphaZero and MuZero on Go, Othello, and Atari Games},
  author={Wu, Ti-Rong and Guei, Hung and Peng, Pei-Chiun and Huang, Po-Wei and Wei, Ting Han and Shih, Chung-Chin and Tsai, Yun-Jui},
  journal={IEEE Transactions on Games},
  year={2024},
  publisher={IEEE}
}
Outline
MiniZero utilizes zero-knowledge learning algorithms to train game-specific AI models.
It includes a variety of zero-knowledge learning algorithms: AlphaZero, MuZero, Gumbel AlphaZero, and Gumbel MuZero.
It supports a variety of game environments, including Go, Othello, and Atari games.
We plan to add new algorithms, features, and more games in the future.
The MiniZero architecture comprises four components: a server, self-play workers, an optimization worker, and data storage.
The performance of each zero-knowledge learning algorithm on board games and Atari games is shown as follows, where α0, μ0, g-α0, and g-μ0 represent AlphaZero, MuZero, Gumbel AlphaZero, and Gumbel MuZero, respectively, and $n$ represents the simulation count. More details and publicly released AI models are available here.
Results on board games:
Results on Atari games:
MiniZero requires a Linux platform with at least one NVIDIA GPU to operate.
To facilitate the use of MiniZero, a container image is pre-built to include all required packages.
Thus, a container tool such as docker or podman is also required.
This section walks you through training AI models using zero-knowledge learning algorithms, evaluating trained AI models, and launching the console to interact with the AI.
First, clone this repository.
git clone git@github.com:rlglab/minizero.git
cd minizero # enter the cloned repository
Then, start the runtime environment using the container.
scripts/start-container.sh # must have either podman or docker installed
Once a container starts successfully, its working folder should be located at /workspace.
You must execute all of the following commands inside the container.
To train 9x9 Go:
# AlphaZero with 200 simulations
tools/quick-run.sh train go az 300 -n go_9x9_az_n200 -conf_str env_board_size=9:actor_num_simulation=200
# Gumbel AlphaZero with 16 simulations
tools/quick-run.sh train go gaz 300 -n go_9x9_gaz_n16 -conf_str env_board_size=9:actor_num_simulation=16
To train Ms. Pac-Man:
# MuZero with 50 simulations
tools/quick-run.sh train atari mz 300 -n ms_pacman_mz_n50 -conf_str env_atari_name=ms_pacman:actor_num_simulation=50
# Gumbel MuZero with 18 simulations
tools/quick-run.sh train atari gmz 300 -n ms_pacman_gmz_n18 -conf_str env_atari_name=ms_pacman:actor_num_simulation=18
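All of the quick-run commands above share the same `-conf_str` format: configuration keys and values joined by `=`, with multiple options separated by `:`. As a minimal sketch of this format (the splitting below is purely illustrative, not part of MiniZero itself), a `-conf_str` value can be decomposed like this:

```shell
# the -conf_str format: key=value pairs separated by ':'
conf_str="env_atari_name=ms_pacman:actor_num_simulation=18"
# print one option per line
echo "$conf_str" | tr ':' '\n'
```

This prints each `key=value` option on its own line, making it easy to see which settings a long command overrides.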
For more training details, please refer to these instructions.
To evaluate the strength growth during training:
# the strength growth for "go_9x9_az_n200"
tools/quick-run.sh self-eval go go_9x9_az_n200 -conf_str env_board_size=9:actor_num_simulation=800:actor_select_action_by_count=true:actor_select_action_by_softmax_count=false:actor_use_dirichlet_noise=false:actor_use_gumbel_noise=false
To compare the strengths between two trained AI models:
# the relative strengths between "go_9x9_az_n200" and "go_9x9_gaz_n16"
tools/quick-run.sh fight-eval go go_9x9_az_n200 go_9x9_gaz_n16 -conf_str env_board_size=9:actor_num_simulation=800:actor_select_action_by_count=true:actor_select_action_by_softmax_count=false:actor_use_dirichlet_noise=false:actor_use_gumbel_noise=false
Note that for Atari games, evaluation results are generated during training.
Check ms_pacman_mz_n50/analysis/*_Return.png for the results.
For more evaluation details, please refer to these instructions.
To interact with a trained model using the Go Text Protocol (GTP):
# play with the "go_9x9_az_n200" model
tools/quick-run.sh console go go_9x9_az_n200 -conf_str env_board_size=9:actor_num_simulation=800:actor_select_action_by_count=true:actor_select_action_by_softmax_count=false:actor_use_dirichlet_noise=false:actor_use_gumbel_noise=false
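Once the console starts, you can type standard GTP commands, for example (this assumes the console implements the core GTP command set; the exact set of supported commands may vary):

```
boardsize 9
clear_board
genmove b
play w e5
quit
```

Each command receives a GTP response line beginning with `=` on success or `?` on failure, so the console can also be driven by GTP-compatible GUIs.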
For more console details, please refer to these instructions.
We are actively adding new algorithms, features, and games into MiniZero.
The following work-in-progress features will be available in future versions:
We welcome developers to join the MiniZero community. For more development tips, please refer to these instructions.