opendilab / LightZero

[NeurIPS 2023 Spotlight] LightZero: A Unified Benchmark for Monte Carlo Tree Search in General Sequential Decision Scenarios (awesome MCTS)
https://huggingface.co/spaces/OpenDILabCommunity/ZeroPal
Apache License 2.0

Implementation of Self-Play Training for Real-Time Environments #252

Open Tiikara opened 1 month ago

Tiikara commented 1 month ago

I'm currently conducting experiments with the UniZero neural network using your library. I'm particularly interested in the feasibility of implementing self-play training for UniZero in real-time environments such as Atari games. Upon examining the codebase, I noticed that self-play is currently implemented only for board games, and I'm finding it hard to assess how much work would be required to extend this functionality to real-time environments.

Could you provide some insights on the following:

  1. Do you have an existing implementation for self-play in real-time environments?
  2. If yes, could you guide me on where to start to implement this feature? (I believe asking for guidance might be more efficient than trying to figure everything out independently)
  3. If not, could you share your perspective on the complexity of implementing this within your framework?

Your expertise and guidance would be greatly appreciated in helping me navigate this aspect of the library.

Tiikara commented 1 month ago

I've made some progress in understanding the system's functionality. It appears that the default gym environment doesn't natively support multiplayer capabilities. In light of this, I'm currently working on integrating the environment from https://github.com/Farama-Foundation/stable-retro, which offers multiplayer support, into LightZero, and then implementing self-play training.
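To gauge the wiring involved, here is a rough sketch of the self-play rollout loop one might build on top of a stable-retro environment created with `retro.make(game=..., players=2)`. This is a hedged illustration, not stable-retro or LightZero API: the function name `run_selfplay_episode`, the `policy(obs, player)` signature, and the per-player reward pair are all assumptions, and the gymnasium-style 5-tuple `step` return is assumed.

```python
def run_selfplay_episode(env, policy):
    """Run one self-play episode in a two-player environment.

    Assumptions (hypothetical names, not LightZero API):
    - env follows the gymnasium-style API: reset() -> (obs, info),
      step(action) -> (obs, reward, terminated, truncated, info);
    - env was created with two players, so step() takes the concatenation
      of both players' button arrays and may return a per-player reward;
    - policy(obs, player) returns one player's button array. Both calls
      use the same network, which is what makes this self-play.
    """
    obs, info = env.reset()
    returns = [0.0, 0.0]
    done = False
    while not done:
        # Joint action: player 0's buttons followed by player 1's buttons.
        joint = list(policy(obs, 0)) + list(policy(obs, 1))
        obs, reward, terminated, truncated, info = env.step(joint)
        done = terminated or truncated
        # Per-player reward if provided; zero-sum fallback for a scalar.
        r1, r2 = reward if hasattr(reward, '__len__') else (reward, -reward)
        returns[0] += float(r1)
        returns[1] += float(r2)
    return returns
```

The episode's per-player returns could then feed a shared replay buffer, with player 1's trajectory mirrored (rewards negated) so one network learns from both seats.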

If my implementation proves successful and aligns with your project's interests, I'd be happy to submit a pull request. However, I'm still very interested in any insights or best practices you might have regarding the optimal implementation approach. Your guidance would be invaluable in this process.

puyuan1996 commented 1 month ago

Hello, as you mentioned, we currently use self-play training primarily in board games and plan to extend UniZero to these games in the near future. Self-play training currently requires the environment to be a two-player game, which rules out most Atari games, since they are single-player. If you wish to adapt an environment that supports multiplayer games into the LightZero library and conduct self-play training, you can follow these steps:

  1. Create a game environment with an API similar to that of the board game TicTacToe (link). The main methods to implement are reset() and step(), and you may also need to implement three different battle_modes. Additionally, write corresponding test files for your environment.

  2. Write a configuration file similar to TicTacToe's configuration file (link), called your_env_muzero_sp_mode_config.py, and test its performance.

  3. Adapt UniZero to environments that support different battle_modes by referring to the implementation of MuZero and test its performance.
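Step 1 above can be sketched as a minimal environment skeleton. This is a hedged illustration, not LightZero's actual base class: the three `battle_mode` names follow the TicTacToe example, the `Timestep` namedtuple is a stand-in for DI-engine's `BaseEnvTimestep`, and the game rules are dummies (players alternately claim cells of a 9-cell board) just to keep the sketch runnable.

```python
from collections import namedtuple

import numpy as np

# Stand-in for ding.envs.BaseEnvTimestep (assumption: real envs return that type).
Timestep = namedtuple('Timestep', ['obs', 'reward', 'done', 'info'])


class TwoPlayerEnvSkeleton:
    """Minimal two-player env following the TicTacToe-style API.

    Replace the placeholder board dynamics with your environment's rules.
    """

    def __init__(self, cfg=None):
        cfg = dict(cfg or {})
        # The three battle modes used by the board-game envs.
        self.battle_mode = cfg.get('battle_mode', 'self_play_mode')
        assert self.battle_mode in ('self_play_mode', 'play_with_bot_mode', 'eval_mode')

    def reset(self):
        self.board = np.zeros(9, dtype=np.int8)
        self.current_player = 1
        return self._obs()

    def _obs(self):
        # Observation dict with raw observation, legal-action mask,
        # and whose turn it is (to_play is only meaningful in self-play).
        return {
            'observation': self.board.copy(),
            'action_mask': (self.board == 0).astype(np.int8),
            'to_play': self.current_player if self.battle_mode == 'self_play_mode' else -1,
        }

    def _bot_action(self):
        # Trivial built-in opponent: first legal move.
        return int(np.flatnonzero(self.board == 0)[0])

    def _apply(self, action, player):
        assert self.board[action] == 0, 'illegal move'
        self.board[action] = player

    def _done(self):
        return not (self.board == 0).any()

    def step(self, action):
        self._apply(action, self.current_player)
        self.current_player = 3 - self.current_player
        # In play_with_bot_mode / eval_mode the env moves for player 2
        # itself, so the agent only ever observes player 1's turns.
        if self.battle_mode != 'self_play_mode' and not self._done() and self.current_player == 2:
            self._apply(self._bot_action(), 2)
            self.current_player = 1
        reward = np.float32(0)  # placeholder: real envs score win/loss here
        return Timestep(self._obs(), reward, self._done(), {})
```

For step 2, a `your_env_muzero_sp_mode_config.py` would then point the env config at this class and select the `battle_mode` per phase (e.g. self-play for data collection, eval mode for evaluation), mirroring the TicTacToe configuration file.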

Thank you for your interest. Feel free to reach out with any questions.