[rllib] Training via self-play with AlphaZero

ray-project / ray

Ray is a unified framework for scaling AI and Python applications. Ray consists of a core distributed runtime and a set of AI Libraries for accelerating ML workloads.

https://ray.io

Apache License 2.0


[rllib] Training via self-play with AlphaZero #12646

Closed DoxakisCh closed 1 year ago

DoxakisCh commented 3 years ago

Hello,

I want to use RLlib's AlphaZero agent on a poker environment and have it learn to play via self-play. As I understand it, the current agent is designed only for single-player games. Is there a way to extend it so that it can learn via self-play on two-player adversarial games such as chess and heads-up poker?

hybug commented 3 years ago

Issue #6669 implements self-play with PPO via the multi-agent API. In a poker game, however, the opponent agent's compute_action must be based on the observation produced after the RL agent's step, so the two players act in turn rather than simultaneously. For that reason I don't think multi-agent is the proper way to implement self-play for a poker game.
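To make the turn-order concern concrete, here is a minimal sketch of a toy turn-based game in plain Python. It does not import Ray, and every name in it (`TurnBasedDuel`, `self_play_episode`, the agent ids, the payoff rule) is hypothetical and chosen only for illustration. It assumes a dict-keyed interface in which `step()` returns an observation only for the player who moves next, so the second player's observation always reflects the first player's action. Whether RLlib's AlphaZero agent can consume an environment shaped like this is not established in this thread.

```python
class TurnBasedDuel:
    """Toy two-player, zero-sum, turn-based game (hypothetical example).

    Observations are keyed by agent id, and only the agent that must
    act next receives one. This is an assumed convention, not a
    confirmed RLlib AlphaZero interface.
    """

    AGENTS = ("player_0", "player_1")

    def __init__(self, max_turns=4):
        self.max_turns = max_turns
        self.turn = 0
        self.history = []

    def reset(self):
        self.turn = 0
        self.history = []
        # Only the first mover is asked to act.
        return {self.AGENTS[0]: tuple(self.history)}

    def step(self, action_dict):
        mover = self.AGENTS[self.turn % 2]
        # Exactly one agent may act per step: the one whose turn it is.
        assert set(action_dict) == {mover}, "only the current mover may act"
        self.history.append(action_dict[mover])
        self.turn += 1

        if self.turn >= self.max_turns:
            # Toy zero-sum payoff: sum of player_0's moves minus player_1's.
            score = sum(self.history[0::2]) - sum(self.history[1::2])
            rewards = {"player_0": float(score), "player_1": float(-score)}
            return {}, rewards, True

        # The next mover observes the full history, including the move
        # that was just played by the opponent.
        next_mover = self.AGENTS[self.turn % 2]
        return {next_mover: tuple(self.history)}, {}, False


def self_play_episode(env, policy):
    """Play one episode with the same policy controlling both seats."""
    obs = env.reset()
    done = False
    rewards = {}
    while not done:
        # Exactly one agent is present in obs at every step.
        (agent, agent_obs), = obs.items()
        obs, rewards, done = env.step({agent: policy(agent_obs)})
    return rewards


if __name__ == "__main__":
    # A trivial stand-in policy: play the number of moves seen so far.
    print(self_play_episode(TurnBasedDuel(max_turns=4), len))
```

Because a single `policy` callable is queried for whichever seat is to move, both players share the same decision rule, which is the basic shape of self-play. A real poker environment would also need hidden information and per-seat observations, which this sketch leaves out.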

If you make any progress on self-play for a poker game, feel free to get in touch with me.