
YGO Agent

YGO Agent is a project to create a Yu-Gi-Oh! AI using deep learning (LLMs, RL). It consists of a game environment and a set of AI agents.


Subprojects

ygoenv

ygoenv is a high-performance game environment for Yu-Gi-Oh! It was initially inspired by yugioh-ai and yugioh-game, and is now implemented on top of envpool.

ygoai

ygoai is a set of AI agents for playing Yu-Gi-Oh! It aims to achieve superhuman performance like AlphaGo and AlphaZero, with or without human knowledge. Currently, we focus on using reinforcement learning to train the agents.

Building

The following building instructions have only been tested on Ubuntu (WSL2) and may not work on other platforms.

To build the project, you need to install the following prerequisites first:

After that, you can build with the following commands:

git clone https://github.com/sbl1996/ygo-agent.git
cd ygo-agent
git checkout stable  # switch to the stable branch
xmake f -y
make

After building, run the following command to test the environment. If you see episode logs, the environment is working. See the following sections for more usage!

cd scripts
python -u eval.py --env-id "YGOPro-v1" --deck ../assets/deck/  --num_episodes 32 --strategy random  --lang chinese --num_envs 16

Common Issues

Package version not found by xmake

Delete the repositories, cache, and packages directories under ~/.xmake, then run xmake f -y again.
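Concretely, the reset amounts to the following (the paths are xmake's default locations under the home directory):

```shell
# Remove xmake's cached repositories, build cache, and installed packages
rm -rf ~/.xmake/repositories ~/.xmake/cache ~/.xmake/packages
# then reconfigure: xmake f -y
```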

Install packages failed with xmake

Sometimes xmake fails to install the required libraries automatically (e.g., glog and gflags). In that case, install them manually (e.g., with apt install) and make sure they are on the search path ($LD_LIBRARY_PATH or similar), so that xmake can find them.
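For example, on Ubuntu the two libraries mentioned above can be installed via apt; the package names below are the usual Ubuntu ones and the library path is the standard x86_64 location (adjust both for your system):

```shell
# Install glog and gflags system-wide (standard Ubuntu package names)
sudo apt install -y libgoogle-glog-dev libgflags-dev
# Make sure the dynamic linker can find them at runtime
export LD_LIBRARY_PATH="/usr/lib/x86_64-linux-gnu:${LD_LIBRARY_PATH}"
```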

GLIBC and GLIBCXX version conflict

Usually this happens because the libstdc++ in $CONDA_PREFIX is older than the system one: xmake compiles libraries against the system libstdc++, while your programs run against the $CONDA_PREFIX copy. If so, delete the old libstdc++ from $CONDA_PREFIX (back it up first) and create a soft link to the system one.
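As a sketch, assuming an active conda environment and the usual Ubuntu x86_64 library path (adjust both for your setup):

```shell
# Swap conda's older libstdc++ for a symlink to the system copy (backup kept as .bak)
if [ -n "$CONDA_PREFIX" ]; then
  mv "$CONDA_PREFIX/lib/libstdc++.so.6" "$CONDA_PREFIX/lib/libstdc++.so.6.bak"
  ln -s /usr/lib/x86_64-linux-gnu/libstdc++.so.6 "$CONDA_PREFIX/lib/libstdc++.so.6"
fi
```

If anything breaks afterwards, restoring the .bak file undoes the change.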

Other issues

Open a new terminal and try again. If the issue persists, join the Discord channel for help.

Evaluation

Obtain a trained agent

We provide trained agents in the releases. Look for the Flax checkpoint files named {commit_hash}_{exp_id}_{step}.flax_model and download the latest one to your local machine. The following usage assumes you have it.

If you are not on the stable branch or encounter other running issues, try switching to the commit_hash commit (from the checkpoint file name) before using the agent. You may need to rebuild the project after switching:

xmake f -c
xmake b -r ygopro_ygoenv

Play against the agent

We can use eval.py to play against a trained agent through a MUD-like interface in the terminal. Add --xla_device cpu to run the agent on the CPU.

python -u eval.py --deck ../assets/deck --lang chinese --xla_device cpu --checkpoint checkpoints/350c29a_7565_6700M.flax_model --play

Enter quit to exit the game. Run python eval.py --help for more options; for example, --player 0 makes the agent play as the first player, and --deck1 TenyiSword forces the first player to use the TenyiSword deck.

Battle between two agents

We can use battle.py to let two agents play against each other and find out which one is better.

python -u battle.py --deck ../assets/deck --checkpoint1 checkpoints/350c29a_7565_6700M.flax_model --checkpoint2 checkpoints/350c29a_1166_6200M.flax_model --num-episodes 32 --num_envs 8 --seed 0

We can set --record to generate .yrp replay files in the replay directory. The .yrp files can be replayed in YGOPro-compatible clients (YGOPro, YGOPro2, KoishiPro, MDPro). Change --seed to generate different games.

python -u battle.py --deck ../assets/deck --xla_device cpu --checkpoint1 checkpoints/350c29a_7565_6700M.flax_model --checkpoint2 checkpoints/350c29a_1166_6200M.flax_model --num-episodes 16 --record --seed 0

Training

Training an agent requires substantial computational resources, typically 8x 4090 GPUs and a 128-core CPU for a few days, so we don't recommend training on a local machine. Reducing the number of decks used for training may reduce the computational resources required.

Single GPU Training

We can train the agent with a single GPU using the following command:

cd scripts
python -u cleanba.py --actor-device-ids 0 --learner-device-ids 0 \
--local-num_envs 16 --num-minibatches 8 --learning-rate 1e-4 \
--update-epochs 1 --vloss_clip 1.0 --sep_value --value gae \
--save_interval 100 --seed 0 --m1.film --m1.noam --m1.version 2 \
--local_eval_episodes 32 --eval_interval 50

Deck

deck can be a directory containing .ydk files or a single .ydk file (e.g., deck/ or deck/BlueEyes.ydk). Well-tested and supported decks are in the assets/deck directory.

Supported cards are listed in scripts/code_list.txt. New decks that contain only supported cards can be used, but errors may still occur due to the complexity of the game.
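For reference, a .ydk file is a plain-text list of card passcodes grouped into main, extra, and side sections; a minimal sketch (the passcodes below are illustrative):

```
#main
89631139
89631139
89631139
#extra
!side
```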

Embedding

To handle the diverse and complex card effects, we have converted the card information and effects into text and used large language models (LLMs) to generate embeddings from that text. The embeddings are stored in a file (e.g., embed.pkl).

We provide one in the releases, named embed{n}.pkl, where n is the number of cards in code_list.txt.

You can choose not to use the embeddings by omitting the --embedding_file option.

Seed

The seed option sets the random seed for reproducibility: training and evaluation will be exactly the same under the same seed.

Hyperparameters

More hyperparameters can be found in the cleanba.py script. Tuning them may improve the performance but requires more computational resources.

Distributed Training

TODO

Plan

Environment

Training

Inference

Documentation

Sponsors

This work is supported with Cloud TPUs from Google's TPU Research Cloud (TRC).

Related Projects