histmeisah / Large-Language-Models-play-StarCraftII

TextStarCraft II, a pure-language environment that lets LLMs play StarCraft II

Large Language Models Play StarCraft II: Benchmarks and A Chain of Summarization Approach

StarCraft II is a challenging benchmark for AI agents because it requires both micro-level operations and macro-level awareness. Previous works, such as AlphaStar and SCC, achieve impressive performance on StarCraft II but still exhibit deficiencies in long-term strategic planning and strategy interpretability. Emerging large language model (LLM) agents show immense potential for solving intricate tasks.

Motivated by this, we aim to validate the capabilities of LLMs on StarCraft II. First, we develop a textual StarCraft II environment, called TextStarCraft II. Second, we propose a Chain of Summarization method, which includes single-frame summarization for processing raw observations and multi-frame summarization for analyzing game information, providing command recommendations, and generating strategic decisions. Our experiments demonstrate that LLM agents are capable of defeating the built-in AI at the Harder (Lv5) difficulty level.

| Work | AlphaStar | SCC | HierNet-SC2 | AlphaStar Unplugged | ROA-Star | Ours |
|---|---|---|---|---|---|---|
| Method | SL+RL+self-play | SL+RL+self-play | data-mining + RL | offline RL | SL+RL+self-play | prompt + rule-based script |
| Compute resource | 12000 CPU cores, 384 TPUs | Linear | 4 GPUs, 48 CPU cores | not clear | 2x 64 V100 | 1 GPU, 1 CPU (home computer) |
| Required replays | 971,000 | 4,638 | 608 | 20,000,000 (20M) | 120,938 | 0 |
| Best result (strongest opponent defeated) | Serral (one of the best pro gamers in the world) | Time (IEM 2023 champion) | built-in AI Lv10 | AlphaStar BC agent | herO (GSL champion) | built-in AI Lv5 |
| Strategy interpretability | | | | | | |
| Expansibility (adapts to the latest game version and other races) | | | | | | |

Our paper:

Large Language Models Play StarCraft II: Benchmarks and A Chain of Summarization Approach https://arxiv.org/abs/2312.11865

Our website:

Large Language Models Play StarCraft II: Benchmarks and A Chain of Summarization Approach

Our demo video:

Large Language Models Play StarCraft II: Benchmarks and A Chain of Summarization Approach

Performance of LLMs in TextStarCraft II

Comparing models using either the full CoS or CoS without CoT.

| Model | Method | Win Rate | PBR | RUR | APU | TR |
|---|---|---|---|---|---|---|
| Using Full CoS | | | | | | |
| GPT3.5-Turbo-16k | Full CoS | 5/10 | 0.0781 | 7875 | 0.7608 | 0.4476 |
| GPT4-Turbo | Full CoS | 3/6 | 0.0337 | 8306 | 0.7194 | 0.3452 |
| Gemini-Pro | Full CoS | 2/10 | 0.0318 | 9284 | 0.6611 | 0.3571 |
| GLM4 | Full CoS | 2/10 | 0.0327 | 3131 | 0.6644 | 0.2904 |
| Llama2 70B | Full CoS | / | / | / | / | / |
| Claude2.1 | Full CoS | 2/9 | 0.0219 | 10867 | 0.6599 | 0.4312 |
| Using CoS without CoT | | | | | | |
| Finetune-ChatGlm3 6b | CoS w/o CoT | 2/10 | 0.0528 | 30356 | 0.6547 | 0.1714 |
| Finetune-Qwen 1.8b | CoS w/o CoT | 6/10 | 0.0384 | 12826 | 0.7506 | 0.2095 |
| Finetune-Qwen 7b | CoS w/o CoT | 6/12 | 0.0421 | 12276 | 0.7234 | 0.3214 |
| Finetune-Llama2 7b | CoS w/o CoT | 0/12 | 0.0469 | 12295 | 0.5752 | 0.0853 |
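
The Win Rate column is reported as wins/games, with a different number of games per model, so the cells are easier to compare as percentages. The snippet below is a small illustrative helper, not part of the repository: the function name `parse_win_rate` is hypothetical, and only a few rows from the table are copied in as sample input.

```python
from fractions import Fraction
from typing import Optional


def parse_win_rate(cell: str) -> Optional[Fraction]:
    """Convert a 'wins/games' table cell into an exact fraction.

    Returns None for the bare '/' placeholder used when no result is reported.
    """
    wins, _, games = cell.strip().partition("/")
    if not wins or not games:
        return None
    return Fraction(int(wins), int(games))


# Win Rate cells copied from the table above (sample rows only).
win_rates = {
    "GPT3.5-Turbo-16k": "5/10",
    "GPT4-Turbo": "3/6",
    "Llama2 70B": "/",
    "Finetune-Qwen 1.8b": "6/10",
}

for model, cell in win_rates.items():
    rate = parse_win_rate(cell)
    label = "n/a" if rate is None else f"{float(rate):.0%}"
    print(f"{model}: {label}")
```

With so few games per model, these percentages are rough indicators rather than precise estimates.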

Win Rate Comparison of LLM Agents Against TextStarCraft II's Built-in AI

| Prompt | LV1 | LV2 | LV3 | LV4 | LV5 | LV6 |
|---|---|---|---|---|---|---|
| Prompt1 | 7/8 | 6/9 | 2/8 | 1/8 | 0/8 | 0/8 |
| Prompt2 | 8/8 | 9/9 | 8/8 | 21/25 | 7/14 | 0/12 |

Install StarCraft II and setup maps

Install StarCraft II

StarCraft II is a classic game developed by Blizzard (BLZ), with professional leagues such as IEM and WTL. You can download Battle.net from https://us.shop.battle.net/en-us or from https://www.blizzard.com/zh-tw/

If you are in China: because of Bobby Kotick, CN players no longer have their own servers. You will need to download StarCraft II by following this video: video, or by searching the internet for instructions.

Download maps

First, use StarCraft II Editor.exe to download the newest ladder maps.

When the editor opens, log in to your Blizzard account and search for the map you want. Then put the maps in the Maps folder of your StarCraft II installation, StarCraft II\Maps (if the 'Maps' folder does not exist, please create it).

Or you can download the maps here:
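
If you prefer to script the copy step, the following is a minimal sketch of placing downloaded maps into the Maps folder and creating that folder when it is missing. It is not part of the repository: the function name `install_maps` is hypothetical, and it assumes your downloaded map files use the `.SC2Map` extension. The demo at the bottom runs against a temporary directory so it does not touch a real installation.

```python
import shutil
import tempfile
from pathlib import Path
from typing import List


def install_maps(download_dir: Path, sc2_root: Path) -> List[Path]:
    """Copy every .SC2Map file from download_dir into <sc2_root>/Maps."""
    maps_dir = sc2_root / "Maps"
    maps_dir.mkdir(parents=True, exist_ok=True)  # create 'Maps' if it is missing
    copied = []
    for map_file in sorted(download_dir.glob("*.SC2Map")):
        target = maps_dir / map_file.name
        shutil.copy2(map_file, target)
        copied.append(target)
    return copied


if __name__ == "__main__":
    # Demo with throwaway folders; the map file here is an empty placeholder.
    with tempfile.TemporaryDirectory() as tmp:
        downloads = Path(tmp) / "downloads"
        downloads.mkdir()
        (downloads / "ExampleMap.SC2Map").write_bytes(b"")
        sc2_root = Path(tmp) / "StarCraft II"
        for path in install_maps(downloads, sc2_root):
            print("installed", path.name)
```

For a real setup, pass the folder that holds your downloaded maps and the root of your own StarCraft II installation.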

Setup environment

Create environment

Tips

Run demo

Game mode

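| Level | 1 | 2 | 3 | 4 | 5 | 6 | 7 | 8 | 9 | 10 |
|---|---|---|---|---|---|---|---|---|---|---|
| BLZ difficulty | VeryEasy | Easy | Medium | Hard | Harder | Very Hard | Elite | CheatVision | CheatMoney | CheatInsane |
| python-sc2 difficulty | VeryEasy | Easy | Medium | MediumHard | Hard | Harder | VeryHard | CheatVision | CheatMoney | CheatInsane |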
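
Because the two naming schemes diverge from level 4 to level 7, it is easy to pick the wrong opponent. The lookup below simply transcribes the table above into a dictionary; the names `DIFFICULTY_BY_LEVEL` and `python_sc2_name` are illustrative and do not come from the repository.

```python
# level -> (Blizzard in-game name, python-sc2 name), transcribed from the table above
DIFFICULTY_BY_LEVEL = {
    1: ("VeryEasy", "VeryEasy"),
    2: ("Easy", "Easy"),
    3: ("Medium", "Medium"),
    4: ("Hard", "MediumHard"),
    5: ("Harder", "Hard"),
    6: ("Very Hard", "Harder"),
    7: ("Elite", "VeryHard"),
    8: ("CheatVision", "CheatVision"),
    9: ("CheatMoney", "CheatMoney"),
    10: ("CheatInsane", "CheatInsane"),
}


def python_sc2_name(level: int) -> str:
    """Return the python-sc2 difficulty name for a built-in AI level (1-10)."""
    if level not in DIFFICULTY_BY_LEVEL:
        raise ValueError(f"level must be between 1 and 10, got {level}")
    return DIFFICULTY_BY_LEVEL[level][1]


if __name__ == "__main__":
    for level, (blz_name, sc2_name) in DIFFICULTY_BY_LEVEL.items():
        print(f"Lv{level}: Blizzard '{blz_name}' is python-sc2 '{sc2_name}'")
```

For example, the Harder (Lv5) opponent in the in-game menu corresponds to the python-sc2 name `Hard`. If python-sc2 is installed, the returned name can presumably be resolved on its `Difficulty` enum with `getattr`; that step is left out here so the snippet runs without the library.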

Note: Using an LLM to play StarCraft II can take approximately 7 hours for a single game.

Multi process

To save time, you can run multiple demos simultaneously using multiprocess_test.py. Configure the following parameter:

Other parameters are the same as in the Single Process setup.
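
To illustrate the general idea of running several games side by side, here is a minimal sketch that starts each command as its own OS process and waits for all of them. It does not reproduce how multiprocess_test.py works internally: the function name `run_in_parallel` is hypothetical, and the commands are placeholders that only print a message, where a real run would launch one game each.

```python
import subprocess
import sys
from typing import List, Sequence


def run_in_parallel(commands: Sequence[Sequence[str]]) -> List[int]:
    """Start every command as a separate process, then wait for all of them.

    Returns the exit codes in the same order as the commands.
    """
    processes = [subprocess.Popen(list(cmd)) for cmd in commands]
    return [proc.wait() for proc in processes]


if __name__ == "__main__":
    # Placeholder commands: each stands in for launching a single game.
    commands = [
        [sys.executable, "-c", f"print('game {i} finished')"]
        for i in range(3)
    ]
    exit_codes = run_in_parallel(commands)
    print("exit codes:", exit_codes)
```

Since a single game can take hours, the number of simultaneous processes is limited mainly by how many StarCraft II instances your machine can run and by your LLM API rate limits.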

Other settings

In our experiments we used some additional settings. For several reasons they are not released yet; they are coming soon.

Create your LLM Agent

If you want to build your own LLM agent with a different LLM, here is what you need to know.

Components of the LLM Agent

Env

The core of our TextStarCraft II environment is TextStarCraft2_2/env/bot. Here you can add more environment settings. If you want to implement Terran and Zerg bots, you can modify our code for this dictionary.

Supported Models

We have tested several LLMs in our experiments. Usage examples are in the sc2_rl_agent/starcraftenv_test/LLM file.

Evaluation Metrics Overview

Our framework in TextStarCraft II extends traditional StarCraft II analytics to evaluate LLM agents’ strategies with metrics tailored for AI gameplay performance: