histmeisah / Large-Language-Models-play-StarCraftII

TextStarCraft2, a pure-language environment that supports LLMs playing StarCraft II

agent keeps losing the games #11

Open detectiveron opened 2 weeks ago

detectiveron commented 2 weeks ago

I was trying to reproduce the experiment and test various LLMs. I tried both ChatGPT-4 and ChatGPT-3.5 with the game difficulty set to 2. I ran the game 3 to 4 times and the LLM lost every time. Specifically, in the early stage of the game, the LLM seemed to focus too much on economy and completely ignored the potential threat from the enemy. As a result, the agent kept building Assimilators and Cybernetics Cores and failed to build defences or train a large enough army. So when the enemy attacked, defeat was inevitable.

[Image: WeChat screenshot 20240612152412] This screenshot shows the state of development in the early to mid stage of the game. There was barely any defensive force.

histmeisah commented 2 weeks ago

It is really strange that the agent loses at difficulty level 2.

Due to the inherent variability in Large Language Models (LLMs), such as differences across GPT versions and OpenAI's server load, we occasionally encounter inconsistencies in performance. We've also observed that GPT-4 doesn't always outperform GPT-3.5.

These issues may stem from the LLM itself. Therefore, it's crucial to verify the version of the prompt you are using, as we have multiple versions available. Additionally, consider fine-tuning some open-source models with our provided data, as we have achieved excellent performance with fine-tuned models.

Given the updates to GPT models, you might need to refine the prompts or adjust certain parameters, such as "K", to maintain optimal performance.

histmeisah commented 2 weeks ago

You could also test other LLMs such as Gemini, which is free.

detectiveron commented 2 weeks ago

I have tried StarCraftIIPrompt_realtime (the default) and StarCraftIIPrompt_V4 (the one used in your paper). I assumed V4 was optimal; am I wrong? I will try the other prompt versions as you suggested.

histmeisah commented 2 weeks ago

Thank you, sir. I realize the issue was caused by my own mistake: I forgot to restore the settings.

parser.add_argument('--real_time', type=bool, default=True, help='True or False')

Here, the default value should be set to False, because GPT cannot react as quickly as the real-time environment demands.
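
A note for anyone overriding this flag from the command line: with `type=bool`, argparse calls `bool()` on the raw string, so `--real_time False` still evaluates to True (any non-empty string is truthy). Below is a minimal sketch of one way to avoid that. It is not the repository's code: `str2bool` and `build_parser` are hypothetical helper names introduced here for illustration, and only the `--real_time` flag name and the False default come from this thread.

```python
import argparse


def str2bool(value: str) -> bool:
    """Hypothetical helper: parse common textual booleans explicitly."""
    text = value.strip().lower()
    if text in {"true", "t", "yes", "y", "1"}:
        return True
    if text in {"false", "f", "no", "n", "0"}:
        return False
    raise argparse.ArgumentTypeError(f"expected a boolean value, got {value!r}")


def build_parser() -> argparse.ArgumentParser:
    """Hypothetical parser containing only the flag discussed in this thread."""
    parser = argparse.ArgumentParser(description="sketch of a real_time flag")
    parser.add_argument(
        "--real_time",
        type=str2bool,   # assumption: swapped in for type=bool
        default=False,   # default suggested above: GPT cannot keep up in real time
        help="True or False",
    )
    return parser


parser = build_parser()
print(parser.parse_args([]).real_time)                        # False
print(parser.parse_args(["--real_time", "False"]).real_time)  # False
print(bool("False"))  # True, which is why type=bool misreads the string "False"
```

With this in place, `--real_time True` and `--real_time False` both behave as written. Leaving the flag off keeps the slower setting, where the game does not run in real time.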

detectiveron commented 2 weeks ago

Ok, now the game speed is really slow. I guess it would take 7 hours to finish one game?

histmeisah commented 2 weeks ago

> Ok, now the game speed is really slow. I guess it would take 7 hours to finish one game?

Yes, as we wrote in the paper. This is a limitation of our approach, and it is why RL is important for decision-making tasks like StarCraft II.

detectiveron commented 2 weeks ago

Thank you for the help. I'll see when the game finishes.