hsahovic / poke-env

A Python interface for training Reinforcement Learning bots to battle on Pokémon Showdown
https://poke-env.readthedocs.io/
MIT License
291 stars, 98 forks

Error encountered during player.ladder() #306

Closed: akashsara closed this issue 2 years ago

akashsara commented 2 years ago

Hi, I was testing a model I trained on Pokemon Showdown (code snippet below) when I ran into this issue. I'm able to challenge the bot to a battle and play against it perfectly well, but when I call `player.ladder(100)` it errors out after completing a single battle.

2022-07-25 18:33:47,574 - UABGLSimpleDQN - ERROR - Unhandled exception raised while handling message:
>battle-gen8randombattle-1625188644
|-message|Nukkumatti lost due to inactivity.
|
|win|UABGLSimpleDQN
Traceback (most recent call last):
  File "E:\Dev\meta-discovery\torch_env\lib\site-packages\poke_env\player\player_network_interface.py", line 131, in _handle_message
    await self._handle_battle_message(split_messages)
  File "E:\Dev\meta-discovery\torch_env\lib\site-packages\poke_env\player\player.py", line 235, in _handle_battle_message
    self._battle_finished_callback(battle)
  File "E:\Dev\meta-discovery\torch_env\lib\site-packages\poke_env\player\env_player.py", line 106, in _battle_finished_callback
    self._observations[battle].put(self.embed_battle(battle))
KeyError: <poke_env.environment.battle.Gen8Battle object at 0x000001E1988D2EA0>
Task exception was never retrieved
future: <Task finished name='Task-39' coro=<PlayerNetwork._handle_message() done, defined at E:\Dev\meta-discovery\torch_env\lib\site-packages\poke_env\player\player_network_interface.py:117> exception=KeyError(<poke_env.environment.battle.Gen8Battle object at 0x000001E1988D2EA0>)>
Traceback (most recent call last):
  File "E:\Dev\meta-discovery\torch_env\lib\site-packages\poke_env\player\player_network_interface.py", line 177, in _handle_message
    raise exception
  File "E:\Dev\meta-discovery\torch_env\lib\site-packages\poke_env\player\player_network_interface.py", line 131, in _handle_message
    await self._handle_battle_message(split_messages)
  File "E:\Dev\meta-discovery\torch_env\lib\site-packages\poke_env\player\player.py", line 235, in _handle_battle_message
    self._battle_finished_callback(battle)
  File "E:\Dev\meta-discovery\torch_env\lib\site-packages\poke_env\player\env_player.py", line 106, in _battle_finished_callback
    self._observations[battle].put(self.embed_battle(battle))
KeyError: <poke_env.environment.battle.Gen8Battle object at 0x000001E1988D2EA0>

Model code:

import numpy as np
import torch


class SimpleRLPlayerTesting(SimpleRLPlayer):
    def __init__(self, model, *args, **kwargs):
        super().__init__(*args, **kwargs)
        self.model = model  # trained network, used only for inference here

    def choose_move(self, battle):
        # Encode the battle with the same features used during training
        state = self.embed_battle(battle)
        with torch.no_grad():
            predictions = self.model(state)
        # Mask out unavailable actions before taking the argmax
        action_mask = self.action_masks()
        action = np.argmax(predictions + action_mask)
        return self._action_to_move(action, battle)

Script:

import asyncio

from poke_env.player_configuration import PlayerConfiguration
from poke_env.server_configuration import ShowdownServerConfiguration

import simple_agent


async def main():
    ...
    player = simple_agent.SimpleRLPlayerTesting(
        model=model,
        player_configuration=PlayerConfiguration(USERNAME, PASSWORD),
        server_configuration=ShowdownServerConfiguration,
        start_timer_on_battle_start=True,
        **player_kwargs
    )
    print("Connecting to Pokemon Showdown...")
    await player.ladder(NUM_GAMES)
    # Once laddering is done, print the player's and the opponent's rating for each battle
    for battle in player.battles.values():
        print(battle.rating, battle.opponent_rating)

if __name__ == "__main__":
    asyncio.get_event_loop().run_until_complete(main())
hsahovic commented 2 years ago

Hey @akashsara,

Thanks for opening this issue. If you want to test a trained model on the ladder, I would recommend inheriting from the base player class rather than from the gym player.

akashsara commented 2 years ago

Do you mean poke_env.player.player.Player?

Edit: Looking at the code for the player above, this would mean that I need to inherit from the base player class and reimplement embed_battle(), choose_move() and _action_to_move(). This isn't a big issue, and implementing it is trivial, but it feels a bit weird to have to do so in another class just to be able to test the model.

MatteoH2O1999 commented 2 years ago

Hey @akashsara, I think you are using the old version of the gym player. Try installing from the GitHub source, as the new version should be able to do what you need pretty easily. Let me know if you have any questions!

akashsara commented 2 years ago

Thank you, I'll try that!

akashsara commented 2 years ago

So it seems like there are a number of significant changes in the new version. I'm following examples/rl_with_new_open_ai_gym_wrapper.py right now and setting things up, but I was wondering when these changes will be included in a public release? @hsahovic

I'm asking because, for various reasons, I'm running my code on another machine, and that machine has some issues haha. So I'd prefer to just install the latest release and run my code rather than set up some hacky workarounds in the meantime.

akashsara commented 2 years ago

Hey @MatteoH2O1999, I've updated my code to work with the new version. For completeness, I trained a new agent as well. However, I'm having issues testing it on Showdown.

  1. I can't get the `player.start_laddering()` function to run correctly. This is the code snippet I'm using:

    async def main():
        # <model creation code>
        player = simple_agent.SimpleRLPlayerTesting(  # Inherits from env_player
            model=model,
            player_configuration=PlayerConfiguration(USERNAME, PASSWORD),
            server_configuration=ShowdownServerConfiguration,
            start_timer_on_battle_start=True,
            start_challenging=False,
        )
        player.start_laddering(NUM_GAMES)
        for battle in player.battles.values():
            print(battle.rating, battle.opponent_rating)

    if __name__ == "__main__":
        asyncio.get_event_loop().run_until_complete(main())

If I set `start_challenging` to True I get this output:

2022-07-29 16:52:32,957 - UABGLSimpleDQN - WARNING - Popup message received: |popup|The user 'randomplayer1' was not found.
Traceback (most recent call last):
  File "E:\Dev\meta-discovery\play_on_showdown.py", line 121, in <module>
    asyncio.get_event_loop().run_until_complete(main())
  File "C:\Program Files\Python39\lib\asyncio\base_events.py", line 642, in run_until_complete
    return future.result()
  File "E:\Dev\meta-discovery\play_on_showdown.py", line 109, in main
    player.start_laddering(NUM_GAMES)
  File "E:\Dev\meta-discovery\torch_env\lib\site-packages\poke_env-0.4.21-py3.9.egg\poke_env\player\openai_api.py", line 505, in start_laddering
    raise RuntimeError("Agent is already challenging")
RuntimeError: Agent is already challenging


If I set it to False, the script just finishes almost instantly. I'm monitoring the account on Showdown, and it doesn't start any battles.

  2. I can't seem to find a way to send/accept challenges. The `Player` class has `accept_challenges` and `send_challenges` methods, but there doesn't seem to be an equivalent in `env_player`.

MatteoH2O1999 commented 2 years ago

Hi @akashsara, to answer your questions, I need to explain how the wrapper works: it is designed to be used as an environment, not as a normal player. It runs a custom player in a background thread so that the OpenAI Gym API is exposed on the main thread.
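To illustrate that design, here is a toy, self-contained sketch. `ToyThreadedEnv` is a hypothetical class, not poke-env's wrapper: a fake battle runs in a background thread and blocks on a queue until an action arrives, while `reset()` and `step()` are called from the main thread and return a classic Gym-style `(observation, reward, done, info)` tuple (an assumption made for the toy).

```python
import queue
import threading


class ToyThreadedEnv:
    """Toy illustration only (not poke-env code): a fake battle runs in a
    background thread, while reset()/step() are called from the main thread."""

    def __init__(self, max_turns=3):
        self.max_turns = max_turns
        self._observations = queue.Queue()  # background thread -> main thread
        self._actions = queue.Queue()       # main thread -> background thread

    def _battle_loop(self):
        # Background thread: publish the current state, then block until an action arrives.
        for turn in range(self.max_turns):
            self._observations.put((turn, False))  # (observation, done)
            self._actions.get()                     # wait for step() to supply a move
        self._observations.put((self.max_turns, True))  # battle over

    def reset(self):
        # Start a fresh "battle" in the background and return its first observation.
        self._observations = queue.Queue()
        self._actions = queue.Queue()
        threading.Thread(target=self._battle_loop, daemon=True).start()
        observation, _ = self._observations.get()
        return observation

    def step(self, action):
        # Hand the action to the background thread and wait for the next state.
        self._actions.put(action)
        observation, done = self._observations.get()
        reward = 1.0 if done else 0.0  # toy reward: 1 when the battle ends
        return observation, reward, done, {}


# Usage: the main thread drives the loop, as an RL agent would.
env = ToyThreadedEnv(max_turns=3)
observation = env.reset()
done = False
while not done:
    observation, reward, done, info = env.step(action=0)
print(observation, reward)  # 3 1.0
```

In the toy, the background battle only advances when the main thread calls `step()`, so nothing progresses unless your code actively drives the reset/step loop.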

Regarding your first question, the problem is that you are not actually using the environment. To use it, you should have something like:

player = ...  # derive from EnvPlayer
...
player.start_laddering(NUM_GAMES)
for _ in range(NUM_GAMES):
    observation = player.reset()
    done = False
    while not done:
        action = model.action(observation)  # your model's prediction step
        observation, reward, done, info = player.step(action)
...

If you wish to implement accept_challenges and send_challenges, I would advise subclassing EnvPlayer and defining methods that run the model-prediction loop and the challenge loop in parallel (use player.agent to access the custom background player).

akashsara commented 2 years ago

Ah I see. Apologies for the misunderstanding. Thank you for clarifying. It's working now!

I should note that something a little weird is still going on. I get this error every time I run it, even though battles do seem to be starting and my agent seems to be playing:

2022-07-31 18:39:13,537 - RandomPlayer 1 - ERROR - [WinError 1225] The remote computer refused the network connection

Note: Neither my agent nor my Showdown account is called RandomPlayer 1, so I'm not sure where this is coming from.

On the challenges part: are there any plans to implement it (or something similar) on EnvPlayer for the time being, or is it not on the roadmap? I'll take a crack at it either way, but just wanted to know.

MatteoH2O1999 commented 2 years ago

As a temporary workaround, use opponent='placeholder string' in your env player. This is because, by default, a random player gets created if opponent has a "falsy" value (like ''); that background random player is where the "RandomPlayer 1" error comes from. I think we'll change that in the next patch.
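To make the "falsy" point concrete, here is a small sketch of the behaviour described above. It is not poke-env's actual code: `resolve_opponent` and `make_random_player` are hypothetical names. Any falsy `opponent` triggers creation of a default player, while a non-empty placeholder string does not.

```python
def make_random_player():
    # Hypothetical stand-in for the library creating a default RandomPlayer in the background.
    return "RandomPlayer 1"


def resolve_opponent(opponent, make_default=make_random_player):
    """Hypothetical helper: return `opponent`, or a default one if it is falsy."""
    if not opponent:  # '', None, 0, [] and {} are all falsy in Python
        return make_default()
    return opponent


print(resolve_opponent(""))                    # RandomPlayer 1
print(resolve_opponent(None))                  # RandomPlayer 1
print(resolve_opponent("placeholder string"))  # placeholder string
```

Any non-empty string is truthy, which is why a placeholder value is enough to skip the default player.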

MatteoH2O1999 commented 2 years ago

Regarding the challenges part, the famous "it's a feature, not a bug" applies: accept_challenges and send_challenges should have the same effect as on a normal player, i.e. once you await them, the battles are complete. That is impossible to implement by default in EnvPlayer, because finishing a battle requires your model's prediction loop to run, and supporting that would mean linking poke-env to a specific ML library.

hsahovic commented 2 years ago

@akashsara yeah, I need to push a new release; I will take care of it this week. What I meant by the base player class is something like this:


from poke_env.player.baselines import RandomPlayer

import numpy as np
import torch

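class TrainedModelPlayer(RandomPlayer):
    def choose_move(self, battle):
        # embed_battle, model, action_masks and action_to_move are standalone,
        # module-level functions/objects; action_masks(battle) and
        # action_to_move(action, battle) are hypothetical standalone versions
        # of SimpleRLPlayer's methods, which can't be called on the class itself
        state = embed_battle(battle)
        with torch.no_grad():
            predictions = model(state)
        action_mask = action_masks(battle)
        action = np.argmax(predictions + action_mask)
        return action_to_move(action, battle)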

where embed_battle and model are standalone functions/objects. This should then work with ladder() and the other battling methods.
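Both model snippets in this thread add an action mask to the model's scores before taking the argmax. The thread doesn't show what action_masks() returns, so the following is a minimal sketch assuming a boolean "is this action legal" array that is turned into an additive 0 / -inf mask; `masked_argmax` is a hypothetical helper name.

```python
import numpy as np


def masked_argmax(scores, legal):
    """Return the index of the highest-scoring *legal* action.

    Assumes `legal` is a boolean array-like; illegal actions get -inf added,
    so they can never win the argmax (this 0/-inf convention is an assumption)."""
    action_mask = np.where(legal, 0.0, -np.inf)
    return int(np.argmax(np.asarray(scores, dtype=float) + action_mask))


predictions = np.array([0.2, 0.9, 0.5, 1.3])  # e.g. Q-values from the model
legal = np.array([True, False, True, False])  # actions 1 and 3 unavailable
print(masked_argmax(predictions, legal))      # 2
```

The mask's convention matters: with a 0/1 mask instead of 0/-inf, the addition only nudges the scores, and an illegal action with a high enough score could still be picked.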

akashsara commented 2 years ago

@MatteoH2O1999 Thanks! Does this mean there are no plans for a method to battle against the bot in a custom battle at all (apart from the method Haris mentioned above)?

@hsahovic Got it, thank you. That should work for me.

MatteoH2O1999 commented 2 years ago

@akashsara, for that there should be the play_against method, but it still requires you to manage the prediction loop, as it only starts the battle in the background. It should also work if you use:

player.set_opponent('username to challenge')
player.start_challenging(n_challenges)
for _ in range(n_challenges):
    observation = player.reset()
    done = False
    while not done:
        action = model.action(observation)  # your model's prediction step
        observation, reward, done, info = player.step(action)
akashsara commented 2 years ago

Ooh that works out for me. Thanks a lot @MatteoH2O1999

akashsara commented 2 years ago

So, as an update for anyone running into this thread later on: for speed reasons, I would recommend using the general Player API, as Haris mentioned. The approach Matteo suggested works pretty well for self-play, but in terms of pure speed, the general API is much, much, much faster.

From my own rough benchmarking, the general API works out to be 3-4 times as fast.

@hsahovic, maybe we could include this somewhere in the documentation? I expected it to be faster, but not this fast.