hsahovic / poke-env

A python interface for training Reinforcement Learning bots to battle on pokemon showdown
https://poke-env.readthedocs.io/
MIT License
297 stars 103 forks source link

Rework default player configuration to support parallel runs #208

Open Benjamin-Etheredge opened 3 years ago

Benjamin-Etheredge commented 3 years ago

The current default player configuration generator will cause errors when two runs are kicked off. This is likely due to collisions with the default configuration counter.

hsahovic commented 3 years ago

Hey @Benjamin-Etheredge,

That sounds like a very good catch. Do you have a code snippet I could use to reproduce it?

Benjamin-Etheredge commented 3 years ago

I have a fix for it. I'm trying to get it pushed to a new branch now.

Benjamin-Etheredge commented 3 years ago

I don't have a code snippet to reproduce it. I noticed it when trying to run the same code twice in two shells.

Benjamin-Etheredge commented 3 years ago

There may also be an issue with passing specific player configurations through extended PlayerEnvs. I'm still investigating that one as it could be an issue with my code. It was easier to just rewrite the default configuration generator rather than investigate why my custom player names weren't working. The rework now works for parallel runs though.

hsahovic commented 3 years ago

I don't think that uuids is the way to go - using the counter-based solution creates a clearer sequence of player usernames, which can be useful when debugging or running certain types of algorithms (eg. genetic algorithms with a clear number / generation correlation). If you can provide a snippet to reproduce this issue, I'll gladly issue a fix (probably using a Lock object).

Benjamin-Etheredge commented 3 years ago

I'm not sold on uuid either. It is ugly and hard to track. I had to start logging player names to follow what was going on. I like the more sequential nature, but it leads to collisions when two default configurations are used.

I reproduced it by making a python file with the example code (heads up, there is a typo in the indentions of that example code too)

# -*- coding: utf-8 -*-
import asyncio
import time

from poke_env.player.player import Player
from poke_env.player.random_player import RandomPlayer

class MaxDamagePlayer(Player):
    def choose_move(self, battle):
        # If the player can attack, it will
        if battle.available_moves:
            # Finds the best move among available ones
            best_move = max(battle.available_moves, key=lambda move: move.base_power)
            return self.create_order(best_move)

        # If no attack is available, a random switch will be made
        else:
            return self.choose_random_move(battle)

async def main():
    start = time.time()

    # We create two players.
    random_player = RandomPlayer(
        battle_format="gen8randombattle",
    )
    max_damage_player = MaxDamagePlayer(
        battle_format="gen8randombattle",
    )

    # Now, let's evaluate our player
    await max_damage_player.battle_against(random_player, n_battles=1000)

    print(
        "Max damage player won %d / 100 battles [this took %f seconds]"
        % (
            max_damage_player.n_won_battles, time.time() - start
        )
    )

if __name__ == "__main__":
    asyncio.get_event_loop().run_until_complete(main())
  1. Make a python file with that.
  2. Open two shells.
  3. Start the code in one shell.
  4. Start the code again in another shell. The second shell will error out with something like 'Someone is already using that name'.
Benjamin-Etheredge commented 3 years ago

Is there a way to store player names as a side effect? Or maybe to catch the "Someone is already using that name" in the default configuration and keep incrementing till the name is free?

Benjamin-Etheredge commented 3 years ago

An additional note for why this needs updating: I'm exploring using openai baselines (and other forks). One key element for algorithms is parallel environments. I'm able to get parallel environments working more easily thanks to patching default configuration generation to not collide on player names.