hsahovic / poke-env

A python interface for training Reinforcement Learning bots to battle on pokemon showdown
https://poke-env.readthedocs.io/
MIT License
297 stars 103 forks source link

Obtaining Valid doubles moves #524

Closed caymansimpson closed 6 months ago

caymansimpson commented 7 months ago

There is a function in singles to obtain all valid moves for random player, but a similar one doesn't exist for doubles. I think it would be great to:

  1. Add this functionality into Player
  2. Add a function in Battle that validates whether an order will return an error (eg battle.is_valid_order(order))
  3. Add a function in BattleOrder that validates whether an order is valid given a battle (eg order.is_valid(battle))
hsahovic commented 7 months ago

I think this makes sense. For 2., what about automatically validating all double orders?

caymansimpson commented 7 months ago

Sounds like a great idea! How/where would you recommend doing that? Downstream of choose move?

hsahovic commented 7 months ago

Yeah potentially - although I'm not sure what the value is, now that I think about it. If you choose_move implementation is wrong, raising an exception will probably not help. What do you think?

caymansimpson commented 7 months ago

Hm, thinking about it, you're right.

My initial motivation was to use it to help my AI -- eg my networks and/or algorithms are going to choose a distribution of probabilities over actions, and I'd like to 0 out each invalid action, and SoftMax the rest to get the probability to take each valid action.

Given this, I think there's two ways we could do this:

  1. Add this functionality into battle (battle.is_valid_order(order: BattleOrder) -> bool)
  2. Add this functionality into BattleOrder (order.is_valid(battle: Battle) -> bool)
  3. Implement this as a separate utility in my bot outside of poke-env

Do you have a preference? I think #2 makes sense, since you already import Battle and DoubleBattle into BattleOrder?

caymansimpson commented 7 months ago

OK -- Ive finished writing the validation method in DoubleBattleOrder and BattleOrder, and I'm wondering how you would prefer me to write unittests?

I ask because across both, there's ~40 checks to do to ensure a move isn't valid. There's a couple of problems I'm running into:

  1. Recall -- I think the problem with this method is that it will be hard to understand whether I am over-enforcing on validity without an absurd amount of unittests asserting is_valid() == true.
  2. Unittests -- do you want me to add all ~51 different scenarios to test against to ensure that each case is valid?
  3. Should poke-env handle Zoroark well? I don't have special cases for it yet (and RandomPlayer currently breaks with Zoroark), and wondering whether I should add in special cases for is_valid() given there are already Zoroark bugs

Informally, I'm going to just run a bunch of edge case teams (and random doubles) against each other thousands of times to ensure that nothing gets through across formats, but wondering how much I should add to the unit tests

hsahovic commented 6 months ago

I think you can ignore Zoroark.

For the 40 checks, I think they will be required for this to make sense.

hsahovic commented 6 months ago

Have you looked at DoubleBattleOrder.join_orders? I suspect it might do a lot of what you are looking for

caymansimpson commented 6 months ago

Thanks for the guidance! As I was writing and implementing my unit tests, I realized that this functionality is probably not right for poke-env. For the purpose of my AI, I want to check whether a move would fail (e.g. self-target w/ heal pulse when there's no mon there), not whether a move is valid. Because of this, I'll close this Issue? Unless you'd like to still have this functionality in poke-env (code below)



def is_valid(self, battle: DoubleBattle) -> bool:
        """
        Checks if the order is valid.

        :param battle: The battle to check the order against
        :type battle: DoubleBattle
        :return: Whether the order is valid.
        :rtype: bool
        """

        for i, order in enumerate([self.first_order, self.second_order]):

            # The order is only allowed to not exist if we're forced to switch out a pokemon or if one doesn't exist
            if order is None:
                if not (battle.force_switch[1 - i] or battle.active_pokemon[i]): return False
            elif order.order is None:
                if not (battle.force_switch[1 - i] or battle.active_pokemon[i]): return False

            # Check whether a switch is valid
            elif isinstance(order.order, Pokemon):

                if battle.trapped[i] and not battle.force_switch[i]: return False

                # Can't be switching out and tera/dynamax/mega/z-move at the same time
                if(order.mega or order.z_move or order.dynamax or order.terastallize): return False

                # Can only switch to the right pokemon
                if order.order not in battle.available_switches[i]: return False

            # Check whether a move is valid
            elif isinstance(order.order, Move):

                # Check to ensure we can tera/dynamax/mega-evolve/z-move if we choose to do so
                if order.dynamax and not battle.can_dynamax[i]: return False
                elif order.mega and not battle.can_mega_evolve[i]: return False
                elif order.terastallize and not battle.can_tera[i]: return False
                elif order.z_move and not battle.can_z_move[i]: return False

                # For moves that can target any pokemon on the field:
                if order.order.target in (Target.ANY, Target.NORMAL):

                    # These moves need a target
                    if order.move_target == DoubleBattle.EMPTY_TARGET_POSITION: return False

                    # Trying to target an opponent pokemon that doesn't exist
                    if order.move_target in (DoubleBattle.OPPONENT_1_POSITION, DoubleBattle.OPPONENT_2_POSITION) and not battle.opponent_active_pokemon[order.move_target - 1]: return False

                    # Trying to target a Pokemon on your own side of the field
                    if order.move_target in (DoubleBattle.POKEMON_1_POSITION, DoubleBattle.POKEMON_2_POSITION):

                        # Can't target yourself
                        if order.move_target == battle.to_showdown_target(order.order, battle.active_pokemon[i]): return False

                        # Can't target your ally pokemon if they don't exist
                        elif battle.active_pokemon[1 - i] is None: return False

                # Make sure you're're targeting something on your side of the field
                elif order.order.target == Target.ADJACENT_ALLY_OR_SELF:
                    if order.move_target not in (DoubleBattle.POKEMON_1_POSITION, DoubleBattle.POKEMON_2_POSITION): return False

                # Make sure you're targeting something on the opponent's side of the field
                elif order.order.target == Target.ADJACENT_FOE:
                    if order.move_target in (DoubleBattle.EMPTY_TARGET_POSITION, DoubleBattle.POKEMON_1_POSITION, DoubleBattle.POKEMON_2_POSITION): return False

        # Check cases where orders could invalidate each other
        if self.first_order and self.second_order:

            # Check to make sure we're not dynamaxing two pokemon
            if self.first_order.dynamax and self.second_order.dynamax: return False

            # Check to make sure we're not mega-ing two pokemon
            if self.first_order.mega and self.second_order.mega: return False

            # Check to make sure we're not tera-ing two pokemon
            if self.first_order.terastallize and self.second_order.terastallize: return False

            # Check to make sure we're not z-move-ing two pokemon
            if self.first_order.z_move and self.second_order.z_move: return False

            # Check to see we're not switching to the same mon
            if self.first_order.order == self.second_order.order and isinstance(self.first_order.order, Pokemon): return False

        return True```
hsahovic commented 6 months ago

That makes sense. I that case, let's close it :)