Zachary-Fernandes opened this issue 1 year ago
Sure, OpenAI gym's interface is designed to be agnostic with respect to the algorithm/package you want to use. These environments provide ways to inspect the current state of the game and to take actions, but how you feed this information to your algorithm is entirely up to you.
For example, you can look online for PyTorch examples solving OpenAI gyms and see how they are implemented outside of keras-rl. One example could be solving OpenAI Pong with DQN. This is not a Monte Carlo tree search, but as long as you know the algorithm, you can adapt the gym to your needs.
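The point above can be sketched as a generic interaction loop that works with any decision-making algorithm. The environment class and reward values below are made-up stand-ins for illustration; a real gym-style environment (including poke-env's wrapper) exposes the same reset/step surface:

```python
import random

class ToyEnv:
    """Hypothetical stand-in for a gym-style environment, for illustration.

    A real environment exposes the same two methods: reset() returns an
    initial observation, and step(action) returns (obs, reward, done, info).
    """

    def __init__(self, horizon=5):
        self.horizon = horizon
        self.t = 0

    def reset(self):
        self.t = 0
        return self.t  # the observation is just the timestep here

    def step(self, action):
        self.t += 1
        reward = 1.0 if action == 1 else 0.0  # made-up reward scheme
        done = self.t >= self.horizon
        return self.t, reward, done, {}

def run_episode(env, policy):
    """The algorithm-agnostic part: any callable obs -> action works here."""
    obs = env.reset()
    total, done = 0.0, False
    while not done:
        obs, reward, done, _ = env.step(policy(obs))
        total += reward
    return total

# Plug in any policy: random, DQN, minimax, MCTS, ...
always_one = lambda obs: 1
random_policy = lambda obs: random.choice([0, 1])

print(run_episode(ToyEnv(), always_one))  # → 5.0
```

The loop never inspects what the policy is, which is exactly why you can swap keras-rl out for PyTorch, a tree search, or anything else.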
This is an example showing how the poke-env OpenAI gym interface works.
Thank you. I am considering looking at one implementation of minimax and some papers involving a version of MCTS adapted for simultaneous-move games to guide me.
This may diverge from the original topic, but I seem to be having an issue with the tutorial on specifying teams. While the code itself runs, I encounter errors that seem to come from how the team string is read. For example, this is the team from the tutorial:
team_2 = """
Togekiss @ Leftovers
Ability: Serene Grace
EVs: 248 HP / 8 SpA / 252 Spe
Timid Nature
IVs: 0 Atk
- Air Slash
- Nasty Plot
- Substitute
- Thunder Wave
Galvantula @ Focus Sash
Ability: Compound Eyes
EVs: 252 SpA / 4 SpD / 252 Spe
Timid Nature
IVs: 0 Atk
- Sticky Web
- Thunder Wave
- Thunder
- Energy Ball
Cloyster @ King's Rock
Ability: Skill Link
EVs: 252 Atk / 4 SpD / 252 Spe
Adamant Nature
- Icicle Spear
- Rock Blast
- Ice Shard
- Shell Smash
Sandaconda @ Focus Sash
Ability: Sand Spit
EVs: 252 Atk / 4 SpD / 252 Spe
Jolly Nature
- Stealth Rock
- Glare
- Earthquake
- Rock Tomb
Excadrill @ Focus Sash
Ability: Sand Rush
EVs: 252 Atk / 4 SpD / 252 Spe
Adamant Nature
- Iron Head
- Rock Slide
- Earthquake
- Rapid Spin
Cinccino @ King's Rock
Ability: Skill Link
EVs: 252 Atk / 4 Def / 252 Spe
Jolly Nature
- Bullet Seed
- Knock Off
- Rock Blast
- Tail Slap
"""
When I attempt to run the code, this occurs:
2023-04-03 02:15:45,407 - MaxDamagePlayer 1 - WARNING - Popup message received:
|popup|Your team was rejected for the following reasons:||||- The Pokemon "thunderwave"
does not exist.||- The Pokemon "energyball" does not exist.||- The Pokemon "shellsmash"
does not exist.||- The Pokemon "rocktomb" does not exist.||- The Pokemon "rapidspin"
does not exist.||- The Pokemon "" does not exist.||- You are limited to one of each
Pokémon by Species Clause.||- (You have more than one energyball)
Do you know what the issue might be? I fear it is counting the moves as Pokémon.
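For context, one plausible explanation (assuming a Showdown-style export format, where team entries are separated by blank lines): if the blank lines between Pokémon are lost, for instance during a copy-paste, the parser can misread trailing lines as new species names. A rough, hand-written sketch of that failure mode, not poke-env's actual parser:

```python
# Hypothetical sketch: Showdown-style team strings are assumed to be
# delimited by blank lines, one block per Pokemon.
good = """Togekiss @ Leftovers
Ability: Serene Grace
- Air Slash

Galvantula @ Focus Sash
Ability: Compound Eyes
- Sticky Web
"""

# Simulate losing the blank separators (e.g. a copy-paste artifact):
bad = good.replace("\n\n", "\n")

def count_sets(team: str) -> int:
    """Each blank-line-delimited block is parsed as one Pokemon."""
    return len([b for b in team.strip().split("\n\n") if b.strip()])

print(count_sets(good))  # → 2
print(count_sets(bad))   # → 1: both sets collapse into a single block
```

If your local copy of the team string lost its separators, a parser built on this convention would see the wrong number of entries and misattribute lines, which is consistent with moves showing up as "Pokemon" in the rejection message.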
Sorry for my late replies.
Actually, by running the whole code from here (which I guess is the one you are also trying to run), I get a couple of different issues. First of all, it complains about King's Rock, saying it's a banned item. I'm not familiar with the gen8 OU format, but I'm guessing it is a more recent ban, so this example was left outdated.
By substituting that item with anything else, the teams are loaded correctly. I then get stopped by another exception:
TypeError: PokemonType.damage_multiplier() missing 1 required keyword-only argument: 'type_chart'
For this one, I'm guessing it is a backward-compatibility issue with the code. The type chart has to be specified manually as GenData.from_gen(8).type_chart (to my understanding, this is the best way of retrieving it). After this fix, the code executes just fine with the output (King's Rock substituted with Leftovers):
Max damage player won 9 / 100 battles
Which is still pretty bad (expected is around 99/100). I'm not sure if something else is going wrong (I don't have time to check right now) or if simply changing the item is leading to such a tremendous shift (I don't see how this could be, with such simple agents).
About your issue, at this point I would say it is related to your specific setup as I don't experience it. What version of poke-env/python/showdown are you running it on? And of course, did you change the code in any way? In that case, please try running the example out of the box for more reproducible results.
I saw your dedicated issue (#367); let's continue tackling the problem there.
I have also been getting this error:
TypeError: PokemonType.damage_multiplier() missing 1 required keyword-only argument: 'type_chart'
For this one, I'm guessing it is a backward-compatibility issue with the code. The type chart has to be specified manually as GenData.from_gen(8).type_chart (to my understanding, this is the best way of retrieving it). After this fix, the code executes just fine with the output (King's Rock substituted with Leftovers):
Can you explain how you fixed this issue?
I do not have access to the code right now, but I can guide you through it. Search the example for calls to damage_multiplier; if I remember correctly, there should be two occurrences. Change each to damage_multiplier(type_chart=GenData.from_gen(8).type_chart).
Please note that GenData has to be imported from poke-env. I do not remember its full namespace, but you can look it up by inspecting the poke-env code.
Apologies if this is just an approximate description; if you have further issues, I can send my full code correction as soon as I have access to it.
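To make the shape of the fix concrete, here is a pure-Python sketch. The TYPE_CHART dict and the mini damage_multiplier below are illustrative stand-ins, not poke-env's actual implementation; in poke-env the chart would come from GenData.from_gen(8).type_chart:

```python
# Illustrative stand-in for a generation's type chart: a nested mapping
# attacking type -> defending type -> multiplier. The entries are real
# matchups, but the dict itself is hand-written for this example.
TYPE_CHART = {
    "ELECTRIC": {"WATER": 2.0, "GROUND": 0.0, "ELECTRIC": 0.5},
    "ICE": {"DRAGON": 2.0, "WATER": 0.5},
}

def damage_multiplier(attacking: str, defending: str, *, type_chart) -> float:
    """Mimics the keyword-only signature that triggered the TypeError:
    calling without type_chart= raises, just like the newer poke-env API."""
    return type_chart.get(attacking, {}).get(defending, 1.0)

# Passing the chart explicitly works:
print(damage_multiplier("ELECTRIC", "WATER", type_chart=TYPE_CHART))  # → 2.0

# Omitting the keyword argument reproduces the error class from the report:
try:
    damage_multiplier("ELECTRIC", "WATER")
except TypeError as e:
    print(type(e).__name__)  # → TypeError
```

The `*` in the signature is what makes type_chart keyword-only, which is why the old positional-style calls in the example break against the newer library.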
Hi, I also ran into this problem. I started playing with this project today, and following the tutorial, it seems that the check_env function does not work. I wonder whether some recent update introduced this bug.
I'm also trying to implement a torch version because I'm not familiar with keras and TF.
I also got this error from check_env:
AssertionError: The result returned by env.reset() was not a tuple of the form (obs, info), where obs is an observation and info is a dictionary containing additional information.
I found in openai_api.py that the env seems to only return the observation. All the bugs I mentioned can be reproduced in examples/rl_with_new_open_ai_gym_wrapper.py.
I see, I think this might be a compatibility issue with the gym version. The gym API changed a lot, and it is even different in gymnasium. In fact, by taking a look at the gym 0.26.0 release notes, we can identify when the breaking change happened, which was less than a year ago.
Can you add gym<0.26.0 as a requirement and see if this solves the issue?
Any other issue related to the OpenAI envs might be connected to the gym version at this point. Another approach would be looking at the tutorial's last update and aligning that date with the gym version at the time.
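The breaking change in question is reset()'s return value: before gym 0.26 it returned just the observation, while newer gym and gymnasium return (obs, info). Besides pinning gym<0.26.0, a small compatibility shim (a sketch, not part of poke-env; the stand-in env classes are hypothetical) can smooth over both conventions:

```python
def reset_compat(env):
    """Normalize env.reset() across gym API versions.

    Old gym (<0.26):    reset() -> obs
    New gym/gymnasium:  reset() -> (obs, info)

    Heuristic: a 2-tuple whose second element is a dict is treated as the
    new-style return; anything else is wrapped with an empty info dict.
    """
    out = env.reset()
    if isinstance(out, tuple) and len(out) == 2 and isinstance(out[1], dict):
        return out  # already (obs, info)
    return out, {}  # wrap old-style return

# Tiny stand-in envs for demonstration (hypothetical):
class OldEnv:
    def reset(self):
        return [0.0, 0.0]

class NewEnv:
    def reset(self):
        return [0.0, 0.0], {"seed": None}

print(reset_compat(OldEnv()))  # → ([0.0, 0.0], {})
print(reset_compat(NewEnv()))  # → ([0.0, 0.0], {'seed': None})
```

Note that check_env validates against whichever gym version is installed, so the cleanest fix is still matching the gym version to what the example was written for.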
Hello, I have been reading through this repository's documentation, and I understand it is possible to use an OpenAI Gym interface for training reinforcement learning agents. keras-rl seems like a decent package, but I wanted to try using methods and algorithms not found in that package. One I am especially considering is Monte Carlo Tree Search. Might it be possible to implement this while using this package? Thank you in advance.
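As a starting point for the search direction asked about above, here is a hedged sketch of flat Monte Carlo action selection, a simpler cousin of full MCTS with no tree. The game below is entirely made up; the sketch only illustrates how random rollouts can drive action choice against an environment-style interface:

```python
import random

class ToyGame:
    """Made-up one-step game: guess the hidden value. Illustration only."""

    def __init__(self, hidden):
        self.hidden = hidden

    def legal_actions(self):
        return [0, 1, 2]

    def simulate(self, action):
        """Play the action, then finish the game with random play and return
        the final return. This game ends immediately, so the rollout is just
        the immediate reward plus a little noise."""
        noise = random.uniform(-0.1, 0.1)
        return (1.0 if action == self.hidden else 0.0) + noise

def flat_monte_carlo(game, n_rollouts=200):
    """Pick the action with the best average rollout return."""
    def value(action):
        return sum(game.simulate(action) for _ in range(n_rollouts)) / n_rollouts
    return max(game.legal_actions(), key=value)

print(flat_monte_carlo(ToyGame(hidden=2)))  # → 2
```

Full MCTS adds a tree over visited states and a selection rule such as UCT, but the rollout-and-average core is the same, and nothing here depends on keras-rl: the agent only needs the state inspection and action interface that the gym wrapper already provides.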