Closed akashsara closed 2 years ago
Hey @akashsara,
Thanks for opening this issue. If you want to test a trained model on the ladder, I would recommend not inheriting from the gym player, but from the base player class.
Do you mean poke_env.player.player.Player?
Edit: Looking at the code for the player above, this would mean that I would need to inherit from the base player class and reimplement embed_battle(), choose_move() and _action_to_move(). This is not really a big issue or anything and implementing it is trivial, but it feels kinda weird to have to do so in another class just to be able to test the model.
Hey @akashsara, I think you are using the old version of the gym player. Try installing from the GitHub source, as the new version should be able to do what you need pretty easily. Let me know if you have any questions!
Thank you, I'll try that!
So it seems like there's a number of significant changes with the new version. I'm following examples/rl_with_new_open_ai_gym_wrapper.py right now and setting things up, but I was wondering when these changes would be pushed to the next public release? @hsahovic
I'm asking since due to some circumstances I'm running my code elsewhere and that machine has some issues haha. So I'd prefer being able to just install the latest version and run my code vs setting up some hacky bits in the meantime.
Hey @MatteoH2O1999, I've updated my code to work with the new version. For completeness I trained a new agent as well. However, I'm having issues with testing it on Showdown.
1. I can't get the `player.start_laddering()` function to run correctly. This is the code snippet I'm using:
async def main():
    # <model creation code>
    player = simple_agent.SimpleRLPlayerTesting(  # Inherits from env_player
        model=model,
        player_configuration=PlayerConfiguration(USERNAME, PASSWORD),
        server_configuration=ShowdownServerConfiguration,
        start_timer_on_battle_start=True,
        start_challenging=False,
    )
    player.start_laddering(NUM_GAMES)
    for battle in player.battles.values():
        print(battle.rating, battle.opponent_rating)

if __name__ == "__main__":
    asyncio.get_event_loop().run_until_complete(main())
If I set `start_challenging` to True I get this output:
2022-07-29 16:52:32,957 - UABGLSimpleDQN - WARNING - Popup message received: |popup|The user 'randomplayer1' was not found.
Traceback (most recent call last):
File "E:\Dev\meta-discovery\play_on_showdown.py", line 121, in
If I set it to False the script just finishes running almost instantly. I'm monitoring the account on Showdown and it doesn't start any battles or anything.
2. I can't seem to find a way to send/accept challenges. The `player` class has an `accept_challenges` and a `send_challenges` function but there doesn't seem to be an equivalent function in `env_player`.
Hi @akashsara, to answer your questions I need to explain how the wrapper works: it is designed to be used as an environment and not as a normal player. It uses a custom player run in a background thread so the OpenAIGym API is exposed on the main thread.
Regarding your first question, the problem is you are not actually using the environment. To use it you should have something like
player = ...  # derive from env_player
...
player.start_laddering(NUM_GAMES)
for _ in range(NUM_GAMES):
    step = player.reset()
    while not step.done:
        action = model.action(step)
        step = player.step(action)
...
If you wish to implement accept_challenges and send_challenges, I would advise subclassing EnvPlayer and defining the methods so that the model-predict loop and the challenge loop run in parallel (use player.agent to access the custom background player).
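For readers following along, here is a rough sketch of the pattern described above: a background thread plays the battles while the main thread drives them through reset() and step(). ToyEnvPlayer, Step and run_battles are hypothetical stand-ins built only on the standard library; this is not poke-env's actual implementation, just an illustration of why calling start_laddering() alone does nothing visible until the reset/step loop runs.

```python
import queue
import threading
from dataclasses import dataclass


@dataclass
class Step:
    observation: int
    reward: float
    done: bool


class ToyEnvPlayer:
    """Hypothetical stand-in for an env player (NOT poke-env's code)."""

    def __init__(self, turns_per_battle: int = 3):
        self.turns_per_battle = turns_per_battle
        self._actions = queue.Queue()  # main thread -> background thread
        self._steps = queue.Queue()    # background thread -> main thread
        self._thread = None

    def start_laddering(self, n_games: int) -> None:
        # Only launches the background worker and returns immediately.
        self._thread = threading.Thread(
            target=self._play, args=(n_games,), daemon=True
        )
        self._thread.start()

    def _play(self, n_games: int) -> None:
        for _ in range(n_games):
            # Initial observation of a new battle, picked up by reset().
            self._steps.put(Step(observation=0, reward=0.0, done=False))
            for turn in range(1, self.turns_per_battle + 1):
                self._actions.get()  # blocks until step() supplies an action
                done = turn == self.turns_per_battle
                self._steps.put(Step(turn, float(done), done))

    def reset(self) -> Step:
        return self._steps.get()

    def step(self, action: int) -> Step:
        self._actions.put(action)
        return self._steps.get()


def run_battles(player, n_games, policy) -> int:
    """Drive the environment from the main thread; returns battles finished."""
    finished = 0
    for _ in range(n_games):
        step = player.reset()
        while not step.done:
            step = player.step(policy(step))
        finished += 1
    return finished


player = ToyEnvPlayer()
player.start_laddering(2)
completed = run_battles(player, 2, policy=lambda step: 0)
print(completed)
```

In this toy version the background worker blocks on the action queue, so nothing progresses unless the main thread keeps calling step(), which mirrors the behaviour discussed in this thread.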
Ah I see. Apologies for the misunderstanding. Thank you for clarifying. It's working now!
I should note that there is something a little weird still going on. I get this error every time I run it, even though battles do seem to be starting and the agent I have seems to be playing:
2022-07-31 18:39:13,537 - RandomPlayer 1 - ERROR - [WinError 1225] The remote computer refused the network connection
Note: Neither my agent nor my Showdown account is called RandomPlayer 1, so I'm not sure where this is coming from.
On the challenges part - are there any plans to implement it/something similar on EnvPlayer for the time being? Or is it not on the roadmap? I'll take a crack at it either way, but just wanted to know.
As a temporary workaround, use opponent='placeholder string' in your env player. This is because by default a random player gets created if opponent has a "falsy" value (like ''). I think we'll change that in the next patch.
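To illustrate the "falsy" default, here is a minimal sketch of the general Python pattern. RandomOpponent and both resolve functions are hypothetical names invented for this example; poke-env's real constructor may look quite different.

```python
class RandomOpponent:
    """Hypothetical stand-in for a default random opponent."""


def resolve_opponent_falsy(opponent=None):
    # Any falsy value (None, "", 0, ...) falls through to the default,
    # so a default opponent gets created even when "" was passed on purpose.
    return opponent or RandomOpponent()


def resolve_opponent_explicit(opponent=None):
    # Only a missing argument (None) triggers the default.
    return RandomOpponent() if opponent is None else opponent


from_empty = resolve_opponent_falsy("")
from_placeholder = resolve_opponent_falsy("placeholder string")
from_empty_explicit = resolve_opponent_explicit("")
print(type(from_empty).__name__, repr(from_placeholder), repr(from_empty_explicit))
```

Passing any non-empty string sidesteps the first pattern, which is presumably why the placeholder workaround stops the stray random player from being created.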
Regarding the challenges part, the famous "it's a feature, not a bug" applies, because accept challenges and send challenges should have the same effect: once you await them, the battle completes. This is impossible to implement by default, as it would mean linking poke-env to a specific ML library.
@akashsara yeah, I need to push a new release - I will take care of it this week. What I meant by the base player class is something like this:
from poke_env.player.baselines import RandomPlayer

class TrainedModelPlayer(RandomPlayer):
    def choose_move(self, battle):
        # Embed the battle state and score every action without tracking gradients
        state = embed_battle(battle)
        with torch.no_grad():
            predictions = model(state)
        # Pick the best-scoring action after applying the action mask
        action_mask = SimpleRLPlayer.action_masks()
        action = np.argmax(predictions + action_mask)
        return SimpleRLPlayer._action_to_move(action, battle)
where embed_battle and model are standalone functions / objects - this should then work with ladder and other battling functions.
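As a side note on the `np.argmax(predictions + action_mask)` line, here is a small standard-library sketch of additive action masking. It assumes the mask holds 0 for legal actions and negative infinity for illegal ones; the thread does not say what `action_masks()` actually returns, so treat that as an assumption. masked_argmax is a hypothetical helper name.

```python
import math


def masked_argmax(predictions, legal):
    """Index of the highest-scoring legal action.

    Assumes an additive mask: 0.0 for legal actions, -inf for illegal ones.
    """
    mask = [0.0 if ok else -math.inf for ok in legal]
    scores = [p + m for p, m in zip(predictions, mask)]
    return max(range(len(scores)), key=scores.__getitem__)


predictions = [0.2, 1.5, 0.9, -0.3]  # raw model outputs, one per action
legal = [True, False, True, True]    # action 1 is not available this turn
best = masked_argmax(predictions, legal)
print(best)
```

Action 1 has the highest raw score but is masked out, so the legal action with the next-highest score is chosen instead.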
@MatteoH2O1999 Thanks! Does this mean there are no plans to have a method to battle against the bot in a custom battle at all? (Apart from the method Haris mentioned above)
@hsahovic Got it, thank you. That should work for me.
@akashsara,
For that there should be the method play_against, but it still requires you to manage the prediction loop, as it only starts the battle in the background. It should also work if you use
player.set_opponent('username to challenge')
player.start_challenging(n_challenges)
for _ in range(n_challenges):
    step = player.reset()
    while not step.done:
        action = model.action(step)
        step = player.step(action)
Ooh that works out for me. Thanks a lot @MatteoH2O1999
So as an update for anyone running into this thread later on, for speed reasons I would recommend using the general API like Haris mentioned. The approach Matteo suggested works pretty well for self-play but in terms of pure speed, the general API is much, much, much faster.
From my own rough benchmarking, the general API works out to be 3-4 times as fast.
@hsahovic maybe we could include this somewhere in the documentation? I expected it to be faster but not this fast.
Hi, I was testing a model I trained on Pokemon Showdown (code snippet below) when I ran into this issue. I'm able to challenge the bot to a battle and play against it perfectly well, but when I do player.ladder(100) it errors out after completing a single battle.
Model code:
Script: