thu-ml / tianshou

An elegant PyTorch deep reinforcement learning library.
https://tianshou.org
MIT License

[Illegal moves] Illegal moves made by tictactoe agent #786

Closed: wei-ann-Github closed this issue 1 year ago

wei-ann-Github commented 1 year ago

Hi,

I followed the script https://github.com/Farama-Foundation/PettingZoo/blob/master/tutorials/Tianshou/2_training_agents.py to train a tictactoe agent. After training the model, I tried to play against the trained agent, but the agent seems to be making illegal moves. My code:

state_shape = env.observation_space["observation"].shape
action_shape = env.action_space.n

net = Net(
    state_shape=state_shape,
    action_shape=action_shape,
    hidden_sizes=[128, 128, 128, 128],
    device="cuda" if torch.cuda.is_available() else "cpu",
).to("cuda" if torch.cuda.is_available() else "cpu")
optim = torch.optim.Adam(net.parameters(), lr=1e-4)
policy = DQNPolicy(
    model=net,
    optim=optim,
    discount_factor=0.9,
    estimation_step=3,
    target_update_freq=320,
)

policy.load_state_dict(torch.load(train_path))
agents = env.agents
agent = agents[0]
new_game = True

policy.eval()
while not done:
    if new_game:
        # First move of a new game: play a random action.
        action = env.action_space.sample()
    else:
        # Flatten the observation into a batch of one before querying the policy.
        observation['obs'] = observation['obs'].reshape(-1, int(np.prod(state_shape)))
        action = policy(Batch(**observation)).act[0]

    observation, reward, done, truncated, info = env.step(action)

    if not done:
        player_action = int(input('User input starts with 1 to 7: ')) - 1
        observation, reward, done, truncated, info = env.step(player_action)
        observation['info'] = info

    new_game = False

The game: [image not preserved]

I've checked the mask. It looks correct.

Anyone able to help?

Trinkle23897 commented 1 year ago

@WillDudley can you take a look? Thanks!

jjshoots commented 1 year ago

Hi @wei-ann-Github, it seems that PettingZoo includes an "action_mask" key for the mask, but Tianshou expects a "mask" key instead.

The simple solution would probably be to add observation["mask"] = observation["action_mask"] somewhere in the code.
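A minimal sketch of that key remapping, using plain dicts and lists in place of real environment output. The helper name `to_tianshou_obs` is hypothetical, the sample observation is made up for illustration, and the key names ("observation", "action_mask", "obs", "mask") are taken from this thread. Whether the policy then actually applies the mask may also depend on how the observation is wrapped into a Batch, which this sketch does not cover:

```python
def to_tianshou_obs(pz_obs):
    """Return a copy of a PettingZoo-style observation dict with Tianshou-style keys.

    Hypothetical helper: the key names are assumptions based on this thread.
    """
    converted = dict(pz_obs)  # shallow copy, the input dict is left untouched
    # Expose the legal-move mask under the "mask" key, as booleans.
    converted["mask"] = [bool(flag) for flag in pz_obs["action_mask"]]
    # Expose the board under "obs" if only "observation" is present.
    if "observation" in pz_obs and "obs" not in pz_obs:
        converted["obs"] = pz_obs["observation"]
    return converted


# Made-up example: a 9-cell board where cells 0 and 4 are already taken.
sample = {
    "observation": [0] * 9,
    "action_mask": [0, 1, 1, 1, 0, 1, 1, 1, 1],
}
converted = to_tianshou_obs(sample)
print(converted["mask"])
```

Copying instead of mutating keeps the original "action_mask" entry available, which may help when comparing what the environment reported against what the policy received.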

Disclaimer: I haven't tested this code, nor do I have extensive experience with Tianshou, but this seems to be what's going on. If you could post your full implementation somewhere, that would probably help. For example, observation['obs'] = observation['obs'].reshape(-1, int(np.prod(state_shape))) shouldn't have worked, since PZ's tictactoe only issues an "observation" key and not an "obs" key.

PingH129 commented 1 year ago

Hi, have you solved the problem? I'm facing the same issue. When I intentionally made an illegal move in Tictactoe, a warning occurred (see below). But I couldn't find the action mask code in either Tianshou or the environment.

[WARNING]: Illegal move made, game terminating with current player losing. obs['action_mask'] contains a mask of all legal moves that can be chosen.
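For readers wondering how such a mask is typically used, here is a small library-independent sketch of greedy action selection restricted to legal moves. It is not Tianshou's or PettingZoo's actual implementation; the function name `masked_greedy_action` and the sample Q-values are hypothetical, and the mask is assumed to be a sequence with a truthy entry for each legal action:

```python
def masked_greedy_action(q_values, action_mask):
    """Pick the highest-valued action among those the mask marks as legal.

    Hypothetical helper: q_values and action_mask are same-length sequences.
    """
    if len(q_values) != len(action_mask):
        raise ValueError("q_values and action_mask must have the same length")
    # Indices of actions the environment reports as legal.
    legal = [i for i, allowed in enumerate(action_mask) if allowed]
    if not legal:
        raise ValueError("no legal actions available")
    # Greedy choice over legal actions only, so illegal ones can never win.
    return max(legal, key=lambda i: q_values[i])


# Made-up values: action 0 has the best Q-value but is masked out.
q = [0.9, 0.1, 0.5]
mask = [0, 1, 1]
print(masked_greedy_action(q, mask))  # prints 2
```

If the mask never reaches the selection step, the agent simply takes the unmasked argmax, which would match the illegal moves described in this issue.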