[Illegal moves] Illegal moves made by tictactoe agent

wei-ann-Github commented 1 year ago

- [x] I have marked all applicable categories:
  - [x] exception-raising bug
  - [ ] RL algorithm bug
  - [ ] documentation request (i.e. "X is missing from the documentation.")
  - [ ] new feature request
- [x] I have visited the source website
- [x] I have searched through the issue tracker for duplicates
- [ ] I have mentioned version numbers, operating system and environment, where applicable:
```
import tianshou, gym, torch, numpy, sys
print(tianshou.__version__, gym.__version__, torch.__version__, numpy.__version__, sys.version, sys.platform)
```
Versions: 0.4.10 0.26.3 1.13.1+cu117 1.23.5 3.9.15 | packaged by conda-forge | (main, Nov 22 2022, 15:55:03)

Hi,

I followed the script https://github.com/Farama-Foundation/PettingZoo/blob/master/tutorials/Tianshou/2_training_agents.py to train a tictactoe agent. After training the model, I tried to play against the trained agent, but the agent seems to be making illegal moves. My code:

```
state_shape = env.observation_space["observation"].shape
action_shape = env.action_space.n

net = Net(
    state_shape=state_shape,
    action_shape=action_shape,
    hidden_sizes=[128, 128, 128, 128],
    device="cuda" if torch.cuda.is_available() else "cpu",
).to("cuda" if torch.cuda.is_available() else "cpu")
optim = torch.optim.Adam(net.parameters(), lr=1e-4)
policy = DQNPolicy(
    model=net,
    optim=optim,
    discount_factor=0.9,
    estimation_step=3,
    target_update_freq=320,
)

policy.load_state_dict(torch.load(train_path))
agents = env.agents
agent = agents[0]
new_game = True

policy.eval()
while not done:
    action = env.action_space.sample()
    if new_game:
        action = env.action_space.sample()
    else:
        observation['obs'] = observation['obs'].reshape(-1, int(np.prod(state_shape)))  # Reshape observation
        action = policy(Batch(**observation)).act[0]

    observation, reward, done, truncated, info = env.step(action)

    if not done:
        player_action = int(input('User input starts with 1 to 7: ')) - 1
        observation, reward, done, truncated, info = env.step(player_action)
        observation['info'] = info

    new_game = False
```

The game:

I've checked the mask. It looks correct.

Anyone able to help?

Trinkle23897 commented 1 year ago

@WillDudley can you take a look? Thanks!

jjshoots commented 1 year ago

Hi @wei-ann-Github, it seems that PettingZoo includes an "action_mask" key for the mask, but Tianshou expects a "mask" key instead.

The simple solution would probably be to add `observation["mask"] = observation["action_mask"]` somewhere in the code.

Disclaimer: I haven't tested this code, nor do I have extensive experience with Tianshou, but it seems like this is what's going on. If you could post your full file implementation somewhere, that would probably help. For example, `observation['obs'] = observation['obs'].reshape(-1, int(np.prod(state_shape)))` shouldn't have worked, since PZ's tictactoe only issues an "observation" key and not an "obs" key.
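
To make the key mismatch concrete, here is a minimal sketch of the renaming described above. It assumes the raw environment returns a dict with "observation" and "action_mask" keys and that the policy reads "obs" and "mask". The helper name `to_tianshou_obs` and the toy observation are hypothetical, and the snippet has not been run against Tianshou itself.

```python
def to_tianshou_obs(pz_obs):
    """Hypothetical helper: rename PettingZoo-style keys to "obs" / "mask"."""
    return {
        "obs": pz_obs["observation"],
        # Convert the 0/1 mask to booleans so it can be used as a legality flag.
        "mask": [bool(m) for m in pz_obs["action_mask"]],
    }


# Toy stand-in for a tictactoe observation: 9 cells, cell 2 already taken.
pz_obs = {
    "observation": [[0, 0] for _ in range(9)],
    "action_mask": [1, 1, 0, 1, 1, 1, 1, 1, 1],
}

converted = to_tianshou_obs(pz_obs)
print(sorted(converted.keys()))
print(converted["mask"])
```

If the mask is still ignored after the rename, it would be worth checking whether the environment is being stepped directly or through a wrapper that already performs this conversion.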

PingH129 commented 1 year ago

Hi, have you solved the problem? I am facing the same issue. When I intentionally made an illegal move in Tictactoe, a warning occurred (see below), but I could not find the action-mask code in either Tianshou or the environment.

[WARNING]: Illegal move made, game terminating with current player losing. obs['action_mask'] contains a mask of all legal moves that can be chosen.
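
For reference, the general idea behind using such a mask can be sketched as follows: restrict the greedy choice to the indices the mask marks as legal. This is only an illustration of the technique in plain Python, not the code Tianshou or PettingZoo actually uses; `masked_argmax` and the example Q-values are made up for the sketch.

```python
def masked_argmax(q_values, action_mask):
    """Return the index of the highest Q-value among legal actions only."""
    legal = [i for i, ok in enumerate(action_mask) if ok]
    if not legal:
        raise ValueError("action_mask marks no legal actions")
    return max(legal, key=lambda i: q_values[i])


# Action 0 has the highest Q-value but is illegal (mask entry is 0).
q_values = [0.9, 0.1, 0.5]
action_mask = [0, 1, 1]

unmasked = max(range(len(q_values)), key=lambda i: q_values[i])
masked = masked_argmax(q_values, action_mask)
print(unmasked, masked)
```

Without the mask the greedy choice here is the illegal action 0; with it, the best legal action 2 is chosen instead.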

thu-ml / tianshou

[Illegal moves] Illegal moves made by tictactoe agent #786