kevaday / alphazero-general

A fast, generalized, and modified implementation of Deepmind's distinguished AlphaZero in PyTorch.
MIT License

Issue: raise ValueError(f'Invalid action encountered while updating root: {c.a}') #34

Open Massa7ca opened 2 months ago

Massa7ca commented 2 months ago

In your implementation of the Othello game, there is no logic to handle passing a turn when a player has no valid moves. I added this logic to the win_state method as follows:

```python
def win_state(self) -> np.ndarray:
    result = [False] * (NUM_PLAYERS + 1)
    player = self._player_range()

    has_legal_moves_player = self._board.has_legal_moves(player)
    has_legal_moves_reverse_player = self._board.has_legal_moves(-player)

    if not has_legal_moves_player and has_legal_moves_reverse_player:
        self._update_turn()
        return np.array(result, dtype=np.uint8)

    if not has_legal_moves_player and not has_legal_moves_reverse_player:
        diff = self._board.count_diff(player)
        if diff > 0:
            result[self.player] = True
        elif diff < 0:
            result[self._next_player(self.player)] = True
        else:
            result[NUM_PLAYERS] = True

    return np.array(result, dtype=np.uint8)
```
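A note for anyone reading along: the patch above switches the turn from inside `win_state`, so a method that search code may call repeatedly as a read-only query also changes the game state. One alternative, sketched below under assumptions, is to make passing an explicit action and switch turns only when an action is played. `PASS_ACTION`, `valid_moves_with_pass`, `is_game_over`, and `TurnState` are hypothetical names for illustration and are not part of alphazero-general; whether this matches how the repository expects action sizes to be declared is not confirmed here.

```python
import numpy as np

BOARD_CELLS = 64            # 8x8 Othello board
PASS_ACTION = BOARD_CELLS   # hypothetical extra action index meaning "pass"


def valid_moves_with_pass(board_moves: np.ndarray) -> np.ndarray:
    """Return a mask of length BOARD_CELLS + 1.

    The final slot (pass) is legal only when no board move is legal.
    """
    valids = np.zeros(BOARD_CELLS + 1, dtype=np.uint8)
    valids[:BOARD_CELLS] = board_moves
    valids[PASS_ACTION] = 0 if board_moves.any() else 1
    return valids


def is_game_over(current_can_move: bool, opponent_can_move: bool) -> bool:
    """Pure query: the game ends only when neither side can move."""
    return not current_can_move and not opponent_can_move


class TurnState:
    """Toy turn tracker: the turn changes only inside play()."""

    def __init__(self) -> None:
        self.player = 0

    def play(self, action: int, valids: np.ndarray) -> None:
        if not valids[action]:
            raise ValueError(f"Illegal action: {action}")
        # Stone placement is omitted; a pass places nothing.
        self.player = 1 - self.player


# Usage: the current player has no board move, so the only legal action is a pass.
no_moves = np.zeros(BOARD_CELLS, dtype=np.uint8)
valids = valid_moves_with_pass(no_moves)
state = TurnState()
state.play(PASS_ACTION, valids)
```

With this layout the end-of-game check never mutates anything, so calling it twice on the same position gives the same answer.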

After adding this logic, I started encountering a division by zero error in the probs method in MCTS.pyx on the line:

`probs = (counts / np.sum(counts)) ** (1.0 / temp)`

where np.sum(counts) becomes zero.

I tried to fix this by modifying the code as follows:

```python
total_count = np.sum(counts)
if total_count == 0:
    return np.full_like(counts, 1.0 / len(counts))

try:
    probs = (counts / total_count) ** (1.0 / temp)
    probs /= np.sum(probs)
    return probs
```
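For reference, here is a minimal standalone sketch of a zero-safe normalization. It is not the `probs` method from MCTS.pyx; `visit_probs` and its `valids` parameter are hypothetical names, and it assumes `temp > 0` and at least one legal action. Two details motivate it: `np.full_like` keeps the dtype of `counts`, so if `counts` is an integer array the value `1.0 / len(counts)` is truncated to 0, and a uniform fallback over every action can put probability on illegal moves unless a mask is applied.

```python
import numpy as np


def visit_probs(counts, temp: float = 1.0, valids=None) -> np.ndarray:
    """Turn visit counts into a probability vector without dividing by zero.

    counts: visit count per action.
    temp:   temperature, assumed to be > 0.
    valids: optional 0/1 mask of legal actions, used only for the fallback.
    """
    counts = np.asarray(counts, dtype=np.float64)  # force float arithmetic
    total = counts.sum()

    if total == 0:
        # No visits recorded: fall back to uniform over legal actions
        # (or over all actions when no mask is given).
        if valids is None:
            mask = np.ones_like(counts)
        else:
            mask = np.asarray(valids, dtype=np.float64)
        return mask / mask.sum()

    probs = (counts / total) ** (1.0 / temp)
    return probs / probs.sum()


# Usage
print(visit_probs([0, 0, 0, 0], valids=[1, 0, 1, 0]))
print(visit_probs([1, 3], temp=0.5))
```

A zero total can also be a sign that the search never expanded the root, in which case the fallback hides the symptom rather than the cause.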

However, after this change, I encounter another error in the update_root method in MCTS.pyx:

`raise ValueError(f'Invalid action encountered while updating root: {c.a}')`
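To illustrate how an error of this kind can arise in general, here is a toy model of re-rooting a search tree on the action that was played. It is an assumption-laden sketch, not the `update_root` code from MCTS.pyx: `Node`, `children`, and this `update_root` function are hypothetical. The point is that the played action has to be one of the actions the root was expanded with, so if the valid-move mask seen during expansion differs from the one used when choosing the move (for example because the turn changed in between), the lookup fails.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class Node:
    a: int = -1  # action that led to this node (-1 for the root)
    children: List["Node"] = field(default_factory=list)


def update_root(root: Node, action: int) -> Node:
    """Return the child reached by `action`, or raise if it was never expanded."""
    for child in root.children:
        if child.a == action:
            return child
    expanded = [c.a for c in root.children]
    raise ValueError(
        f"Action {action} is not among the root's expanded actions {expanded}"
    )


# Usage: the root was expanded with actions 19 and 26 only.
root = Node(children=[Node(a=19), Node(a=26)])
new_root = update_root(root, 26)   # fine
try:
    update_root(root, 37)          # 37 was never a child of this root
except ValueError as err:
    print(err)
```

Reporting both the requested action and the expanded actions in the message makes it easier to see which side of the mismatch is wrong.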

Massa7ca commented 2 months ago

I think I fixed the problem