sotetsuk / pgx

♟️ Vectorized RL game environments in JAX
http://sotets.uk/pgx/
Apache License 2.0

bug about go game #1164

Closed Nightbringers closed 6 months ago

Nightbringers commented 6 months ago

[300, 60, 288, 72, 97, 67, 63, 81, 40, 59, 41, 61, 58, 42, 77, 102, 84, 83, 64, 65, 111, 46, 54, 73, 53, 71, 74, 263, 282, 262, 298, 52, 93, 315, 296, 295, 277, 316, 317, 311, 205, 207, 206, 260, 279, 188, 226, 245, 225, 167, 227, 283, 302, 240, 222, 280, 284, 264, 276, 320, 301, 299, 319, 318, 337, 336, 338, 294, 147, 335, 238, 221, 200, 183, 166, 163, 127, 255, 217, 249, 160, 125, 310, 291, 309, 330, 268, 250, 290, 272, 215, 197, 216, 271, 248, 229, 267, 180, 211, 219, 218, 256, 257, 220, 237, 179, 196, 109, 148, 230, 159, 326, 327, 307, 289, 185, 186, 157, 181, 161, 162, 142, 199, 203, 143, 126, 210, 209, 124, 122, 212, 155, 172, 153, 134, 135, 152, 173, 154, 305, 324, 153, 190, 228, 154, 168, 149, 153, 275, 274, 154, 23, 116, 153, 204, 184, 154, 136, 108, 88, 89, 69, 232, 269, 306, 325, 287, 323, 343, 344, 242, 346, 347, 261, 285, 329, 328, 178, 33, 32, 34, 13, 86, 87, 85, 342, 66, 47, 48, 68, 29, 45, 324, 247, 304, 343, 175, 270, 286, 348, 156, 137, 138, 119, 158, 139, 176, 266, 345, 138, 177, 21, 39, 22, 99, 100, 174, 153, 123, 141, 154, 346, 324, 153, 345, 305]. This is the full action sequence. When the last move is played, the game terminates unexpectedly.

sotetsuk commented 6 months ago

Thank you for sharing! I'll investigate tomorrow.

sotetsuk commented 6 months ago

It looks like the last action is illegal because it violates positional superko (PSK): the last state, states[-1], and states[-7] are the same. I suppose it should have been excluded from the legal actions, but it is currently marked legal. I'll investigate further.
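The repetition check described here can be sketched in plain Python. This is only an illustration of positional superko on toy boards, not pgx's implementation: the function name `first_repeated_position` and the tuple-of-rows board encoding are assumptions made for this example.

```python
def first_repeated_position(positions):
    """Return (earlier_index, later_index) for the first board position
    that occurs twice in the history, or None if no position repeats.

    Positional superko compares the stones on the board only, so each
    position here is just a hashable snapshot of the grid.
    """
    seen = {}
    for index, position in enumerate(positions):
        if position in seen:
            return seen[position], index
        seen[position] = index
    return None


# Toy 2x2 boards (hypothetical encoding): 0 = empty, 1 = black, 2 = white.
empty = ((0, 0), (0, 0))
a = ((1, 0), (0, 0))
b = ((1, 2), (0, 0))

history = [empty, a, b, a]  # the last position recreates position 1
print(first_repeated_position(history))  # (1, 3)
```

With real game states, the same idea applies once each state is reduced to a hashable snapshot of its board.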

[Board images of the last eight states: states[-1] (last_state_m1) through states[-8] (last_state_m8)]

code:

import jax
import jax.numpy as jnp
import pgx

actions = [300, 60, 288, 72, 97, 67, 63, 81, 40, 59, 41, 61, 58, 42, 77, 102, 84, 83, 64, 65, 111, 46, 54, 73, 53, 71, 74, 263, 282, 262, 298, 52, 93, 315, 296, 295, 277, 316, 317, 311, 205, 207, 206, 260, 279, 188, 226, 245, 225, 167, 227, 283, 302, 240, 222, 280, 284, 264, 276, 320, 301, 299, 319, 318, 337, 336, 338, 294, 147, 335, 238, 221, 200, 183, 166, 163, 127, 255, 217, 249, 160, 125, 310, 291, 309, 330, 268, 250, 290, 272, 215, 197, 216, 271, 248, 229, 267, 180, 211, 219, 218, 256, 257, 220, 237, 179, 196, 109, 148, 230, 159, 326, 327, 307, 289, 185, 186, 157, 181, 161, 162, 142, 199, 203, 143, 126, 210, 209, 124, 122, 212, 155, 172, 153, 134, 135, 152, 173, 154, 305, 324, 153, 190, 228, 154, 168, 149, 153, 275, 274, 154, 23, 116, 153, 204, 184, 154, 136, 108, 88, 89, 69, 232, 269, 306, 325, 287, 323, 343, 344, 242, 346, 347, 261, 285, 329, 328, 178, 33, 32, 34, 13, 86, 87, 85, 342, 66, 47, 48, 68, 29, 45, 324, 247, 304, 343, 175, 270, 286, 348, 156, 137, 138, 119, 158, 139, 176, 266, 345, 138, 177, 21, 39, 22, 99, 100, 174, 153, 123, 141, 154, 346, 324, 153, 345, 305]

env = pgx.make("go_19x19")
key = jax.random.PRNGKey(0)
init_fn = jax.jit(env.init)
step_fn = jax.jit(env.step)

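# Replay the reported game, keeping every intermediate state.
states = []

state = init_fn(key)
states.append(state)
for a in actions:
    # Legality is checked before the step; termination and the PSK flag after it.
    print(a, f"legal? {state.legal_action_mask[a]}")
    state = step_fn(state, a)
    print(a, f"terminated? {state.terminated}")
    print(a, f"psk? {state._x.is_psk}")
    states.append(state)

# Render the last eight states, plus an animation of the whole game.
for i in range(1, 9):
    pgx.save_svg(states[-i], f"last_state_m{i}.svg")
pgx.save_svg_animation(states, "states.svg")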

outputs:

...
153 psk? False
345 legal? True
345 terminated? False
345 psk? False
305 legal? True
305 terminated? True
305 psk? True

sotetsuk commented 6 months ago

After some investigation, I concluded that this is not a bug. The game ends because of PSK. However, the move is still included in the legal actions, and the agent is expected to learn to avoid it. This is our specification, as described in our documentation: https://sotets.uk/pgx/go/
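For callers who would rather filter such moves out than let the agent learn to avoid them, one option is a one-ply lookahead over the legal mask. The sketch below does not use pgx: `mask_repeating_moves`, `toy_step`, and `toy_repeats` are hypothetical names, and the toy "game" (a token moving between cells, where revisiting a cell counts as a repetition) stands in for a real environment.

```python
def mask_repeating_moves(state, legal_mask, step, repeats):
    """Return a copy of legal_mask with every move removed whose
    successor state would repeat an earlier position.

    `step(state, action)` and `repeats(state)` are supplied by the
    caller; they are assumptions of this sketch, not a library API.
    """
    return [
        bool(legal) and not repeats(step(state, action))
        for action, legal in enumerate(legal_mask)
    ]


# Toy environment: a state is (current_cell, cells_visited_before).
def toy_step(state, action):
    cell, visited = state
    return (action, visited | {cell})


def toy_repeats(state):
    cell, visited = state
    return cell in visited


state = (2, frozenset({0, 1}))
legal_mask = [True, True, False, True]
print(mask_repeating_moves(state, legal_mask, toy_step, toy_repeats))
# [False, False, False, True]
```

The lookahead costs one simulated step per legal action, which is the trade-off against leaving such moves in the mask.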

Also, I suppose OpenSpiel behaves similarly. I guess the game would end in a tie in OpenSpiel.

Anyway, thank you for your report!

Nightbringers commented 6 months ago

Does the presence of dead stones on the board affect who wins or loses in Go?

sotetsuk commented 6 months ago

It follows the Tromp-Taylor rules: https://tromp.github.io/go.html

Nightbringers commented 6 months ago

It says: "A player's score is the number of points of her color, plus the number of empty points that reach only her color." So if dead stones are not removed, they are counted in the score and affect the result?
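The quoted scoring rule can be tried out on a small board. The following is a minimal sketch of area scoring as quoted above, not pgx's scorer: `tromp_taylor_score` and the string board encoding ('X' black, 'O' white, '.' empty) are assumptions made for illustration.

```python
def tromp_taylor_score(board):
    """Area score for a board given as a list of equal-length strings.

    Each stone counts for its colour. Each region of connected empty
    points counts for a colour only if that is the sole colour the
    region touches. Returns (black_score, white_score).
    """
    rows, cols = len(board), len(board[0])
    score = {"X": 0, "O": 0}
    seen = set()
    for r in range(rows):
        for c in range(cols):
            cell = board[r][c]
            if cell in score:
                score[cell] += 1
            elif (r, c) not in seen:
                # Flood-fill this empty region and record bordering colours.
                size, borders, stack = 0, set(), [(r, c)]
                seen.add((r, c))
                while stack:
                    y, x = stack.pop()
                    size += 1
                    for ny, nx in ((y - 1, x), (y + 1, x), (y, x - 1), (y, x + 1)):
                        if 0 <= ny < rows and 0 <= nx < cols:
                            neighbour = board[ny][nx]
                            if neighbour != ".":
                                borders.add(neighbour)
                            elif (ny, nx) not in seen:
                                seen.add((ny, nx))
                                stack.append((ny, nx))
                if len(borders) == 1:
                    score[borders.pop()] += size
    return score["X"], score["O"]


# A lone white stone left inside an area that would otherwise be black's.
with_dead_stone = [
    ".....",
    ".O...",
    "XXXXX",
    ".....",
    ".....",
]
# The same board after the white stone has been captured.
cleaned = [
    ".....",
    ".....",
    "XXXXX",
    ".....",
    ".....",
]
print(tromp_taylor_score(with_dead_stone))  # (15, 1)
print(tromp_taylor_score(cleaned))          # (25, 0)
```

In this toy position the stone left on the board scores one point for White and also makes the surrounding empty region neutral, so under this rule it does change the result unless it is actually captured.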

Nightbringers commented 6 months ago

PSK

I finally understand, thanks. This is the first time I have heard of this rule.