huangeddie / GymGo

An environment of the board game Go using OpenAI's Gym API
167 stars 32 forks source link

Interest in super-ko? #13

Open RohanM opened 2 years ago

RohanM commented 2 years ago

I'm using this gym to experiment with an AlphaZero-like algorithm, starting with a very small board (2x2 or 3x3). In that context it's very easy to have games that result in multi-step cycles, which the current ko protection code doesn't detect.

As part of that project, I'm working on implementing super-ko, which would prevent such illegal moves. I'm using Zobrist hashing, so the implementation should be efficient. Would there be interest in incorporating that feature into this gym? If so, I'm happy to write it up as a PR (with tests etc), but I thought I'd check in first to see if that's the case.

huangeddie commented 2 years ago

Hi RohanM, I'm wondering how you would implement this Zobrist hashing. This codebase is largely reliant on numpy/scipy and I would prefer if your zobrist hashing used numpy/scipy if possible. Also I had a similar experience as you and my solution was to just end the games after a max number of steps. Have you considered that?

RohanM commented 2 years ago

I've considered the same solution (and that's what I'm using in prototypes at the moment). But I thought that tracking super ko would provide more information from each game (an actual winner rather than a draw), and also keep the games shorter, saving computation.

In terms of the implementation, Zobrist hashing is very simple - generate a 64 bit random integer for each piece in each position, and XOR it in and out of the hash as the board position changes. As such, it's easy to implement with just numpy, and no other dependencies would be required.

Anyway, thanks for getting back! I'll do a little more work on my prototype and post a link so you can take a look and see what would be involved.

huangeddie commented 2 years ago

I might not have the best understanding of Zobrist hashing but since you're working with small boards anyway. Why not just compare the board states directly? I doubt it would take significant memory and I feel like it would be just as fast?