tensorflow / minigo

An open-source implementation of the AlphaGoZero algorithm
Apache License 2.0
3.47k stars, 558 forks

Support for sending board state to the engine via GTP #985

Closed smolendawid closed 4 years ago

smolendawid commented 4 years ago

Hello, I am looking for documentation or tips on the GTP protocol. In particular, I'm wondering whether it is possible to send the current state of the board to the engine and ask it for the best moves without knowing the history of the game, like looking at the board at some moment and saying "this move is the best for black right now".

Is this possible with the Minigo engine? If so, how can I send information about, e.g., ko? Or about whose move it is, black or white?

amj commented 4 years ago

The inputs to the neural-network model require 8 moves of history so that the engine can understand kos, etc. So if you're using the GTP protocol, the best thing is to push the SGF history in via play commands and then use genmove.

The script here: https://github.com/tensorflow/minigo/blob/master/oneoffs/position_pv.py#L45 does some of what you want. It loads many SGFs, takes the final position in each, and asks a set of models for the 'principal variation' from that point.
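
A minimal sketch of the "replay the history, then genmove" sequence described above. It only builds the command strings; `clear_board`, `play`, and `genmove` are standard GTP commands, while the helper name `gtp_replay_commands` and the example moves are hypothetical. The colour passed to `genmove` is also how the engine is told whose move it is.

```python
def gtp_replay_commands(moves, to_play):
    """Build GTP commands that replay `moves` and then ask for a move.

    moves: list of (color, vertex) pairs, e.g. [("b", "Q16"), ("w", "D4")].
    to_play: "b" or "w", the side the engine should generate a move for.
    """
    commands = ["clear_board"]  # start from an empty board
    for color, vertex in moves:
        commands.append(f"play {color} {vertex}")
    commands.append(f"genmove {to_play}")
    return commands


# Hypothetical three-move history, then ask for White's reply.
history = [("b", "Q16"), ("w", "D4"), ("b", "Q4")]
for line in gtp_replay_commands(history, "w"):
    print(line)
```

Each string would be written to the engine's standard input, one command per line.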

tommadams commented 4 years ago

The GTP spec is here: http://www.lysator.liu.se/~gunnar/gtp/gtp2-spec-draft2/gtp2-spec.html

Unfortunately, all AlphaGo-style engines need at least a partial history of the most recent N moves to accurately suggest a next move. The Minigo model uses the most recent 8 board positions, which is how it handles ko, triple ko, etc. The model doesn't have an explicit ko input; it learns ko rules by example (though the tree search does enforce ko rules).

When using Minigui to suggest the best move for an arbitrary position in an SGF, the engine builds this move history by clearing the board and then replaying moves until it reaches the desired position. This is a common approach in many engines that use the GTP protocol (even engines that aren't based on AlphaGo).

Can you explain a little more about what you're trying to accomplish?

smolendawid commented 4 years ago

@amj thanks for pointing out the function. Unfortunately, the documentation is very laconic.

What I do:

@tommadams I recorded a video that describes my problem and the current state: https://www.youtube.com/watch?v=yBu3X1Ykh_w

Basically it works great: I have fun playing against Minigo, and above all I'm automatically creating a tagged dataset of images. I like the fact that I can add an eye for myself wherever I like ;) It's not resistant to ko, but for me that's OK. Unfortunately, it's possible that it's not playing as well as it can, isn't it? The 8 past moves are messed up.

"by clearing the board and then replaying moves until it reaches the desired position."

This is interesting. So this is implemented? Can you explain it? Right now I can't imagine how to replay the moves to reach the desired position; it seems like a huge search.

tommadams commented 4 years ago

That's awesome! Thanks for sharing the video, it's a really cool demo. It also very clearly illustrates your problem.

We don't have any use cases where we need to start from a completely arbitrary position; in all cases we know the entire move history that led up to a particular position. This comes either from loading an SGF file via the loadsgf command or from interactions with a user in Minigui. In the Minigui case, we maintain a tree of all variations ever played during a session, and the user is free to click around on that tree to jump between previously played positions. Whenever the user clicks, we clear the board and then play the sequence of moves to reach the clicked node in the tree. You can see a portion of the tree near the top right of this Minigui screenshot: https://user-images.githubusercontent.com/7587269/77956668-1e234200-7287-11ea-9644-71b969492b9b.png
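
To make the "clear the board and replay to the clicked node" idea concrete, here is a rough sketch of a variation tree with parent links. It is not Minigui's actual code; the `Node` class and `commands_to_reach` helper are hypothetical names used for illustration only.

```python
class Node:
    """One position in a variation tree; the root is the empty board."""

    def __init__(self, move=None, parent=None):
        self.move = move          # (color, vertex), or None for the root
        self.parent = parent
        self.children = []

    def add_child(self, color, vertex):
        child = Node((color, vertex), parent=self)
        self.children.append(child)
        return child

    def moves_from_root(self):
        """Walk parent links up to the root and return moves in play order."""
        moves = []
        node = self
        while node.parent is not None:
            moves.append(node.move)
            node = node.parent
        moves.reverse()
        return moves


def commands_to_reach(node):
    """GTP commands that reset the engine and replay the path to `node`."""
    return ["clear_board"] + [f"play {c} {v}" for c, v in node.moves_from_root()]


root = Node()
a = root.add_child("b", "Q16")
b = a.add_child("w", "D4")
c = a.add_child("w", "D16")   # a sibling variation of b
print(commands_to_reach(c))
```

Because every node remembers its parent, no search is needed: the move sequence for any previously visited position is just the path back to the root, reversed.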

You could have some limited support for trying different variations using the undo GTP command. When you want to try a different variation, you would undo several moves, then play out the start of the new variation. For the specific example in your video, you'd undo twice, then play the black stone.
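
One way to work out how many undos are needed is to compare the moves the engine has already seen with the moves of the variation you want. The sketch below assumes the client keeps both lists itself; `switch_variation` is a hypothetical helper, and `undo` is the standard GTP command, which the engine in use must actually support.

```python
def switch_variation(current, target):
    """Return GTP commands that move the engine from `current` to `target`.

    Both arguments are lists of (color, vertex) moves from the empty board.
    """
    # Count the leading moves the two sequences have in common.
    shared = 0
    for played, wanted in zip(current, target):
        if played != wanted:
            break
        shared += 1
    # Undo everything after the shared prefix, then play the new moves.
    commands = ["undo"] * (len(current) - shared)
    commands += [f"play {c} {v}" for c, v in target[shared:]]
    return commands


# Hypothetical example: take back the last two moves, then play a black stone.
current = [("b", "Q16"), ("w", "D4"), ("b", "Q4"), ("w", "D16")]
target = [("b", "Q16"), ("w", "D4"), ("b", "C16")]
print(switch_variation(current, target))
```

If the two sequences share no moves at all, this degenerates to undoing everything, at which point `clear_board` followed by a full replay is simpler.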

Unfortunately, it looks like we haven't implemented the undo command in the Python engine yet: https://github.com/tensorflow/minigo/blob/77ed838c6b533d35016995a61ccb0689a960100c/gtp_cmd_handlers.py#L96

I don't suppose you're using the C++ engine, are you? If not, and you think the "undo" approach would work, please file a GitHub issue and I'll try to get to implementing it this week.

brilee commented 4 years ago

FWIW, repeatedly passing in the same position 8 times for the history planes kind of works as a hack for starting at arbitrary positions. I don't know if any of the Python code does that; it used to be that the history planes were delta-encoded, meaning that the default of a zero-filled plane served as a "repeat same position 8 times for history" hack. But I think they're redundantly encoded now, which means that the history would have to be manually repeated.
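
A rough illustration of what "manually repeating" the position could look like on the client side. This is not Minigo's feature code: the board encoding (1 for black, -1 for white, 0 for empty), the one-plane-per-position layout, and both helper names are assumptions made for the sketch.

```python
HISTORY_LENGTH = 8  # number of past positions mentioned in this thread

BLACK, WHITE, EMPTY = 1, -1, 0  # assumed encoding, for illustration only


def repeat_position(board, length=HISTORY_LENGTH):
    """Return `length` independent copies of `board` to stand in for history."""
    return [[row[:] for row in board] for _ in range(length)]


def stone_planes(history, color):
    """One 0/1 plane per historical position, marking stones of `color`."""
    return [
        [[1 if point == color else 0 for point in row] for row in position]
        for position in history
    ]


# Tiny 3x3 board to keep the example readable.
board = [
    [EMPTY, BLACK, EMPTY],
    [WHITE, EMPTY, EMPTY],
    [EMPTY, EMPTY, BLACK],
]
history = repeat_position(board)
black_planes = stone_planes(history, BLACK)
white_planes = stone_planes(history, WHITE)
print(len(black_planes), len(white_planes))
```

With a faked history like this, every past position looks identical to the current one, so the model sees no information about recent captures or ko.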

smolendawid commented 4 years ago

Thanks for the trick @brilee. @tommadams I think the undo function would be useful to me, but it's not enough. I suspect I have to create some engine that interprets the state of the current board and somehow replays it in a sensible, legal way, while also remembering the history, undoing moves, and being somewhat resistant to ko, all on the side of my program. It's a pretty big project, and I'm not sure it's worth doing before I collect a large database of annotated Go board images, which is my priority right now. At the moment my recognizer has problems with shadows and changing light, but I think it could be made robust to that. Nevertheless, thanks for the feedback. I think it's a very solid repository, and the Minigo community is very helpful!