jtauber / sgf

Python implementation of Smart Game Format
MIT License
31 stars 6 forks source link

subtrees and variations #2

Closed wrongu closed 8 years ago

wrongu commented 8 years ago

Not an 'issue' per se but a general confusion with how more complex files with variations are handled.

First, what are the children of a GameTree?

Second, for the sake of clarity it would be great to see an example of parsing (and interpreting) a file that has variations. Working through the print1.sgf example you provide, I found that GameTree.nodes[i].variations[j].parent.nodes is one way to see a full variation - is there a more direct way?

More to the point: if I just look at the children of a Collection and their nodes those should be all of the moves of the game excluding variations, right?

jtauber commented 8 years ago

Just re-familiarizing myself with the code and the format so I'll get back to you shortly.

jtauber commented 8 years ago

So the output of the library seems to be a very literal abstract syntax tree of the SGF format.

The grammar of the SGF format down to the node level is:

Collection = GameTree { GameTree }
GameTree   = "(" Sequence { GameTree } ")"
Sequence   = Node { Node }

In other words, a collection is a list of one or more games trees and a game tree is both a sequence of one or more nodes AND a list of zero or more game trees. So that's why a GameTree has both nodes AND children.

Looking at the first example in the SGF FF[4] spec, the list of nodes is the sequence of moves up until there are variations and then each variation is in one of the children.

So consider a very simple SGF game tree fragment: (;C[a];C[b](;C[c])(;C[d];C[e]))

The means a then b then two variations. The first variation continues with c and the second with d then e.

So the current output of my library for that would be

GameTree
    nodes:
        a
        b
    children:
        GameTree
            nodes:
                c
            children:
        GameTree
            nodes:
                d
                e

If there are no variations, the GameTree should just have a sequence of nodes with no children. However if there are variations, one way to step through the main line is just iterate over the nodes then recurse into the first child and iterate over the nodes and so on.

There are, however, additional attributes on each node which provide useful information and a simpler alternative way of iterating over nodes...

So an alternative way to step through the mainline is just to follow next on each node.

I hope this helps. I might update the README docs to better explain with this example.

There are definitely things I could add to the API to make it cleaner / easier to use. For example, I could make Collection implement the iterator protocol over GameTrees and make GameTrees implement the iterator protocol over nodes following next (and hence the mainline). Because a Collection just has one type of child, I could also implement __getitem__ so collection[n] would return the n-th game.

With this you could iterate the mainline of the first game in a collection as simply as:

for node in collection[0]:
    ...
jtauber commented 8 years ago

I've released 0.2 with pretty much all the API changes mentioned at the end of my previous comment.

See the README.

wrongu commented 8 years ago

Looking good! For the purposes of AlphaGo especially, variations aren't a concern and I'd be happy to get rid of sgflib now.

The only thing that I still find confusing about them, though, is that GameTree.children don't contain information about which node they are variations on (right?). Taking your (;C[a];C[b](;C[c])(;C[d];C[e])) example, the two variations c and de are essentially alternatives to b. I'd almost expect to see children be an array parallel to and of the same size as nodes, so that children[i] is a variation on nodes[i]. Or, I'd expect variations to be stored in the nodes themselves like nodes[i].variations = [GameTree, GameTree].

Maybe I just don't "get" the SGF specification, but that's my two cents =)

jtauber commented 8 years ago

c and de aren't alternatives to b, they are alternatives to each other, both following from b.

In other words the two "games" are: abc and abde.

Does that address the confusion?

jtauber commented 8 years ago

Here's a diagram from the spec which might help further:

ta3

jtauber commented 8 years ago

@wrongu if you're happy with this explanation, I'll close this ticket