icecream17 / chesstree

The tree of chess
0 stars 0 forks source link

Storage efficiency #5

Open icecream17 opened 2 years ago

icecream17 commented 2 years ago

Trivial nodes

A trivial node is a node where a game is finished

It's wasteful to make folder/info if the folder has no children

Wasteful to make a separate file for a node that trivially ends, but if this is put in file, it conflicts with the equivalent node data.

And not counting trivial nodes as equivalency is possible, but then either a trivial node will never be counted or it will be counted multiple times.

icecream17 commented 2 years ago

Impossible equivalency

If there's only one way to get to a node, it doesn't need to be stored in the equivalency data

icecream17 commented 1 year ago

Shared equivalences

Unite very similar nodes and subtrees, with their differences noted. A form of compression.

Non-mattering histories

Unite nodes that are the same except for history if the history doesn't matter.

Fewer files and folders

Using 1 file is more computationally efficient than using many files and folders. It is probably more storage efficient because of vague metadata. Still, it is easier to open and read smaller files, so using 1 file won't happen, but using fewer files will.

Binary compression

Any character corresponds to 8 or more binary bits, which is not very memory efficient. Note that any more obfuscation by compression tools takes work, this is free compression. Ultimately improving the file/folder structure is better.

Power nodes

A minimax algorithm that calculates up to depth N can figure out a tablebase position without using a few nodes, thus saving memory.

Let N(node) be 0 if it is stored, else 1 + max(N(children)). If N=0, all nodes are stored. If N=1, some nodes don't have to be stored if all their children are. This already cuts the number of nodes in about half, for example by not storing positions that have an odd half-move counter.

This ruins any after-the-fact statistics and it will be much more complicated to keep track of things.

This makes retrieving nodes more computationally inefficient.

Start at endings

I probably won't be able to store every chess position so starting at the ending checkmates and going backwards, like the current tablebase generators, is probably best.

I would have to remove any illegal positions. For example, this double check is impossible:

........
Rk......
........
.R......

And this one isn't:

Rk......
........
........
.R......

This is more complicated when storing history...