Closed barakugav closed 2 years ago
Protobuf could be a good tool for this
I think protobuf is a bit of an overkill for this. Here is a simple implementation: https://github.com/poja/RL/pull/97
We may want to use protobuf to pass parameters from Python to Rust
Apparently this doesn't decrease the file sizes... The new format has a fixed size of 8k (in chess), while the size of the JSON format depends on the data. In most cases most of the moves are illegal and their probability is -1, which takes only two bytes, and from what I saw most of the files are around 6k.
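A rough sketch of the arithmetic behind this comparison, in Python. The exact layouts are assumptions: the raw entry is taken to be 18 u64 planes, one f32 per move slot and an i8 winner, and the JSON side is taken to be a compact array holding -1 for every illegal move. The names `raw_entry_size` and `json_probs_size` are made up for illustration and are not from the repository.

```python
import json
import struct

NUM_MOVES = 1880   # move slots per chess position, as mentioned in this thread
NUM_PLANES = 18    # u64 planes (assumed, taken from the struct quoted in this thread)


def raw_entry_size():
    """Size of a fixed raw entry: planes + one f32 per move + winner (assumed layout)."""
    return struct.calcsize(f"<{NUM_PLANES}Q{NUM_MOVES}fb")


def json_probs_size(legal):
    """Size of the probs array as compact JSON, with -1 for every illegal move.

    `legal` maps a move index to its probability (hypothetical representation).
    """
    probs = [-1] * NUM_MOVES
    for move, prob in legal.items():
        probs[move] = prob
    return len(json.dumps(probs, separators=(",", ":")))


print(raw_entry_size())                    # fixed, independent of the position
print(json_probs_size({0: 0.5, 1: 0.5}))   # dominated by the "-1," entries
```

Under these assumptions each illegal move costs three bytes in JSON ("-1" plus a comma) against four bytes as an f32, which is why the raw format does not come out smaller by itself.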
Not sure what to do here. Either format is fine. If this becomes a real issue, we can change the format to the following (raw bytes, not JSON):
{ planes: [u64: 18], moves_bitboard: [u64: 30], probs: [f32: 256], winner: i8 }
Instead of storing all 1880 move probs, we store a bitmap of 1880 bits which tells us which moves are included in the 'probs' array. This takes advantage of the fact that no more than 256 moves are ever legal in the same position in chess. This will result in a fixed-size entry of roughly 1.4k bytes (18*8 + 30*8 + 256*4 + 1 = 1409). This slightly complicates things, but not by much, and it is very self-contained.
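A minimal Python sketch of the bitmap layout described above, assuming little-endian packing with no padding and that the entries of 'probs' are ordered by ascending move index. The helper names `pack_entry` and `unpack_entry` are hypothetical, not from the repository.

```python
import struct

NUM_PLANES = 18     # u64 bitboards describing the position
NUM_MOVES = 1880    # total move slots
BITMAP_WORDS = 30   # 30 * 64 = 1920 bits, enough for 1880 moves
MAX_PROBS = 256     # upper bound on legal moves used in this thread

# 18 u64 + 30 u64 + 256 f32 + 1 i8, little-endian, no padding (assumed)
ENTRY_FORMAT = f"<{NUM_PLANES}Q{BITMAP_WORDS}Q{MAX_PROBS}fb"
ENTRY_SIZE = struct.calcsize(ENTRY_FORMAT)


def pack_entry(planes, move_probs, winner):
    """Pack one entry; move_probs maps move index -> probability (legal moves only)."""
    if len(planes) != NUM_PLANES:
        raise ValueError("expected 18 planes")
    if len(move_probs) > MAX_PROBS:
        raise ValueError("too many legal moves")
    bitmap = [0] * BITMAP_WORDS
    probs = []
    for move in sorted(move_probs):
        if not 0 <= move < NUM_MOVES:
            raise ValueError("move index out of range")
        bitmap[move // 64] |= 1 << (move % 64)   # mark the move as present
        probs.append(move_probs[move])
    probs += [0.0] * (MAX_PROBS - len(probs))    # pad the unused slots
    return struct.pack(ENTRY_FORMAT, *planes, *bitmap, *probs, winner)


def unpack_entry(data):
    """Inverse of pack_entry; returns (planes, move_probs, winner)."""
    fields = struct.unpack(ENTRY_FORMAT, data)
    planes = list(fields[:NUM_PLANES])
    bitmap = fields[NUM_PLANES:NUM_PLANES + BITMAP_WORDS]
    probs = fields[NUM_PLANES + BITMAP_WORDS:-1]
    winner = fields[-1]
    # The i-th set bit in the bitmap corresponds to the i-th entry of probs.
    moves = [m for m in range(NUM_MOVES) if (bitmap[m // 64] >> (m % 64)) & 1]
    return planes, dict(zip(moves, probs)), winner
```

Every entry then has the same `ENTRY_SIZE`, so a games file can be read as a flat array of records without any length prefixes.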
On second thought, let's try and merge this feature (the simple raw bytes, not the bitmap). In Hex this will have a bigger impact.
https://github.com/poja/RL/pull/97/commits/5bc4d057b23c858069810c9773c7dd521a939cba 1280 bytes! Now that is more reasonable
We may want to use protobuf to pass parameters from Python to Rust
This is not the main subject of this thread, but still - I think textual formats are to be preferred where performance is not an issue (and I think this applies here)
The games directory easily exceeds 1 GB. By storing the raw bytes of the training data entries we can save a lot of space.