mcostalba / chess_db

GNU General Public License v3.0

Get all game headers in find command #28

Closed: sshivaji closed this issue 7 years ago.

sshivaji commented 7 years ago

It's a common need to sort games by date, etc. I considered using separate code for this; however, the results need to line up with the game offsets that find returns.

Can we have an API that returns the headers of all games the find command can return? I don't see any other performant way to get all game headers and align them with the game offsets returned by chess_db. This could also be done during book generation.

Python code that fetches all game headers by reading at each of the offsets is really slow.
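For reference, collecting headers together with their offsets can be done in a single sequential pass over the file instead of one seek per offset. A minimal stdlib-only Python sketch, assuming an offset is the byte position of a game's first tag line. The function name `scan_headers` here is hypothetical; this is neither python-chess's nor chess_db's implementation, and it ignores escaped quotes inside tag values:

```python
import io
import re

# A PGN tag pair line, e.g. [White "Carlsen, Magnus"]
TAG_RE = re.compile(rb'^\[(\w+)\s+"(.*)"\]\s*$')


def scan_headers(stream):
    """Yield (offset, headers) for each game in a binary PGN stream.

    offset is the byte position of the game's first tag line, i.e. where
    you would seek() to read that game again. Escaped quotes inside tag
    values are not handled.
    """
    headers, offset, in_moves, pos = None, None, False, 0
    for line in iter(stream.readline, b""):
        match = TAG_RE.match(line)
        if match:
            if headers is None or in_moves:
                # First tag of the file, or a tag after movetext: a new game.
                if headers is not None:
                    yield offset, headers
                headers, offset, in_moves = {}, pos, False
            key = match.group(1).decode("ascii")
            headers[key] = match.group(2).decode("utf-8", "replace")
        elif line.strip() and headers is not None:
            in_moves = True  # non-blank, non-tag line: movetext has started
        pos += len(line)
    if headers is not None:
        yield offset, headers


if __name__ == "__main__":
    sample = io.BytesIO(b'[Event "A"]\n\n1. e4 e5 1-0\n')
    for offset, headers in scan_headers(sample):
        print(offset, headers)
```

On a real file, open it in binary mode (`open(path, "rb")`) so that the running position counts bytes rather than decoded characters.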

sshivaji commented 7 years ago

On second thought, and after some testing, it looks like I can use https://python-chess.readthedocs.io/en/v0.16.1/pgn.html#chess.pgn.scan_headers to get all headers. It's not as fast as chess_db book building, but it may be sufficient.

This is sufficient as long as the offset logic matches, which I think it should.
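Whether the offset logic matches can be spot-checked before relying on it. The sketch below assumes the offsets are plain byte positions of each game's first `[` tag line; if a tool stores them in another form, they would have to be converted first, which this sketch does not attempt. `mismatched_offsets` is a hypothetical helper name:

```python
import io


def mismatched_offsets(stream, offsets):
    """Return the offsets that do not point at the start of a tag line.

    stream must be a seekable binary file object; offsets are assumed to be
    plain byte positions of each game's first "[" character.
    """
    bad = []
    for ofs in offsets:
        if ofs == 0:
            stream.seek(0)
            at_line_start = True
        else:
            # The byte just before a game start should be a newline.
            stream.seek(ofs - 1)
            at_line_start = stream.read(1) == b"\n"
        # The stream is now positioned at ofs itself.
        if not (at_line_start and stream.read(1) == b"["):
            bad.append(ofs)
    return bad


if __name__ == "__main__":
    pgn = io.BytesIO(b'[Event "A"]\n\n1. e4 e5 1-0\n\n[Event "B"]\n\n1. d4 d5 0-1\n')
    print(mismatched_offsets(pgn, [0, 27, 5]))
```

An empty result for a sample of offsets from each tool is a cheap sign that both use the same convention.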

gbtami commented 7 years ago

The thing is, we now have two tools that create different kinds of databases from the move sections of a .pgn file, and they store differently calculated game offsets. It might be more practical if scoutfish calculated offsets the same way as chess_db (8-byte aligned). Then chess.pgn.scan_headers could be used to produce a MongoDB/SQLite database with chess_db-compatible offsets as primary keys and other fields storing the header data. That way, offsets coming from both chess_db and scoutfish could be used to query this SQLite file for header data.
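A minimal sketch of the SQLite side of this idea, using Python's standard `sqlite3` module. The table layout, column names and helper functions are hypothetical, and the offset column is assumed to hold whatever offset convention chess_db and scoutfish agree on. Once the offsets from a find query are known, a join against this table returns those games sorted by date, which was the original need in this issue:

```python
import json
import sqlite3


def _elo(value):
    """Convert an Elo tag value to int, or None if missing/non-numeric."""
    try:
        return int(value)
    except (TypeError, ValueError):
        return None


def build_header_db(records, path=":memory:"):
    """Store (offset, headers) pairs in SQLite with the offset as primary key."""
    conn = sqlite3.connect(path)
    conn.execute(
        "CREATE TABLE IF NOT EXISTS games ("
        " game_offset INTEGER PRIMARY KEY,"
        " white TEXT, black TEXT, date TEXT, result TEXT,"
        " white_elo INTEGER, black_elo INTEGER,"
        " headers TEXT)"  # full tag set kept as JSON
    )
    rows = (
        (
            offset,
            h.get("White"), h.get("Black"), h.get("Date"), h.get("Result"),
            _elo(h.get("WhiteElo")), _elo(h.get("BlackElo")),
            json.dumps(h),
        )
        for offset, h in records
    )
    conn.executemany("INSERT OR REPLACE INTO games VALUES (?, ?, ?, ?, ?, ?, ?, ?)", rows)
    conn.commit()
    return conn


def games_sorted_by_date(conn, offsets):
    """Return (offset, date, white, black) for the given offsets, oldest first."""
    conn.execute("CREATE TEMP TABLE IF NOT EXISTS wanted (game_offset INTEGER PRIMARY KEY)")
    conn.execute("DELETE FROM wanted")
    conn.executemany("INSERT OR IGNORE INTO wanted VALUES (?)", ((o,) for o in offsets))
    return conn.execute(
        "SELECT g.game_offset, g.date, g.white, g.black"
        " FROM games g JOIN wanted w ON w.game_offset = g.game_offset"
        " ORDER BY g.date, g.game_offset"
    ).fetchall()
```

PGN dates in `YYYY.MM.DD` form sort correctly as strings, and unknown dates such as `????.??.??` sort after known ones because `?` comes after the digits in ASCII. Loading the offsets into a temporary table avoids building a huge `IN (...)` list for large result sets.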

gbtami commented 7 years ago

Another solution could be to extend the parser to parse header tags as well and output them to, say, a MongoDB or SQLite file.

sshivaji commented 7 years ago

As I mentioned before, if we are going to output header tags, I think JSON is sufficient; other programs can then process it as they wish.
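As an illustration of such JSON output, one option is JSON Lines (one object per game), which lets consumers stream a large file instead of parsing a single huge array. The field names `offset` and `headers` and both function names are assumptions for this sketch, not the format of any existing tool:

```python
import io
import json


def write_headers_json(records, out):
    """Write one JSON object per game: {"offset": ..., "headers": {...}}."""
    for offset, headers in records:
        out.write(json.dumps({"offset": offset, "headers": headers}, ensure_ascii=False))
        out.write("\n")


def read_headers_json(lines):
    """Parse JSON Lines back into a dict mapping offset -> headers."""
    table = {}
    for line in lines:
        if line.strip():
            obj = json.loads(line)
            table[obj["offset"]] = obj["headers"]
    return table


if __name__ == "__main__":
    buf = io.StringIO()
    write_headers_json([(0, {"White": "Müller"}), (27, {"White": "Tal"})], buf)
    print(buf.getvalue(), end="")
    print(read_headers_json(io.StringIO(buf.getvalue())))
```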

sshivaji commented 7 years ago

This is related to the scoutfish issue. I will close it for now, but will reopen it with a concrete UI issue and a screenshot. :)

sshivaji commented 7 years ago

I was struggling to make this fast enough with python-chess: it took about 4 minutes to get all headers from the 2.2-million-game base PGN file. So I modified the chess_db code and created a separate repo that only does header and offset extraction. It is now about 8 times faster than the python-chess solution (not surprising, as the code is in C++): https://github.com/sshivaji/chess_pgn_headers

@gbtami, the output is JSON, so you can load it into SQLite or wherever you like.

This makes it possible to run many typical queries with scoutfish (and chess_db), such as: find the winning percentage in two bishops vs. bishop and knight endgames in games where both players are rated above 2400 Elo, 2500 Elo, 2600 Elo, and so on. How does Elo affect the winning percentage? I suspect that the higher the Elo, the higher the winning percentage of the two bishops against bishop and knight (but I will have to check).
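Once headers are keyed by offset, the Elo question becomes a small aggregation over the offsets returned by the endgame query. A Python sketch, assuming the query was set up so that White is the side with the two bishops (so a win means `1-0`) and that `headers_by_offset` maps offsets to tag dictionaries; the function name is hypothetical, and games with a missing or non-numeric rating are skipped:

```python
def win_rate_by_min_elo(headers_by_offset, matching_offsets, thresholds):
    """Map each Elo threshold to (games, White win %) over the matching games
    in which both players are rated strictly above that threshold.

    Assumes White is the side with the two bishops, so a win is "1-0".
    The percentage is None when no game qualifies.
    """
    stats = {}
    for threshold in thresholds:
        games = wins = 0
        for ofs in matching_offsets:
            headers = headers_by_offset.get(ofs)
            if headers is None:
                continue  # offset not present in the header table
            try:
                white_elo = int(headers.get("WhiteElo", ""))
                black_elo = int(headers.get("BlackElo", ""))
            except ValueError:
                continue  # missing or non-numeric rating
            if white_elo > threshold and black_elo > threshold:
                games += 1
                if headers.get("Result") == "1-0":
                    wins += 1
        stats[threshold] = (games, 100.0 * wins / games if games else None)
    return stats
```

Calling it with thresholds `[2400, 2500, 2600]` gives one row per rating band, which is enough to see whether the hypothesis holds on a given database; the game counts matter, since the higher bands may contain only a handful of games.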

Now I think we can mostly focus on UI integration; all the key backend tools are ready.

@mcostalba, I hope you don't mind the fork. I think your code base is the fastest way to extract headers that I know of today, and reusing it helps preserve that speed.

mcostalba commented 7 years ago

The fork is OK with me.

(Sent by email on Sunday, January 1, 2017, in reply to the comment above.)

sshivaji commented 7 years ago

Thanks!

I changed the repo name to https://github.com/sshivaji/pgnextractor and the executable name to pgnextractor, to distinguish it from parser.

(In reply to Marco Costalba's email of Sunday, January 1, 2017, 11:01 AM.)