mcostalba / chess_db

GNU General Public License v3.0

parser failed on millionbase.pgn from http://www.top-5000.nl/pgn.htm #2

Closed gbtami closed 8 years ago

gbtami commented 8 years ago

tamas@tami:~/PGN$ ./parser /mnt/win7/PGN/millionbase-2.22.pgn
terminate called after throwing an instance of 'std::bad_alloc'
  what(): std::bad_alloc
Aborted (core dumped)

mcostalba commented 8 years ago

@gbtami I ran the parser on that file and it was OK (with current master, which knows about FEN).

In your case the problem is that you don't have enough RAM: the tool allocates twice the file size, see here:

https://github.com/mcostalba/chess_db/blob/master/parser/parser.cpp#L547

If needed, you can reduce it to 1.5 times the file size with:

kTable.reserve(3 * size / 2 / sizeof(PolyEntry));
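To make the sizing concrete, here is a minimal sketch of the same calculation with the factor as a parameter, plus a fallback that tries the smaller factor instead of terminating on `std::bad_alloc`. The `PolyEntry` layout (a 16-byte Polyglot-style entry), the assumption that `kTable` is a `std::vector<PolyEntry>`, and the helpers `entriesToReserve` and `reserveTable` are all illustrative and are not taken from parser.cpp.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <new>
#include <stdexcept>
#include <vector>

// Assumed layout of a Polyglot-style book entry; the real PolyEntry
// in parser.cpp may differ.
struct PolyEntry {
    std::uint64_t key;
    std::uint16_t move;
    std::uint16_t weight;
    std::uint32_t learn;
};
static_assert(sizeof(PolyEntry) == 16, "sketch assumes 16-byte entries");

// Number of entries that fit in (num / den) times the PGN file size.
// num = 2, den = 1 is the "twice the file size" case; 3 / 2 is the 1.5x case.
std::size_t entriesToReserve(std::size_t fileSize, std::size_t num, std::size_t den) {
    assert(den != 0);
    return num * fileSize / den / sizeof(PolyEntry);
}

// Hypothetical helper: try the 2x reservation first, then 1.5x.
// Returns false if neither reservation succeeds.
bool reserveTable(std::vector<PolyEntry>& table, std::size_t fileSize) {
    const std::size_t factors[][2] = {{2, 1}, {3, 2}};
    for (const auto& f : factors) {
        try {
            table.reserve(entriesToReserve(fileSize, f[0], f[1]));
            return true;
        } catch (const std::bad_alloc&) {
            // Not enough memory for this factor; try the next one.
        } catch (const std::length_error&) {
            // Request exceeds the vector's max_size(); try the next one.
        }
    }
    return false;
}
```

A smaller reservation only lowers the up-front request; if the table later outgrows it, the vector reallocates and can still run out of memory.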

gbtami commented 8 years ago

You are right. I tested on a ThinkPad T510 with 4 GB of RAM. After closing some programs (Firefox, Jin, etc.), parsing went OK.

sshivaji commented 8 years ago

It parses the 2.2 million game PGN fast (great!!). However, something is wrong with the output: only about half the games are parsed. The parser reports 1.2M games in the database when there should be 2.2M.

Detailed parser output below:

Games: 1201642
Moves: 90621105
Incorrect moves: 0
Unique positions: 73%
Games/second: 119364
Moves/second: 9001798
MBytes/second: 146.342
Size of index file (MB): 1093001872
Book file: /home/shiv/Downloads/millionbase-2.22.bin
Processing time (ms): 10067
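As a quick sanity check on these figures, the per-second rates can be recomputed from the totals and the processing time. The sketch below assumes the rates are simply count * 1000 / milliseconds, truncated to an integer; `perSecond` is a hypothetical helper, not the parser's own code.

```cpp
#include <cstdint>

// Items per second from a total count and an elapsed time in milliseconds,
// truncated to an integer. Returns 0 for a zero elapsed time.
std::uint64_t perSecond(std::uint64_t count, std::uint64_t elapsedMs) {
    return elapsedMs ? count * 1000 / elapsedMs : 0;
}
```

With the values above, `perSecond(1201642, 10067)` gives 119364 and `perSecond(90621105, 10067)` gives 9001798, matching the reported rates. The game count is therefore internally consistent with the rest of the output, which suggests the parser really did count about 1.2M games rather than misprinting the total.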

mcostalba commented 8 years ago

Can you please confirm the number of games using another parser, such as scid? Thanks.

sshivaji commented 8 years ago

Scid confirms that there are 2,197,188 games (2.2M).
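For a cross-check that needs no second tool, one rough option is to count the `[Event ` tag lines in the PGN. This is only a sketch: it assumes every game begins with an `[Event ...]` tag at the start of a line and that no comment or move text begins a line with that string. `countGames` is a hypothetical helper and is unrelated to how chess_db or Scid count games.

```cpp
#include <cstddef>
#include <istream>
#include <sstream>
#include <string>

// Counts lines that start with "[Event ", taken here as the start of a game.
std::size_t countGames(std::istream& pgn) {
    std::size_t games = 0;
    std::string line;
    while (std::getline(pgn, line)) {
        if (line.rfind("[Event ", 0) == 0)  // true only when the line begins with the tag
            ++games;
    }
    return games;
}

// Convenience overload for in-memory text, handy for small experiments.
std::size_t countGames(const std::string& text) {
    std::istringstream in(text);
    return countGames(in);
}
```

For a real file, pass a `std::ifstream` opened on the PGN; the stream is read line by line, so memory use stays small even for a file this large.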

sshivaji commented 8 years ago

Interestingly, this problem still exists after that commit. Will post if I have more insight into why only 1.2M games are being scanned.

mcostalba commented 8 years ago

I'd suggest bisecting the big file: cut it in two, check each part again, and repeat until the culprit is found.

See http://stackoverflow.com/questions/2016894/how-to-split-a-large-text-file-into-smaller-files-with-equal-number-of-lines

sshivaji commented 8 years ago

This works now, I just tested! I think this particular issue can be closed.