mcostalba / chess_db

GNU General Public License v3.0

Tolerate bad data #1

Closed sshivaji closed 7 years ago

sshivaji commented 7 years ago

This project looks very promising. As the code is changing quickly, I will post a request instead of a patch. Many PGNs have errors; instead of stopping on the first error, can we have an option to ignore errors and keep going?

Example output on a large database:

Processing...Wrong black move: ') ( 9... d6 10. O-O Nbd7 11. Nh4 Bxg2 12. Nxg2 c5'
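The requested behaviour could look roughly like the sketch below. It is written in Python for illustration and is not chess_db code: `process_games`, the `ignore_errors` flag and `toy_parse` are hypothetical names, and `toy_parse` only checks parenthesis nesting as a stand-in for a real PGN parser.

```python
from typing import Callable, Iterable, List, Tuple


def process_games(
    games: Iterable[str],
    parse_game: Callable[[str], object],
    ignore_errors: bool = False,
) -> Tuple[List[object], List[Tuple[int, str]]]:
    """Parse each game; stop at the first error unless ignore_errors is set."""
    parsed: List[object] = []
    errors: List[Tuple[int, str]] = []
    for index, game in enumerate(games):
        try:
            parsed.append(parse_game(game))
        except ValueError as exc:
            if not ignore_errors:
                raise  # strict mode: fail fast, useful while developing the parser
            errors.append((index, str(exc)))  # tolerant mode: record and continue
    return parsed, errors


def toy_parse(game: str) -> str:
    """Hypothetical stand-in parser: rejects badly nested variations."""
    depth = 0
    for ch in game:
        if ch == "(":
            depth += 1
        elif ch == ")":
            depth -= 1
            if depth < 0:
                raise ValueError(f"Wrong move: {game[:20]!r}")
    if depth != 0:
        raise ValueError(f"Unclosed variation: {game[:20]!r}")
    return game
```

Strict mode stays the default in this sketch, so the fail-fast behaviour is preserved unless the caller opts out; the collected `errors` list keeps the index of each bad game so it can still be inspected afterwards.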

mcostalba commented 7 years ago

@sshivaji thanks for your interest. Yes, we have to handle those cases gracefully before 1.0, but at the moment, during active development, I prefer to see quickly where things go wrong (because sometimes the PGN is good and it is the parser that is bad).

I would indeed be very interested in those positions; it would be great if you could post your difficult PGN files somewhere.

gbtami commented 7 years ago

To test the PyChess parser I used .pgn files from http://www.angelfire.com/games3/smartbridge/

For example, with the chess_db parser I get:

tamas@tami:~/PGN$ ./parser chessdoctor.pgn

Processing...Wrong header: 'Nd7 20. Nc6 Qb6+ 21. Rf2 Nf6 {White plays an exce'

and:

tamas@tami:~/PGN$ ./parser GM_games.pgn

Processing...Wrong header: 'g6 6.b3 Bg7 7.Bb2 0-0 8.Nc3 { If White strikes f'

and with my hand-made test file https://github.com/pychess/pychess/blob/master/testing/gamefiles/annotated.pgn

tamas@tami:~/PGN$ ./parser ../pychess/testing/gamefiles/annotated.pgn

Processing...Wrong header: 'e4 $1 $24 {some comment on e4} e5! Nf3 { [clk 1:0'

mcostalba commented 7 years ago

@gbtami Thanks for the links!

I have no problem with your annotated.pgn. I do have some problems with chessdoctor.pgn because of missing braces { } around comments (this is really invalid!), and some problems with middleg.pgn because of the use of --, as in:

35... -- (35... h3 36. Rc3) 36. b6 Rxa6 37. b7 1-0

Can you please clarify what -- means?

Apart from that, it mostly works, especially now that I have finally pushed a commit that correctly handles the 0-0-0 castling notation (digit zero instead of capital O).
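One way to tolerate the zero-based castling notation is to normalize it before move parsing. The following is a minimal Python sketch of that idea, not the actual chess_db commit; `normalize_castling` is a hypothetical helper name.

```python
import re

# Castling written with digit zeros, optionally followed by check/mate
# markers and annotation glyphs such as "!" or "?".
_CASTLE_WITH_ZEROS = re.compile(r"^(0-0-0|0-0)([+#]?[!?]*)$")


def normalize_castling(token: str) -> str:
    """Rewrite '0-0' / '0-0-0' to 'O-O' / 'O-O-O'; leave other tokens alone."""
    match = _CASTLE_WITH_ZEROS.match(token)
    if match is None:
        return token
    return match.group(1).replace("0", "O") + match.group(2)
```

Because the pattern is anchored and requires at least two zeros, result tokens such as 1-0 and 0-1 are left unchanged.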

gbtami commented 7 years ago

-- is a null move, used to show threats in variations. It's not part of the PGN standard, but it is used in Fritz by ChessBase and understood by several clients such as WinBoard, Scid vs. PC, PyChess, etc.
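As an illustration of how a tolerant reader might treat this token, here is a small Python sketch that walks the main line of a movetext fragment and flags -- as a null move instead of rejecting it. The function name `mainline_plies` is hypothetical, and the sketch assumes the movetext contains no brace comments or NAGs; it only skips move numbers, results and parenthesized variations.

```python
import re
from typing import List, Tuple

NULL_MOVE = "--"                          # non-standard null-move token
RESULTS = {"1-0", "0-1", "1/2-1/2", "*"}  # game termination markers
MOVE_NUMBER = re.compile(r"^\d+\.+$")     # "35." or "35..."


def mainline_plies(movetext: str) -> List[Tuple[str, bool]]:
    """Return (token, is_null_move) for each main-line ply."""
    # Put spaces around parentheses so they become separate tokens.
    tokens = movetext.replace("(", " ( ").replace(")", " ) ").split()
    plies: List[Tuple[str, bool]] = []
    depth = 0
    for token in tokens:
        if token == "(":
            depth += 1
        elif token == ")":
            depth -= 1
        elif depth > 0 or MOVE_NUMBER.match(token) or token in RESULTS:
            continue  # inside a variation, or not a move at all
        else:
            plies.append((token, token == NULL_MOVE))
    return plies
```

Applied to the middleg.pgn fragment quoted above, this yields the null move followed by b6, Rxa6 and b7, so a caller can decide whether to pass the turn or simply drop the ply.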

sshivaji commented 7 years ago

For reference, I have a parser adapted from the polyglot code that works, but I much prefer this repo! The adapted parser silently ignores null moves: https://github.com/sshivaji/polyglot/blob/leveldb/src/pgn.cpp and https://github.com/sshivaji/polyglot/blob/leveldb/src/parse.cpp

mcostalba commented 7 years ago

@sshivaji @gbtami chessdoctor.pgn is finally conquered!