sshivaji / pgnextractor

GNU General Public License v3.0

pgnextractor produces invalid JSON for games if tag has "[" inside #3

Open gbtami opened 7 years ago

gbtami commented 7 years ago

For example, Dr. Hyatt's enormous.pgn contains games like:

[Event "open"]
[Site "Biel  55/652 [Golubev,M]"]
[Date "1992.??.??"]
[Round "?"]
[White "(USA), Gurevich Dmitry"]
[Black "Golubev, Mihail"]
[Result "0-1"]
[ECO "E99/01"]

1. d4 Nf6 2. c4 g6 3. Nc3 Bg7 4. e4 d6 5. Be2 O-O 6. Nf3 e5 7. O-O Nc6 8. 
d5 Ne7 9. Ne1 Nd7 10. Be3 f5 11. f3 f4 12. Bf2 g5 13. Nb5 Nf6 14. Nxa7 Bd7
15. Nb5 Ng6 16. Qc2 h5 17. c5 g4 18. c6 bxc6 19. dxc6 Bc8 20. Bc4+ Kh8 21.
Nd3 g3 22. Be1 d5 23. exd5 Nxd5 24. Bd2 Qh4 25. h3 Bxh3 26. gxh3 Qxh3 27. 
Bc1 Nh4 28. Ne1 Rad8 29. a4 Ne3 30. Bxe3 fxe3 31. Be6 Qxe6 32. Rd1 Rxd1 33.
Qxd1 Qh3 34. Qe2 Rd8 35. f4 Rd2 36. Qxh5+ Kg8 37. Qxh4 Qxh4 0-1

[Event "ch"]
[Site "USA  56/712 [Byrne,R; Mednis,E]"]
[Date "1992.??.??"]
[Round "?"]
[White "(USA), Gurevich Dmitry"]
[Black "Sherzer, A."]
[Result "1-0"]
[ECO "E99/01"]

1. d4 Nf6 2. c4 g6 3. Nc3 Bg7 4. e4 d6 5. Be2 O-O 6. Nf3 e5 7. O-O Nc6 8. 
d5 Ne7 9. Ne1 Nd7 10. Be3 f5 11. f3 f4 12. Bf2 g5 13. a4 Ng6 14. a5 Rf7 15.
c5 Nxc5 16. Bxc5 dxc5 17. Bc4 Kh8 18. a6 Rf6 19. axb7 Bxb7 20. Nd3 Bf8 21.
Ra5 Bc8 22. Nxc5 c6 23. b4 Rb8 24. Qa4 g4 25. fxg4 f3 26. gxf3 Nf4 27. Ne2
Nh3+ 28. Kh1 Bxg4 29. Ng1 Bxc5 30. Rxc5 Qf8 31. Qa1 Nxg1 32. fxg4 Rxf1 33.
Qxe5+ Qf6 34. Qxb8+ Kg7 35. Bxf1 Qxf1 36. Qe5+ Kg8 37. Qg5+ Kh8 38. Rc1 Qf2
39. Rxc6 Qf3+ 40. Kxg1 Qd1+ 41. Kf2 1-0

and pgnextractor produces:

{"Event": "open", "Site": "Biel  55/652 [Golubev,M, "Date": "1992.??.??", "Round": "?", "White": "(USA), Gurevich Dmitry", "Black": "Golubev, Mihail", "Result": "0-1", "ECO": "E99/01", "offset":945791445,"offset_8":7566331560}

{"Event": "ch", "Site": "USA  56/712 [Byrne,R; Mednis,E, "Date": "1992.??.??", "Round": "?", "White": "(USA), Gurevich Dmitry", "Black": "Sherzer, A.", "Result": "1-0", "ECO": "E99/01", "offset":945794598,"offset_8":7566356784}
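
The cut-off `Site` values above are consistent with a tag reader that stops at the first `]` it meets. Below is a minimal Python sketch of a more tolerant reader. It is an illustration, not pgnextractor's code, and it assumes one tag pair per line, as in the games above. The helper name `parse_tag_line` is hypothetical. It takes everything between the first `"` after the tag name and the last `"` before the closing `]`, then undoes the two PGN string escapes, `\"` and `\\`.

```python
import re

# Lenient PGN tag-line pattern (assumes one tag pair per line).
# The greedy (.*) runs to the LAST '"' before the final ']', so a '[' or ']'
# (or even an unescaped '"') inside the value does not end the tag early.
TAG_LINE = re.compile(r'^\[\s*(\w+)\s+"(.*)"\s*\]\s*$')


def parse_tag_line(line):
    """Return (name, value) for a PGN tag line, or None if it is not one."""
    m = TAG_LINE.match(line.strip())
    if m is None:
        return None
    name, raw = m.groups()
    # Undo the two escapes PGN defines inside strings: \" -> " and \\ -> \
    value = re.sub(r'\\(["\\])', r'\1', raw)
    return name, value


if __name__ == "__main__":
    print(parse_tag_line('[Site "Biel  55/652 [Golubev,M]"]'))
```

Running it prints `('Site', 'Biel  55/652 [Golubev,M]')`. Anchoring on the line's final `"]` rather than the first `]` is what keeps `[Golubev,M]` intact. The trade-off is that it relies on each tag pair sitting on its own line.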

gbtami commented 7 years ago

A similar problem (an invalid JSON line) arises if a tag value contains a " character. I'm trying to import the .pgn from https://chessgamesrepository.wordpress.com/2017/02/20/cgr-20170219-full/
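
Separately from how tags are read, each output line only stays valid JSON if every value is escaped on the way out. Below is a minimal Python sketch, assuming the tags are already decoded into a dict. `game_to_json_line` is a hypothetical helper, and the `offset`/`offset_8` field names and values are simply copied from the output in the first comment. Here `json.dumps` escapes any `"` or `\` in a value instead of the raw text being pasted between quotes.

```python
import json


def game_to_json_line(tags, offset, offset_8):
    """Serialise one game's tag pairs plus its offsets as a single JSON line.

    json.dumps escapes '"' and '\\' (and control characters) in every value,
    so the line stays valid whatever the tag text contains. The offset field
    names mirror pgnextractor's output; nothing is assumed about their meaning.
    """
    record = dict(tags)
    record["offset"] = offset
    record["offset_8"] = offset_8
    return json.dumps(record)


if __name__ == "__main__":
    tags = {"Event": 'The "Big" Open', "Site": "Biel  55/652 [Golubev,M]"}
    line = game_to_json_line(tags, 945791445, 7566331560)
    print(line)
    # The line parses back to exactly the same values.
    assert json.loads(line)["Event"] == 'The "Big" Open'
```

In the printed line the `Event` value comes out as `"The \"Big\" Open"`, and `json.loads` reads it back unchanged.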

sshivaji commented 7 years ago

It should work now. Let me know if you still have issues.

gbtami commented 7 years ago
  1. No invalid JSON is produced now, thx!
  2. Some games from enormous.pgn are still missing from the JSON output. The chess_db parser counts 1496329 games, and the pychess PGN import using its own Python parser agrees (it just drops 2 games with an invalid Result tag), while the pychess PGN import using pgnextractor produces only 1496070 imported games. I suspect the regexp-based tag header parser in pychess is more relaxed than the one in pgnextractor. Maybe later I will try to find out which games cause the difference.
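
To narrow down the missing games, one option is to list tag lines that a strict PGN string reader rejects but a relaxed, regexp-style reader accepts. Below is a rough Python sketch. Both patterns are illustrative assumptions, not the ones pychess or pgnextractor actually use. Reading the file as Latin-1 is also an assumption (it decodes any byte sequence without errors), and `odd_tag_lines` is a hypothetical helper name.

```python
import re
import sys

# Strict: the value is a PGN string whose only escapes are \" and \\.
STRICT = re.compile(r'^\[\s*\w+\s+"(?:[^"\\]|\\["\\])*"\s*\]\s*$')
# Relaxed: anything between the first and the last quote on the line.
RELAXED = re.compile(r'^\[\s*\w+\s+".*"\s*\]\s*$')


def odd_tag_lines(lines):
    """Yield (line_number, kind, text) for tag-like lines a strict reader rejects."""
    for number, line in enumerate(lines, start=1):
        text = line.strip()
        if not text.startswith("[") or STRICT.match(text):
            continue
        kind = "relaxed-only" if RELAXED.match(text) else "unparsed"
        yield number, kind, text


if __name__ == "__main__" and len(sys.argv) > 1:
    # Usage (hypothetical script name): python odd_tags.py enormous.pgn
    # Latin-1 is an assumption; it never raises a decoding error.
    with open(sys.argv[1], encoding="latin-1") as pgn:
        for number, kind, text in odd_tag_lines(pgn):
            print(f"{number}\t{kind}\t{text}")
```

Lines marked `relaxed-only` (for example, an unescaped `"` inside a value) are candidates for games that one parser keeps and the other drops. Lines marked `unparsed` match neither pattern.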

enormous.pgn download mirrors: http://www.filewatcher.com/m/enormous.pgn.gz.282114686-0.html

sshivaji commented 7 years ago

Thx, will try out enormous.pgn and debug too when I get a chance.