niklasf / python-chess

A chess library for Python, with move generation and validation, PGN parsing and writing, Polyglot opening book reading, Gaviota tablebase probing, Syzygy tablebase probing, and UCI/XBoard engine communication
https://python-chess.readthedocs.io/en/latest/
GNU General Public License v3.0

Weird value error #701

Closed PhilippBongartz closed 3 years ago

PhilippBongartz commented 3 years ago

First let me express my appreciation for a great python module.

I use it to create training data for a deep learning project. I parse games from a database of 7 million tournament games. Some of these games raise a ValueError, mostly "illegal san". This is not too surprising, and I thought I would have to filter the games a little.

But when I tried to take a look at the first game that raises this error, I realised that it only fails when I parse all the games. Parsing only the problematic game does not result in an error.

So it seems to be a software issue, maybe multithreading gone wrong; I don't know. I would appreciate it if you could give me a pointer on how to avoid these errors.

niklasf commented 3 years ago

Thanks :)

chess.pgn does not use multi-threading internally, so unless you provide the problematic PGN and the code you use to parse it, I have nothing to go on.
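A minimal sketch for isolating the failing game, assuming python-chess 1.x: there, read_game() does not raise on an illegal move but logs "error during pgn parsing" and appends the exception to Game.errors. The hypothetical helper iter_games records each game's 0-based index and the handle.tell() position just before it, so a suspicious game can be re-read on its own with handle.seek(offset). io.StringIO stands in for the real database file.

```python
import io
import logging

import chess.pgn

# Errors are collected in game.errors anyway, so hide the
# "error during pgn parsing" tracebacks chess.pgn logs for each bad move.
logging.getLogger("chess.pgn").setLevel(logging.CRITICAL)


def iter_games(handle):
    """Yield (index, offset, game) for every game in a PGN text stream.

    Hypothetical helper: index is the game's 0-based position in the stream,
    offset is handle.tell() just before it, usable later with handle.seek().
    """
    index = 0
    while True:
        offset = handle.tell()
        game = chess.pgn.read_game(handle)
        if game is None:  # end of input
            return
        yield index, offset, game
        index += 1


pgn = io.StringIO(
    '[Event "ok"]\n\n1. e4 e5 2. Nf3 Nc6 *\n\n'
    '[Event "bad"]\n\n1. e4 e5 2. Ke3 *\n\n'  # Ke3 is illegal here
)
for index, offset, game in iter_games(pgn):
    if game.errors:
        print(index, game.headers["Event"], game.errors[0])
```

On the real file, the same loop works with `open("OTB.pgn", encoding="latin-1")`; text-mode tell() values are opaque but valid for seek(). Games whose `errors` list is empty can go straight into the training set.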

PhilippBongartz commented 3 years ago
import chess
import chess.pgn

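with open('OTB.pgn', encoding='latin-1') as database:
    # read_headers() returns one game's headers and skips that game's moves
    header = chess.pgn.read_headers(database)
    for t in range(719):

        if 1:
            # read_game() parses the next game in the file
            current_game = chess.pgn.read_game(database)
            current_game_moves = list(current_game.mainline_moves())

        # reads the headers of yet another game and skips its moves
        header = chess.pgn.read_headers(database)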

When I run this, there is no error. When I run it up to game 720, I get:

error during pgn parsing
Traceback (most recent call last):
  File "/usr/local/lib/python3.7/site-packages/chess/pgn.py", line 1638, in read_game
    move = visitor.parse_san(board_stack[-1], token)
  File "/usr/local/lib/python3.7/site-packages/chess/pgn.py", line 1021, in parse_san
    return board.parse_san(san)
  File "/usr/local/lib/python3.7/site-packages/chess/__init__.py", line 2987, in parse_san
    raise ValueError(f"illegal san: {san!r} in {self.fen()}")
ValueError: illegal san: 'dxe5' in rnbq1rk1/ppp1ppbp/3p1np1/8/2PPP3/2N2N2/PP2BPPP/R1BQK2R b KQ - 3 6
error during pgn parsing
Traceback (most recent call last):
  File "/usr/local/lib/python3.7/site-packages/chess/pgn.py", line 1638, in read_game
    move = visitor.parse_san(board_stack[-1], token)
  File "/usr/local/lib/python3.7/site-packages/chess/pgn.py", line 1021, in parse_san
    return board.parse_san(san)
  File "/usr/local/lib/python3.7/site-packages/chess/__init__.py", line 2987, in parse_san
    raise ValueError(f"illegal san: {san!r} in {self.fen()}")
ValueError: illegal san: 'Be3' in rnbq1rk1/ppp1ppbp/3p1np1/8/2PPP3/2N2N2/PP2BPPP/R1BQK2R b KQ - 3 6
error during pgn parsing
Traceback (most recent call last):
  File "/usr/local/lib/python3.7/site-packages/chess/pgn.py", line 1638, in read_game
    move = visitor.parse_san(board_stack[-1], token)
  File "/usr/local/lib/python3.7/site-packages/chess/pgn.py", line 1021, in parse_san
    return board.parse_san(san)
  File "/usr/local/lib/python3.7/site-packages/chess/__init__.py", line 2987, in parse_san
    raise ValueError(f"illegal san: {san!r} in {self.fen()}")
ValueError: illegal san: 'Bh4' in rnbq1rk1/ppp1ppbp/3p1np1/8/2PPP3/2N2N2/PP2BPPP/R1BQ1RK1 w - - 6 8

If I replace `if 1:` with `if t==719:`, I get no error.

I printed the moves and played through the game, and it seems normal:

['d2d4', 'g8f6', 'g1f3', 'e7e6', 'c2c4', 'c7c5', 'g2g3', 'c5d4', 'f3d4', 'd8c7', 'b1d2', 'b8c6', 'd4b5', 'c7b8', 'f1g2', 'a7a6', 'b5c3', 'b7b5', 'e1g1', 'f8e7', 'a2a3', 'e8g8', 'b2b4', 'c8b7', 'c1b2', 'b5c4', 'd2c4', 'd7d5', 'c4b6', 'a8a7', 'c3a4', 'f8d8', 'd1b3', 'c6e5', 'f1d1', 'b7c6', 'a1c1', 'c6b5', 'b6c8', 'd8c8', 'b2e5', 'b8b7', 'e5d4', 'a7a8', 'g2f3', 'f6e4', 'a4b6', 'c8c1', 'd1c1', 'a8d8', 'a3a4', 'b5e8', 'f3e4', 'd5e4', 'd4c5', 'e7f6', 'c1d1', 'h7h5', 'h2h4', 'b7c6', 'b3b1', 'a6a5', 'd1c1', 'a5b4', 'b1b4', 'f6e7', 'a4a5', 'e4e3', 'f2f3', 'e7c5', 'c1c5', 'c6d6', 'g1g2', 'd6d1']
Headers(Event='Spartakiada URS, Moscow RUS', Site='Spartakiada URS, Moscow RUS', Date='1979.??.??', Round='5', White='(LTU), Piesina Gintautas', Black='Ivanov, Igor V', Result='1-0', BlackElo='2415', ECO='A15', EventDate='1979.??.??', PlyCount='68', Source='ChessliB', SourceDate='2003.01.21')
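The two loop variants above parse different games, which explains why the error comes and goes. In the `if 1:` loop every iteration consumes two games (one via read_game(), one via read_headers()), so with 0-based indices iteration t parses game 2t + 1; in the `if t==719` loop only one game per iteration is consumed until the last one, which parses game 720. For range(720), the last read_game() call of the first variant therefore lands on game 2 * 719 + 1 = 1439, consistent with the game number reported later in the thread. The sketch below replays both loops on a small in-memory PGN whose Event tag holds each game's index; the helper parsed_indices is hypothetical.

```python
import io

import chess.pgn


def parsed_indices(n_games, n_iter, should_parse):
    """Replay the loop from the issue; return the indices read_game() parsed."""
    # Each game's Event tag holds its 0-based position in the "file".
    pgn = "".join(f'[Event "{i}"]\n\n1. e4 e5 *\n\n' for i in range(n_games))
    handle = io.StringIO(pgn)
    seen = []
    chess.pgn.read_headers(handle)  # consumes game 0
    for t in range(n_iter):
        if should_parse(t):
            game = chess.pgn.read_game(handle)  # consumes and parses one game
            seen.append(int(game.headers["Event"]))
        chess.pgn.read_headers(handle)  # consumes (skips) one more game
    return seen


print(parsed_indices(7, 3, lambda t: True))    # [1, 3, 5]  like "if 1:"
print(parsed_indices(7, 3, lambda t: t == 2))  # [3]        like "if t == 2:"
```

Nothing in this behaviour involves threads; the stream position simply advances with every read_headers() and read_game() call.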
PhilippBongartz commented 3 years ago

Now that I think about it, though, the header and the game don't seem to fit together. If I read the headers and then the game, do I get the game that belongs to those headers, or already the next one?
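It is already the next one: read_headers() reads one game's tag pairs and then skips that game's movetext, so a following read_game() returns a different game. A minimal sketch showing this, plus a version of the loop that keeps headers and moves of one game together by taking the headers from the parsed Game itself. It uses only read_headers(), read_game(), Game.headers and mainline_moves(), which already appear in this thread; the helper name games_with_moves is hypothetical.

```python
import io

import chess.pgn

demo = io.StringIO('[Event "first"]\n\n1. e4 *\n\n[Event "second"]\n\n1. d4 *\n\n')
skipped = chess.pgn.read_headers(demo)  # headers of "first"; its moves are skipped
game = chess.pgn.read_game(demo)        # this is already the "second" game
print(skipped["Event"], game.headers["Event"])  # first second


def games_with_moves(handle):
    """Yield (headers, uci_moves) pairs taken from the same parsed game."""
    while True:
        game = chess.pgn.read_game(handle)
        if game is None:  # end of input
            return
        yield game.headers, [move.uci() for move in game.mainline_moves()]
```

Reading each game exactly once also means that any parse error shows up next to the headers of the game that caused it.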

PhilippBongartz commented 3 years ago

Ah, ok, I was confused about reading the header and reading in the game. That's why I couldn't locate the correct game. Ok, let me investigate a bit on my own and if I still have questions I'll come back to you.

PhilippBongartz commented 3 years ago

Ok, now the troublemaker is game number 1439. Frankly, I would be confused too, with all those brackets. I guess I'll kick out all games with an annotator and see what happens. Thanks for your quick reply.

[Event "?"]
[Site "?"]
[Date "????.??.??"]
[Round "?"]
[White "1. Alternatives to 7 0-0"]
[Black "?"]
[Result "*"]
[Annotator "Joe Gallagher"]
[ECO "E92a"]
[PlyCount "13"]
[Source "Everyman Chess"]
[SourceDate "2006.10.01"]

1. d4 Nf6 2. c4 g6 3. Nc3 Bg7 4. e4 d6 5. Nf3 O-O 6. Be2 { In the Classical
Variation White just concentrates on developing his pieces to sensible
squares. 6 Be2 makes more sense than 6 Bd3 as the latter interferes with
the protection of the weakest point in the white camp, the d4-square.
NOTE: The d4-square is slightly weak because having played e4 and c4 this
square can no longer be protected by pawns. In many variations of the
King's Indian Black will form a plan whose aim is to occupy or control
this square. Although the d4-square is weak, or potentially weak, that
does not mean that White has made a mistake playing the moves e4 and c4 as
they have given him a powerful centre and a space advantage. Chess is all
about give and take. This is by far the most important variation in the
King's Indian. That is why it takes up four of the ten chapters in this
book. } 6... e5 $1 { ( After 6...e5 White usually plays 7. O-O and this is the subject of
    the next three chapters. This chapter deals with the alternatives: } ( { This is an important move. Black stakes his claim in the centre. At
    first sight it may appear that White can just win a pawn but you can
    read all about this in the Exchange Variation below. The alternatives
    deserve a quick mention. 1) } 6... Nbd7 { is played sometimes by those
    who fear the exchange of queens. It is however a lot less flexible and
    after } 7. O-O e5 { we have transposed to Chapter 2. } ) ( { 2) } 6... Na6 { is playable, again transposing to Chapter 2 after } 7. O-O e5 { . } ) ( { 3) } 6... c5 { is an alternative strike in the centre but Black
    will have to be willing to play a Benoni after 7 d5 or a Sicilian
    after } 7. O-O cxd4 8. Nxd4 { . Very few players play the King's Indian
    in order to play 6...c5. } ) ( { 4) } 6... Bg4 { is a solid line which aims to exchange the bishop
    for the knight on f3. The idea of this is to slacken White's control
    of d4. Again it is not really in the style of the King's Indian. A
    real King's Indian player will preciously guard his light-squared
    bishop until he can sacrifice it on h3. } ) ( 6... d5 ) ( 6... -- 7. O-O -- ) *
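The plan to drop annotated games can be carried out without parsing the skipped games' movetext at all. A sketch using the tell()/seek() pattern that the python-chess documentation suggests for read_headers(): scan each game's headers cheaply, and only seek back and fully parse games that have no Annotator tag. The helper name unannotated_games is hypothetical, and the assumption that all problematic games carry an Annotator tag is the thread's guess, not something this filter verifies.

```python
import io

import chess.pgn


def unannotated_games(handle):
    """Yield parsed games whose headers contain no Annotator tag."""
    while True:
        offset = handle.tell()  # start of the next game
        headers = chess.pgn.read_headers(handle)  # skips the movetext
        if headers is None:  # end of input
            return
        if "Annotator" in headers:
            continue  # annotated (book-style) game: never parsed
        resume = handle.tell()  # where the following game starts
        handle.seek(offset)
        game = chess.pgn.read_game(handle)
        handle.seek(resume)  # continue scanning after this game
        yield game


pgn = io.StringIO(
    '[Event "plain"]\n\n1. e4 e5 *\n\n'
    '[Event "book"]\n[Annotator "Someone"]\n\n1. d4 d5 *\n\n'
    '[Event "plain2"]\n\n1. c4 e5 *\n\n'
)
print([g.headers["Event"] for g in unannotated_games(pgn)])  # ['plain', 'plain2']
```

Kept games are read twice (headers, then full parse), which is the price for never touching the annotated ones; checking `game.errors` on the kept games would still catch any remaining parse failures.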