rpdelaney-archive / python-chess-annotator

Reads chess games in PGN format and adds annotations using an engine
GNU General Public License v3.0
62 stars 29 forks source link

Input file processing should continue if an individual game cannot be parsed #9

Open rpdelaney opened 6 years ago

rpdelaney commented 6 years ago

Background python-chess is generally rather lax about enforcing consistency in the objects it provides. Illegal inputs are typically accepted without raising an exception, even if the call corrupts the working board, GameNode, or Game. (My understanding is that this is for a combination of reasons, including performance and general flexibility.)

Its PGN parsing behavior follows this philosophy: if python-chess fails to parse a PGN game, it will still attempt to create and return a Game, even if that Game includes illegal headers, illegal moves, or other garbage. However, it will also attempt to store a list of parsing errors, which can be retrieved in Game.errors.

python-chess-annotator loops through the games in a PGN file. Because python-chess does not raise exceptions on PGN parsing errors, and to avoid burning up CPU time on corrupted games (which often leads to an engine crash or other nasty stuff), it performs a check for parsing errors using checkgame(). This function checks for Game.errors and raises a RuntimeError if any are found. Additionally, it checks that Game.end().parent is not None (a common corruption state even when Game.errors is None), and raises another RuntimeError if that check fails.

Note that I don't consider this an exhaustive list of methods for verifying that a Game is clean, as I don't fully understand all the ways that a Game can be dirtied up. These are just the two most common that I've found and they seem to cover the great majority of cases.

Problem The intended behavior of python-chess-annotator is to analyze all the games in a given PGN file, returning annotated games one by one. However, if any one of the games in the file cannot be parsed, python-chess-annotator will halt with a RuntimeError when it is encountered. That may frustrate a user who sets up analysis of a large PGN file only to return later and find that the job was partially completed.

Preferably, python-chess-annotator should skip over the unparseable game and continue to the next one. Ideally, it would print some debugging info via logger.critical so that the user is notified that there is a problem with their PGN file.

Solution I think the way to achieve this is to define a custom PgnParseError exception, and refactor checkgame() to rase that instead of a RuntimeError. Then the the try/except block in the main() function would be refactored to catch the PgnParseError, print the debugging info, and continue the loop.

ddugovic commented 6 years ago

I am interested in improving parsing of invalid or ambiguous moves (I hope one day to use OCR to recognize moves on a scoresheet) but I lack examples.

rpdelaney commented 6 years ago

That would be a good thing to work on, but any such improvements should be made to python-chess.

rpdelaney commented 6 years ago

Also, the same idea should apply to variant games that aren't supported by the engine. We should print a log warning and skip over those games rather than halting with an exception.