Why does zobrist_hash ignore legality of en passant captures?

niklasf / python-chess

A chess library for Python, with move generation and validation, PGN parsing and writing, Polyglot opening book reading, Gaviota tablebase probing, Syzygy tablebase probing, and UCI/XBoard engine communication

https://python-chess.readthedocs.io/en/latest/

GNU General Public License v3.0

2.42k stars 529 forks source link

Why does zobrist_hash ignore legality of en passant captures? #144

Closed dubiousjim closed 7 years ago

dubiousjim commented 7 years ago

I got stung recently by the fact that the following two variations:

1.Nf3 Nf6 2.c4 Nc6 3.Qb3 e5 4.Qe3 e4 5.d4
1.c4 e5 2.d4 Nc6 3.Qd3 Nf6 4.Nf3 e4 5.Qe3

though they have identical epds, have different zobrist_hashes. The reason is that for the first variation, the computation of the hash uses the position's ep_square, even though the en passant capture is illegal. (Black's e pawn is pinned.) I want these two variations to count as transpositions of each other, so I needed to redefine the zobrist_hash function to take account of the en passant square only when the e.p. capture is legal.

What's the justification for the existing behavior? Just a small performance boost?

Mk-Chan commented 7 years ago

The EPDs shouldn't be identical as the first line not only will have the ep square recorded but will also have a lower halfmove count.

While the halfmove count is not relevant to the zobrist hash, the existence of the ep square is, even if the ep is not immediately possible.

dubiousjim commented 7 years ago

The halfmove count is included in the FEN but not in the EPD.

I realize that the ep square is taken account of in the current implementation of the hash; my question is why, when the ep capture is illegal? (And if the ep is not "immediately" possible, it's not possible at all, right? The opportunity to make an en passant capture doesn't persist.) Do other polyglot opening book routines also implement the hash in this way?

Offhand, it seems like these variations should count as transpositions (I don't claim they're very good variations, they're just pulled from a list of cases where I got stung by this bug). But with the current hash implementation, they end up getting different hashes.

Mk-Chan commented 7 years ago

No they are different precisely because the existence of the enpassant square discounts the position from a threefold repetition draw

dubiousjim commented 7 years ago

Thanks for your quick replies. What is the precedent for this interpretation of the rules? I hadn't encountered it before. Checking the current FIDE handbook doesn't seem to support that intepretation:

9.2.1 | The game is drawn, upon a correct claim by a player having the move, when the same position for at least the third time (not necessarily by a repetition of moves): 9.2.1.1 | is about to appear, if he first writes his move, which cannot be changed, on his scoresheet and declares to the arbiter his intention to make this move, or 9.2.1.2 | has just appeared, and the player claiming the draw has the move. 9.2.2 | Positions are considered the same if and only if the same player has the move, pieces of the same kind and colour occupy the same squares and the possible moves of all the pieces of both players are the same. Thus positions are not the same if: 9.2.2.1 | at the start of the sequence a pawn could have been captured en passant 9.2.2.2 | a king had castling rights with a rook that has not been moved, but forfeited these after moving. The castling rights are lost only after the king or rook is moved.

Note it says "could have been captured en passant." I assume that when such a capture would not have been legal, it's not correct that the pawn "could have been captured en passant."

I've checked the source of the polyglot program at http://hgm.nubati.net/cgi-bin/gitweb.cgi?p=polyglot.git;a=summary, and offhand, that program, like python-chess, seems to ignore the legality of the capture when calculating the zobrist hash.

Mk-Chan commented 7 years ago

I think it makes sense as 'being enpassant-ed' is a property of the pawn that just moved rather than the pawn that could en passant because as it says 'a pawn could have been captured en passant' instead of 'a pawn could have captured another en passant'. I think it's for this reason the ep square is recorded even if there are no pseudolegal enpassants possible(no pawn being around)

dubiousjim commented 7 years ago

Could it even ever make a difference to the availability of a three-move repetition draw? I'm not sure that there could be a repetition in the same game where one of the positions had even an illegal en passant square.

I was checking to see whether the boards at the end of these positions were counted as equal or not, and if not why. Bizarrely:

fen1='r1bqkb1r/pppp1ppp/2n2n2/8/2PPp3/4QN2/PP2PPPP/RNB1KB1R b KQkq - 0 5'
fen2='r1bqkb1r/pppp1ppp/2n2n2/8/2PPp3/4QN2/PP2PPPP/RNB1KB1R b KQkq - 1 5'
b1, b2 = Board(fen1), Board(fen2)
(b1==b2, BaseBoard.__eq__(b1, b2), b1!=b2, BaseBoard.__ne__(b1, b2))

returns: (False, False, True, False).

It's bizarre that BaseBoard.__eq__ and BaseBoard.__ne__ are both returning False, especially since if you look at their implementations, __eq__ just returns the negation of __ne__ (unless __ne__ returned NotImplemented). Also I don't understand why != is giving a different result than BaseBoard.__ne__.

This is from python-chess around version 0.18.4.

dubiousjim commented 7 years ago

Here is BaseBoard.__eq__:

    def __eq__(self, board):
        ne = self.__ne__(board)
        return NotImplemented if ne is NotImplemented else not ne

dubiousjim commented 7 years ago

Never mind the stuff about __ne__. Board is overriding BaseBoard's implementation of __ne__. I did check to see if it was doing so, but somehow missed it.

dubiousjim commented 7 years ago

The book_format.html file included with the Polyglot program linked to above says:

<h3>enpassent</h3> If the opponent has performed a double pawn push
and there is now a pawn next to it belonging to the player to move then
"enpassant" is the entry from RandomEnPassant whose offset is the file
of the pushed pawn (counted from 0(=a) to 7(=h)). If this does not
apply then enpassant=0.
<p>
Note that this is <b>different from the FEN standard</b>. In the FEN standard
the presence of an "en passant target square" after a double pawn push
is unconditional. 
<p> Also note that it is irrelevant if the potential en passant
capturing move is legal or not (examples where it would not be legal are
when the capturing pawn is pinned or when the double pawn push was a
discovered check).

But this "it is irrelevant" doesn't seem to be describing anything other than that program's actual implementation. I guess there is some value to having python-chess use the same implementation. But it does still seem to me to be the wrong design choice (for both programs).

niklasf commented 7 years ago

[x] (1) Yes, this is the exact reason it's implemented like that. Otherwise the Zobrist hash could not be used to read polyglot opening books. It's commonly done this way for hashtables in engines, but I agree that it's a poor design decision for opening books, where the extra little bit of performance is not important. TODO: Provide both kinds of zobrist hashes.
[x] (2) The rules on three-fold repetition seem to be clear: Ep squares should be considered if and only if there's actually a legal ep capture. TODO: Fix python-chess.
[x] (3) I am not sure about Board.__eq__ and __ne__, i.e. should they compare the position or also internals like (potentially invalid) castling flags or irrelevant ep squares? Or even the entire history of moves? Should comparisons depend on legal move generation?

niklasf commented 7 years ago

Regarding (1) I decided to just make the zobrist hash easier to extend for now:

class MyZobristHasher(chess.polyglot.ZobristHasher):
    def hash_ep_square(self, board):
        if board.has_legal_en_passant():
            return super(MyZobristHasher, self).hash_ep_square(board)
        else:
            return 0

zobrist_hash = MyZobristHasher(chess.polyglot.POLYGLOT_RANDOM_ARRAY)

print(zobrist_hash(chess.Board())

niklasf commented 7 years ago

All done. (3) now compares board objects as chess positions (basically as if comparing FENs, disregarding the history, but inlcuding the move counters).