This produces puzzles that are trivially solvable -- and do not require backtracking

tlh24 commented 2 months ago

https://github.com/tlh24/sudoku-rl/blob/9c266e0c45295941cf56b23b9f402430f9ba9bda/sudoku_gen.py#L312

Refactored the code a bit to make it more legible.

Since it only removes digits that can be directly deduceable from the current board state means that, with the current graph transformer model, no backtracking is needed & the current model should (to be demonstrated!) be able to solve all boards presently. In other words, it only generates 'easy' sudoku boards that do not require any pattern matching or DP. This, of course, is a severe limitation of the original paper, the results of which were generated from these easy puzzles.

In comparison, the original random removal code sometimes generates puzzles that have more than one solution -- we should sieve these out! -- but at the same time, some of the puzzles very much require backtracking / pattern matching / DP to solve.

Check my logic though.

jsjung00 commented 2 months ago

Sorry, this code took me a really long time just to understand :|

But to come back to the question, this is the standing Claim: Under this puzzle generating scheme, for each empty cell in a given puzzle, there is only one option from that cell which is directly deducible from the other cells—the idea being that there is only one value for that empty cell that will satisfy the row, col, and block constraints.

I don't think this claim is necessarily true. The key point is that the code prevents deletion of a cell (i,j) under condition (*) where there exists a cell in the same row such that it can take on the val[i,j] and a cell in the same col where it can take on the val[i,j] and a cell in the same block that can take on the val[i,j]. Seen in the line if row and col and square:

However, it is possible to delete a cell (i,j) where in the same row for example there exists another cell (i', j') that can take on the val[i,j]. There is some handwaving here (no formal proof given) but with enough deletions by continously plucking, we can suppose that another cell (i', j') can take on the val[i,j]. (Intuition: if this wasn't true, then the if row and col and square: continue statement would never fire) We can WLOG also assume that cell (i',j') doesn't have the condition (*) above.

Then, if cell (i,j) is plucked and then cell (i',j') is plucked, our board now has two cells that are on the same row that are empty, where the cell (i',j') has at least two possible values: the initial value val[i',j'] and the val[i,j]. This shows that it is possible to have under the puzzle generating function a board that has a cell with multiple possible valid options.

tlh24 commented 2 months ago

OK, after reviewing the code more & manually solving some of the generated puzzles, I think my claim is only 1/3 right: as you deduce, the code does create boxes where there is more than one immediate option, by sequential 'plucking'.

However, I think by including the if row and col and square clause, I think the original claim that it precludes hard sudokus does hold -- e.g. sudokus where you must do second-order coloring problems. The reason for this is that if the agent only places clues where there is only one option, then it can solve the puzzle by (roughly) reversing the plucking operation.

At least, in the handful of generated puzzles that I manually solved, this turned out to be true -- and this with only 25 givens, for which the original generator seems to always require higher level strategies (and usually has more than one solution).

I guess we should demonstrate solution on these easier problems, and think about how to eliminate puzzles with more than one solution thereafter..

tlh24 / sudoku-rl

This produces puzzles that are trivially solvable -- and do not require backtracking #4