Benchmarking, Pencilmark Sudoku

Hi, I really enjoyed your blog series on SAT solvers. While Sudoku is an example that's approachable and easy to explain, you've seen that minisat is soundly defeated by specialized Sudoku solvers, and this somewhat weakens the force of your argument. Here are two suggestions that may strengthen it:

(1) Your minisat-based solver has high startup costs since it has to initialize all the exactly-one constraints. Since your benchmark runs each puzzle in a separate process it pays this cost repeatedly. Any context where you would care about solver speed would likely involve solving many puzzles, so it seems fair to exclude or amortize this cost. If you do this you will likely reduce your average solving time on the top95 puzzles to below 0.5 ms, which would put the SAT solver clearly ahead of the ones you compare to in post 1.5. You can also get some speedup from augmenting your CNF encoding. See a few different encodings here with benchmarks in the same project: https://github.com/t-dillon/tdoku/blob/master/src/solver_minisat.cc

(2) Conventional Sudoku are small problems, and in most cases the minisat-based solver makes so few decisions that it doesn't benefit much from CDCL. But if you consider "pencilmark" Sudoku, the story is totally different. Pencilmark Sudoku obey all the same rules as normal Sudoku, but the clues are given as negative literals instead of positive ones. i.e., the starting board has some set of candidate eliminations, but it may not have any cells actually filled in. This leaves open a much larger search space. While minisat handles this pretty well, some of fastest solvers for conventional Sudoku lack the sufficient heuristics and inference procedures to stay out of trouble in this larger space, and in some cases they can be pathologically slow. For example, I've seen tdoku take minutes to solve pencilmark puzzles that minisat handles in milliseconds. I'm not aware of common benchmarks for pencilmark puzzles at this point, but this sort of extreme behavior is easy to find even with randomly generated puzzles.

So, consider framing your comparison based on pencilmark rather than conventional Sudoku. It's easy to explain, it's an easy change to make in your solver, solvers like tdoku and fsss2 already support it, and it will show the SAT solver in a favorable light (it will probably still have a higher median solving time, but it will have a much lower mean).

Hi, glad you enjoyed the posts. However, I don't think I ever argued that SAT-based solvers are the end all, be all, but rather that they are easy to write, and provide good base performance. I was fairly explicit about this in my talk, and if the blog posts give across different impression, I might need to make some changes.

Anyway, decomposing the runtime into "startup" and "actually solving the problem" with assumptions could make for an interesting additional data point. The combined time in current results was intentional, as I wanted to avoid having to delve into whether a specific solver can also amortize some of its runtime across multiple inputs or not, but having the results separately sidesteps this issue.

As for pencilmark Sudoku, this is the first time I hear about it, but it also sounds like something interesting to explore when I have more free time... but honestly, I originally started writing the posts for content that I still haven't gotten to, so when I find some time to write again I want to focus on part 4 first.

horenmar / sudoku-example

Benchmarking, Pencilmark Sudoku #1