Closed aljungberg closed 3 years ago
Ah, good point! So I added the tests back in, unfortunately, they have not been maintained and use outdated options. You can see this in the first (comment) line of the .anf
files under tests/
. But the ideas of the tests are actually quite good, e.g. the one that tests Gauss-Jordan elimination is cool, and clearly shows good work being done by Davin Choo. It's on me that they have not been updated/maintained. I did, however, add a fuzzer. In case you'd be interested in fixing up the tests, that'd be really cool! I'd love to merge some PRs. In the meanwhile, I can keep this issue open to remind me that the tests are, well, not missing anymore, but certainly outdated and not working.
I hope I helped,
Mate
Thanks for adding the tests back! Tests fixed up in #29.
These tests are great, very easy to read and work with. I'd propose having a lot more of them given the complexity of this tool. For instance, every time you find a new bug using the fuzzer or whatnot it might be worth it to create a formal test for that particular edge case to avoid future regressions.
Agreed, the Gauss-Jordan elimination is I think in some ways the most compelling argument for working in ANF format to begin with (as you rightfully point out in your paper). Even GJE enabled SAT solvers like your CryptoMiniSat still have to deal with the fact that long XOR chains really explode in CNF.
These tests are great, very easy to read and work with. I'd propose having a lot more of them given the complexity of this tool. For instance, every time you find a new bug using the fuzzer or whatnot it might be worth it to create a formal test for that particular edge case to avoid future regressions.
--> Amen to that! If you check out my other work, CryptoMiniSat, you will see I have 100+ tests and multi-thousand lines of Python+fuzz-generator+post-analysis for fuzzing. And I did put a lot (but not enough, see SAT->UNSAT bug spotted https://github.com/meelgroup/bosphorus/issues/27 by you) of effort into fuzzing lately.
Ah, you read the paper! I feel honoured. I think the XL is kinda interesting as an idea, too. I wanted this tool to become like a melting pot of different techniques. I want to add Groebner Basis method(s) one day...
As promised, I added the required fuzzer improvement as you mentioned above to check for SAT->UNSAT. Within about 20 seconds it found the bug. One fuzz seed that works is 11228 (full run below).
Thanks again for this and thanks for pushing me to do add this improvement!
Mate
./fuzz.sh ./cryptominisat5 11228
[...]
+ echo 'Doing random seed 11228'
Doing random seed 11228
+ ./anf_gen.py 11228
+ ./bosphorus --simplify 0 --anfread out_fuzz/problem.anf --cnfwrite out_fuzz/problem_simple.cnf
+ ./cryptominisat5 --maxconfl 40000 --zero-exit-status out_fuzz/problem_simple.cnf --verb 0
real 0m0.015s
user 0m0.006s
sys 0m0.009s
+ grep -q '^s SATISFIABLE$' out_fuzz/simple_sol.out
+ echo 'Problem is SAT as per simple solving'
Problem is SAT as per simple solving
+ SATISFIABLE=1
+ ./bosphorus --anfread out_fuzz/problem.anf --cnfwrite out_fuzz/problem.cnf --solmap out_fuzz/map --solve
+ ./check_solution.py out_fuzz/bosph_out out_fuzz/problem.anf 1
known sat: 1
known sat: <class 'int'>
Solution: []
ERROR: Simple solver found a solution, but simplified version says UNSAT!!
There might be something not immediately apparent but I couldn't find any lit tests in the repository.
Trying to run the test suite as per the README,
What am I overlooking?