Add functionnalities to the Python interface

ghost commented 6 years ago

As said in #454 we've been working on the python interface for the past two months as part of an assignment. The goal was to add DRAT capabilities to the interface.

We're almost there and would like to know if this is something people would be interested in. Here is a summary of what we have done :

added a drat argument to the solve function to query the creation DRAT file. This argument is a string which represents the name of the file to be generated (in the same way the command line application works);
modified the setup_solver function to generate the entire python object and not only the SATSolver. Before that, this object only contained one field. Now it contains vectors that represents the list of [xor_]clauses. It looked more natural to generate the python object this way than generating each field in their own setup function;
added the nbclauses function. It only uses a counter that is incremented at each `add[xor_]clause` (as suggested in #454);
added a save function to save the state of the Solver object in a DIMACS file. This is needed if you want to verify your proof with DRAT-trim and have only generated the clauses through the interface;
added a load function to load any DIMACS file at any moment in the Solver construction process;

Of course some other functions have been added as a side effect in the process.

This ended up being a bit more than just adding the DRAT generation to the interface as we realized we needed some other features.

What still needs to be done :

add tests to the testsuite. ATM only the nb_clauses function is tested, which is... not much;
we forked at 979d967 so their may be some changes that need to be done, however nothing has been done on this side to the interface so it should be pretty smooth;
modify the README file to show usage of the new features;

Their's probably some work to do on the code as we're no experts in C++ let alone the implementation of Python bindings. We understand that this part of the project is not your primary focus and we'll have to work for our finals soon so it will probably take some time to clean up our repo and merge.

Improvements that could or will be done to our code :

make the saving of the clauses non-mandatory. That's 3 vectors we added to the object (one for the regular clauses, the other two for the xor_clauses and their respective rhs variables). They may take a lot of memory and we don't always want to rebuild the DIMACS file from the object;
we may have omitted some cases that are not compatible with the DRAT generation, if so, they need to be taken into consideration;
some functions are really big and don't seem very efficient. We need to look into that;

As you may have realized, this is not a fast going project and we still have a lot of work to do on this. You can find our work in my repo here : https://github.com/MPaulmier/cryptominisat

If this is interesting to you let us know and we'll do our best to finish our work (we roughly have 2-3 weeks until due date) and clean the repo before creating a PR (though it will take some time as we approach the end of our year). If not that is also fine to us and it will sit in my repo :)

msoos commented 6 years ago

Wow, this sounds really interesting actually. So, please make a pull request so I can see the full diff! Beware, bit of a rant below about memory usage, that needs fixed :D

The line at:

Line 278: self->clauses.push_back(lits);

Uses too much memory. It would use at the very least, with 100M binary clauses (completely normal thing for large problems):

(64b2 (start+end ptr for the vector)+232b(for 2 Lits)) * 100M

Note that the same is stored as:

32b2 100M

in CMS. The difference is 3x memory usage. That's ignoring the malloc overhead and the minimum chunk size that malloc will allocate, so maybe you can add 20% to that without being too off. That's the difference of it fitting into memory in 4GB or not fitting into 12GB. Some users will be hugely upset :(

You see, SAT solving is hard not only because it's difficult to solve some CNF problems. It's difficult because you have to cater to CNFs that contain 500 variables, and also to ones that contain 500M variables. If you think I'm making this up, just look at this active issue open right now: https://github.com/msoos/cryptominisat/issues/445 It's a pretty funny issue but it's very typical :) Your change would not give us hope of fixing that issue when the solver is used from the python interface.

Can you please make a pull request so I can see the change? I have a feeling there are other, minor, issues but I can't see them right now. I can't see the DIFF. Please make a pull request :) We can discuss there, then you can push to your repo, thereby changing the pull request and then we can slowly move towards a solution :)

I am not trying to get your hopes down, I think this is a good effort -- but you need to think outside of your specific use-case and cater to people who use this system in many-many weird and unusual ways to solve their problems :) Remember, for them, the unusual is the norm :D

msoos commented 6 years ago

Here is how to add a pull request:

https://help.github.com/articles/creating-a-pull-request/

msoos commented 6 years ago

I am closing as the pull request failed because I was not convinced this was a clean enough solution with enough benefits. The DRAT concept still needs some thought.

ghost commented 6 years ago

Hey, I didn't realise this was still open after you closed the PR. We were very busy wrapping up this project so that's why I didn't follow up after the closing of the PR. Anyway we are very thankful that you took the time to explain why you didn't accept it and totally understand your concerns regarding memory consumption, code inefficiency... A lot of the changes we've done could (and probably should, for the most part) be left to the devs using the interface. As for the DRAT generation, I'm sure you'll come up with a better implementation than ours and find ways to test it in the test suite (which we didn't do).

msoos commented 6 years ago

Hey! Yeah, the issue is DRAT really -- the SAT community should take DRAT more seriously and commit more time to it. So that's what's holding us back right now. I actually appreciated what you did as you tried to pushed the limits of what's possible with DRAT and in effect found failings in it. Note that e.g. DRAT does not support library-based usage. That is not good. That's the use-case in many scenarios. The python interface sort of stretched that -- in a good way. I suggest you push your thought/idea as an article out there, or pitch it to Marijn Heule. It is important for the SAT community for the librarly-based version to work and I fully support that work.

msoos / cryptominisat

Add functionnalities to the Python interface #471