opencog / link-grammar

The CMU Link Grammar natural language parser
GNU Lesser General Public License v2.1

Fixing the SAT solver #188

Open ampli opened 8 years ago

ampli commented 8 years ago

Hi Linas!

I have mostly finished the full integration of the SAT solver. In addition to the previous fixes, which are listed in the comment in #187, I have fixed several more aspects of the SAT solver:

  1. Initialized the domain array so it doesn't crash on !links.
  2. It now works fine on Hebrew too.
  3. It properly frees all memory.
  4. I rearranged some of the code to be more similar to that of the standard resolver.
  5. It now works fine on batches!
    • It gives exactly the same number of errors on all the languages.
    • At first glance, the differences in the error diagrams are due to a different linkage order.
    • Initial batch runtime comparisons to the regular solver:
      • Slightly slower on en/4.0/batch.
      • Definitely faster on en/4.0/fixes.batch.
      • About twice as fast on the Russian batch.

  6. The maximum number of shown linkages is now !limit instead of the arbitrary 222.


Remaining to fix:

  1. Different costs are shown in !disjuncts.
  2. The DIS= cost in the status message is always 0.
  3. It still checks only alternative[0] in guiding.cpp. I need to check whether fixing this will help find linkages faster for the affected sentences.
  4. More code can be made similar, and more code can be shared (i.e., the SAT solver code can call more functions from the regular LG lib). However, this would require adding more functions to link-grammar.def (I have already added many). Alternatively, and maybe better, the shared functions can be moved to the corresponding .h files and removed from link-grammar.def. This would solve the problem of the many non-API symbols that are exported by the LG lib.
  5. It may be interesting to replace the current MiniSAT code (an old release) with the latest release (version 2.2). GitHub: niklasso/minisat.
  6. Please add more issues if I forgot something...
linas commented 8 years ago

The link-grammar.def file is not needed by Linux; it's used only by Windows and MacOS, where it plays a vital role. Basically, it lists the functions that are publicly visible from a shared library.

I guess you are suggesting that we declare these functions as public API functions in various header files, but I really, really don't want to do that, since they really are not meant to be used by the public. A better solution would be to make the sat library a STATIC library (not shared) and then link it directly into the LG library, if SAT compilation is enabled. That way, I believe we could remove most of the crud from the link-grammar.def file.

ampli commented 8 years ago

On Linux I just had to add these functions to link-grammar.def - there was no need to declare them as a public API.

If so, isn't this link-grammar.def addition enough also for Windows and MacOS?

In any case, I suggested above that we solve the need to add these to link-grammar.def in another way: put these functions as inline functions in ".h" files (which are included anyway in the places that now use these functions). Is this a better idea than a static SAT library?

Alternatively (instead of a static library), why not just link the sat-solver files along with the rest of the link-grammar library files?
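
To illustrate the inline-function idea, a minimal sketch (the header, type, and function names here are made up for illustration, not actual LG code): a small helper needed by both the LG core and the sat-solver can be defined directly in a shared internal header, so no symbol is exported and nothing has to be added to link-grammar.def.

/* some-internal.h -- an internal header that both the LG core and the
 * sat-solver already include (names are hypothetical). */
static inline int word_has_empty_suffix(const Word *w)
{
    /* Every file that includes this header gets its own private copy, so the
     * function never appears in the shared library's export table. */
    return (NULL != w->suffix) && ('\0' == w->suffix[0]);
}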

linas commented 8 years ago

Well, yes, since we've configured the lib to use link-grammar.def, then yes, it's used by Linux too.

If the functions are small then yes, maybe inline code in a header file might be OK, but is the cure worse than the disease?

Yes, we could link the SAT code straight into the LG library, but that would require writing some strange, convoluted, hard-to-debug autoconf code .. which, because it's weird and hard to debug, I would not want to do. So, suggesting that it be a static lib gives you the same effect, but with less work and less confusion...

ampli commented 8 years ago

Here is a summary of problems that are still open after the latest fixes:

  1. Additional costs are shown in !disjuncts of the standard parser. I traced this to the costs of null expressions, as in [()]. As far as I can see, the sat-solver code just doesn't encode these expressions, so their associated costs cannot be retrieved when a solution is found. I have an idea of how to fix that, but I also have general questions about it. I will open a new issue.
  2. Different order of disjuncts in !disjuncts. The connectors are not necessarily displayed in monotonically increasing order according to the words to which they belong, so these disjuncts cannot be used to recreate the linkage (not that anybody really wants to do that). But I guess this is not too important. Fixing it would add overhead when producing the linkages, for a negligible benefit (such things can be fixed without overhead by "lazy" processing, but there is currently no code infrastructure for that).
  3. The sat-solver doesn't pre-produce all the linkages and sort them. This can be fixed (and the code can be shared with that of the standard parser). There is not much benefit in introducing such a fix just now, due to the missing cost handling of null expressions (point 1 here). If/when that is fixed, I will open an issue on what is involved in such a fix (of pre-producing all the linkages).
  4. If a sentence has no complete linkage, nothing is currently shown. Parsing with null words could be introduced. The trivial fix is to parse again with one word less, and so on, trying all combinations, until a solution is found. A more interesting fix could be to encode the sentence with "assumptions", so that selected words can be "neutralized" on demand if no initial solution is found, and the solution attempt repeated (see the sketch at the end of this comment). This means solutions with "null words" could then be found with negligible overhead.
  5. No panic timeout.

Of all these problems, I guess that fixing the cost problem has the highest priority.
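
As a rough sketch of the "assumptions" idea from point 4 above (not LG code; it presumes the encoding gives each word a guard literal meaning "this word is linked", and uses the MiniSat 2.2 solve-under-assumptions interface), the solver can simply be re-run with fewer guards instead of re-encoding the sentence:

#include <minisat/core/Solver.h>  // MiniSat 2.2 header layout

using namespace Minisat;

// Try a complete linkage first, then allow more and more "null words" by
// dropping guard assumptions, one per round, without rebuilding the CNF.
static bool solve_allowing_nulls(Solver &S, const vec<Lit> &linked_guards,
                                 int max_nulls)
{
    vec<Lit> assumps;
    linked_guards.copyTo(assumps);

    if (S.solve(assumps)) return true;          // null_count == 0

    for (int nulls = 1; nulls <= max_nulls && assumps.size() > 0; nulls++) {
        assumps.pop();      // "neutralize" one word (a real fix would have to
                            // choose which word, or try combinations)
        if (S.solve(assumps)) return true;      // linkage with 'nulls' null words
    }
    return false;
}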

ampli commented 8 years ago

An additional thing related to point (4) above: when using link-parser with !use_sat_solver=1, if there is no complete linkage, it currently takes additional time to get the linkparser> prompt back after the message No complete linkages found. This is particularly noticeable with long sentences.

The reason is that by default !null=1, so if no complete linkage is found, link-parser tries to find a linkage with min_null_count=1 and max_null_count=sentence_length(). Then sat_parse() tries the same parsing again (disregarding these settings), which doubles the time until the prompt appears again.

My suggested fix, until an encoding that allows parsing with nulls is implemented (I am trying to find out how to do that in an incremental way): 1. If min_null_count>0, issue a message and immediately return with no linkages. Something like:

if (opts->min_null_count > 0) {
    // The sat solver doesn't support (yet) parsing with null words
    if (opts->verbosity >= 1)
      prt_error("Info: Cannot parse with null links yet (set !null=0 to turn off this message)\n");
    sent->num_valid_linkages = 0;
    sent->num_linkages_post_processed = 0;
    return 0;
}

2. If min_null_count==0 but max_null_count>0 (this never happens with link-parser), try to parse anyway, but if no complete linkage is found, also issue the said message.

I will send a pull request that implements this, but if you think it should be done in another way, I can of course change it.

ampli commented 8 years ago

Benchmarking newer Minisat (and Minisat-based) code. Description and conclusion.

I tried LG with the following libraries:

  1. The Minisat version that is used now.
  2. Minisat 2.2 (+ additions), from GitHub niklasso/minisat.
  3. Glucose 4.0, from the Glucose SAT Solver Web site.
  4. CryptoMiniSat, from GitHub msoos/cryptominisat.

Here is a summary. It turns out that my previous finding that Minisat 2.2+ performs the same as the current Minisat version doesn't hold for en/4.0.fixes-long.batch.

The tests were mainly with the default SAT-libraries parameters (I didn't test much with non-default ones, and the changes I tested didn't cause a speed-up).

At first glance it seemed that the comparison to the standard LG parser is not completely fair, since the standard parser finds and sorts up to 1000 linkages per sentence even when running in batch mode, while the SAT parser finds at most one valid linkage, since it currently finds linkages on demand only. However, further tests showed that the overhead of fetching linkages in the standard parser is relatively negligible (for the SAT parser it is the main overhead).

In the tests described below, the SAT parser only finds the first valid linkage (if there is one), while the standard parser produces up to 1000 valid ones (in both cases, the linkages themselves are not printed unless there is an error). Since producing linkages in the SAT solver is time-consuming, and "non-sane" linkages can be numerous for long sentences and must be skipped, adding "same alternative" constraints while encoding may be beneficial (not tested).

  1. The standard parser outperforms the SAT one only for 4.0.batch. For the other runs, the SAT parser (all libraries but the current one - see 3. below) outperforms the standard one by far (even >50 times for en/4.0.fixes-long).
  2. For en/4.0.batch and en/4.0.fixes.batch, the results of all the versions but CryptoMiniSat were very close (after the slight tuning that I mentioned in #187). Glucose-4.0 was the best one. CryptoMiniSat was about 300% worse for these batches. Maybe some SAT parameter tuning would help, but I don't know how to do that.
  3. For en/4.0.fixes-long there were huge differences between the current and newer versions. This batch file contains very long sentences, whose SAT encoding produces >>1E6 clauses. The original version (1) encountered what seems like an infinite loop (or maybe severe thrashing) on one of the sentences. Another sentence "misbehaved" and took 95% of the CPU time of the batch. Even without these 2 sentences, the timing of the currently used Minisat version was the worst. Here CryptoMiniSat was the best one.
  4. Even though the link-parser parameters include -timeout=100000 and -panic (and no parsing with nulls was tried, since I used batch runs), the number of errors when running 4.0.fixes-long.batch was not the same. This is strange. I don't know which one is more "correct", since the SAT encoding apparently has a problem with the disjunct costs.

Conclusion: The currently used SAT library (1) has a severe problem with 2 example sentences (on one of them it "gets stuck"). Libraries (2) and (3) outperform it, and significantly so on long sentences. CryptoMiniSat outperforms all of them on long sentences, but has very poor performance on shorter ones.

My recommendation is to replace the current Minisat version with Glucose-4.0 (3). BTW, CryptoMiniSat could be better if its relatively very bad results on short sentences could be fixed.

Also, I would first like to check the impact of improvements in the low-level part of generating the CNF (I don't understand yet the higher-level part of the LG linkage encoding).

linas commented 8 years ago

Would you still happen to have the raw numbers? Could you post them?

One reason the SAT work stagnated is that the original author found that he could not beat LG for normal-sized sentences. One could argue this was well-known -- SAT was first deployed in the chip-design industry, and it could not beat established tools for "ordinary"-sized ICs. When ICs got larger, SAT started winning by orders of magnitude: but you have to have big problems to see the pay-off. So, also, it seems, in LG....

I've often thought of a "hybrid" mode: using the usual parser for sentences under 30-50 words, and then switching to SAT for anything longer.
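
A hybrid dispatcher could be little more than a length check before parsing. A rough sketch against the public LG API (the 40-word threshold is an arbitrary placeholder, and parse_options_set_use_sat_parser() is assumed to be available, i.e. the library was built with the SAT solver):

#include <link-grammar/link-includes.h>

static int hybrid_parse(const char *text, Dictionary dict, Parse_Options opts)
{
    const int sat_threshold = 40;     /* words; to be tuned from benchmarks */
    Sentence sent = sentence_create(text, dict);
    int num_linkages;

    sentence_split(sent, opts);       /* tokenize, so the word count is known */
    parse_options_set_use_sat_parser(opts,
                                     sentence_length(sent) > sat_threshold);
    num_linkages = sentence_parse(sent, opts);

    /* ... extract and print linkages here ... */
    sentence_delete(sent);
    return num_linkages;
}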

Is the Glucose license compatible with LGPL?

The biggest reason to NOT use SAT is that we need a parser that can handle conversations, where words dribble in one at a time, or where several people are talking. For that, Viterbi would produce useful results, although it would be a lot SLOWER than the current LG.

If you are fishing for something to do that would be useful & cool, there are several things worth doing. One is partial-sentence support: implement a "virtual-left-right-wall" that captures the state (the unfulfilled disjuncts) in mid-sentence; then later, when the rest of the words come in, parsing resumes at this "virtual wall".

Another would be to handle quotations correctly, and/or multiple speakers: if you have two people saying things, can you untangle them into distinct streams? It would need probabilistic hints ...

ampli commented 8 years ago

would you still happen to have the raw numbers? Could you post them?

My relevant diary is attached here: fix-long-benchmark.txt

I've often thought of a "hybrid" mode: using the usual parser for sentences under 30-50 words, and then switching to SAT for anything longer.

I am now trying to optimize the low-level clause generation by using PBlib. However, for really short sentences it is most probably not going to help, since there is not much to optimize at the low level in that case.

Is the Glucose license compatible with LGPL?

It has a type of "free" license. I couldn't find the license online, so I attach it here, taken from the extracted Glucose-4.0 sources (renamed from LICENSE to LICENSE.txt for the purpose of this attachment): LICENSE.txt

One is partial-sentence support: implement a "virtual-left-right-wall" that captures the state (the unfulfilled disjuncts) in mid-sentence; then later, when the rest of the words come in, parsing resumes at this "virtual wall".

For that I will need some hints and help on how exactly to approach this task, because I have already looked at it and couldn't figure out how it should be done.

Another would be to handle quotations correctly

I also started to investigate this problem. There is an article in my reading list that will hopefully give me useful insights: Punctuation in Quoted Speech. Another article also discusses quoted text (p. 63) and other interesting issues: ANALYSIS OF A FINITE-STATE GRAMMAR FOR PARSING AVIATION-SAFETY REPORTS.

if you have two people saying things, can you untangle them into distinct streams? it would need probabilistic hints ...

It seems to me that in order to do that well, the program needs to "understand" much more than the current LG library can.

[BTW, meanwhile I didn't forget the tokenizing problem and even the Hebrew dictionary.]

linas commented 8 years ago

Could you create a directory called "diary" and place fix-long-benchmark.txt into that?

The LICENSE seems to be a BSD-type license, so it's compatible.

linas commented 8 years ago

Also, I noticed that the Russian tests have very different behavior than the English ones, with respect to timing.

linas commented 8 years ago

For the mid-sentence "wall", think of it this way: if you cut the final parse of a sentence mid-way, you will cut some links, i.e. generate some connectors. On the first half of the sentence, the connectors all point right; on the last half, they all point left. Thus, a cut sentence looks as if it could be "just some word" with connectors on it, i.e. a disjunct.

During parsing, there is more than one possibility: so when you cut, there will be multiple possible disjuncts attached to the half-sentence ...

ampli commented 8 years ago

Could you create a directory called "diary" and place fix-long-benchmark.txt into that?

I will do that. I would also want to add the "ru" results.

For Russian there is a problem of mismatched alternatives ("insane linkages"). Finding such "bogus" solutions is a significant overhead.

However, it is possible to SAT-encode the constraint "select only words from the same alternative". This would remove the "insane" solutions. Adding constraints may, by itself, even make the SAT solving more efficient.
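
A minimal sketch of what such a constraint could look like (the variable layout here is an assumption for illustration, not the actual sat-solver encoding): for every pair of words taken from two different alternatives of the same token, add a binary clause forbidding both of them from being used.

#include <minisat/core/Solver.h>
#include <vector>

// alt_words[a] holds the "this word is used" variables of alternative a
// of one token (the grouping is hypothetical, for illustration only).
static void forbid_mixed_alternatives(Minisat::Solver &S,
        const std::vector<std::vector<Minisat::Var>> &alt_words)
{
    for (size_t a = 0; a < alt_words.size(); a++)
        for (size_t b = a + 1; b < alt_words.size(); b++)
            for (Minisat::Var u : alt_words[a])
                for (Minisat::Var v : alt_words[b])
                    // clause (~u | ~v): words of different alternatives
                    // can never appear together in one linkage
                    S.addClause(Minisat::mkLit(u, true),
                                Minisat::mkLit(v, true));
}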

linas commented 8 years ago

Well, be careful about "constraints": SAT is a constraint solver; in general, it's best if you tell SAT what the constraint is, and let it solve it. Or at least, that is the principle. The practice, that's another matter, I guess.

ampli commented 8 years ago

Regarding adding constraints to eliminate "insane" linkages, I forgot to add:

  1. It will also save the time of "sane_morphism()".
  2. It will speed up English parsing too, since the tokenizer now creates alternatives for English as well, and in long sentences there may be several alternatives, which causes many "insane" combinations and thus many insane linkages.

Also, I noticed that the Russian tests have very different behavior than the English ones, with respect to timing.

Here are my results: ru-banchmark.txt

linas commented 8 years ago

Hmm. OK. Yes, switch to Glucose, I guess. Sure, add constraints.

For sentences between 20-50 words, sometimes SAT is faster, sometimes the orig algo. One nutty idea is to start both, one in each thread, and first one to stop wins. Of course if the timeout is small, the orig algo will always win... the answer seems to be "use SAT if the sentence has a lot of words and the timeout is large" ....

ampli commented 8 years ago

the answer seems to be "use SAT if the sentence has a lot of words and the timeout is large" ....

This is indeed something that needs exploring.

Another thing I would like to explore is SAT parallelism. I already tried to set CryptoMiniSat for parallelism (but only with the default parameters) and the results were extremely bad - the elapsed time was much worse (I guess this depends on the problem to be solved, and there are problems on which SAT parallelism is more successful).

I still need to try the Glucose-4.0 parallelism.

ampli commented 8 years ago


  1. Different costs are shown in !disjuncts.

The main problem is expressions like this:

[()]

The current code assigns costs only to connectors, so I have no idea what to do with costs of null expressions.

linas commented 8 years ago

Ah, good question. The costly-null is almost always used in the form A+ or [()], which says "you can make the A link, or not. The A link costs nothing, but failing to make the A link costs 1.0". There are many places where this is used. The only one I can remember right now is the verb-wall link, which says "you had better link (WV) the head verb to the wall, and if you don't it will cost you 2.0", which makes sense: if you have a sentence with a verb, you want to indicate it, as it's more or less the most important part of the sentence.

linas commented 8 years ago

Thus, as a corollary, expressions such as A+ or [()] mean that either you have to have the concept of a "null connector", or that costs must be a property of the disjunct, not the connector. Viz., B+ & (A+ or [()]) must be expanded out by the distributive property, as (B+ & A+) or [B+], so that the null connector serves only to transfer the cost over to the grand-total disjunct.
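
Spelling that expansion out with the costs made explicit (each [] adds a cost of 1.0):

B+ & (A+ or [()])
  = (B+ & A+) or (B+ & [()])
  = (B+ & A+) or [B+]

i.e. two disjuncts:  {A+ B+}  at cost 0.0
                     {B+}     at cost 1.0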

ampli commented 8 years ago

I guess that if I want to fix it, I will need to add the concept of a "null connector", because changing the SAT encoding code to put the cost on disjuncts seems too ambitious to me (and maybe unneeded).

The question is what the matching rules of a "null connector" are. Are they just "it always automagically matches, but with nothing"?

linas commented 8 years ago

That seems like a dangerous idea. A simpler approach would be to always expand out any terms that have a null link in them, and bump the cost of the expanded form. Surely the SAT solver already has some mechanism to expand null links, so you don't have to re-invent that. All you have to do is propagate that cost onto some other .. any other .. connector in the expanded form. And since the SAT solver already has to deal with expressions like [C+ & (D+ or E+)], there must be some way of shifting the cost over to somewhere else. (In the above, you can put the cost onto C+, or onto D+ AND E+ .. funny thing, how the or's turned into and's and v.v. in this last sentence!)

ampli commented 8 years ago

If I have an expression B+ & (A+ or [()]), where should the cost be shifted to?

linas commented 8 years ago

I already answered that. B+ & (A+ or [()]) is identical to (B+ & A+) or [B+]

ampli commented 4 years ago

Update: By now, some of the remaining issues in the SAT parser are:

  1. Speed. The classic parser has become very much faster and it will be hard to compete with it.
  2. Parsing with nulls. It seems possible to implement with reasonable overhead, but it is not easy.
  3. Linkage order (produce the best linkages first). Also not too easy to fix.
  4. Implement timeout (this is easy to fix if desired).