opencog / link-grammar

The CMU Link Grammar natural language parser
GNU Lesser General Public License v2.1
386 stars 118 forks source link

SAT parser improvements #1452

Open linas opened 1 year ago

linas commented 1 year ago

The SAT parser is slower in all situations. Supporting it no longer seems tenable. grep -r USE_SAT_SOLVER * |wc indicates there are over a dozen places where ifdefs could be removed.

Is there any reason to keep it around?

ampli commented 1 year ago

grep -r USE_SAT_SOLVER * |wc indicates there are over a dozen places where ifdefs could be removed.

There are 27 places, some of which are due to different memory management in the SAT and classic parser that could be unified, or because there was no point to take care to allow it to be used with generation. In general, I think that are sparse and don't add much code complexity, and mostly serve as documentation of which code supports the SAT solver.

The SAT parser is slower in all situations.

There are several reasons for its slowness:

  1. It uses expressions (after they are pruned expression_prune()). It then has a limited and not-so-effective power prune by SAT constraints. I have a branch that does something that may seem unbelievable: it uses power_prune(), without changes, to make expression prune. It makes the SAT parsing significantly faster.
  2. In English we have postprocessing rules. They take a significant time, and the way they work is not compatible with the SAT parser. Fixing that will speed up the parsing very much.
  3. When building the SAT clauses, only a few caches are used, and repeated things are redone from scratch. Adding some more caches would significantly improve the speed.
  4. Most important: The current SAT parser works on connectors, not tracons. The classic parsing of long sentences got sped up by several OOMs due to tracons. I expect a similar speedup for the SAT parser.
  5. Memory sharing: The connector memory sharing makes the connector block relatively very small and helps it to fit the CPU cache. Its speedup for the long sentence batches is more than an OOM. I guess a very significant speedup would result for the SAT parser if it is done there too.
  6. I tested a better XOR encoding and this by itself got a 2x speedup. Because XOR handling is so significant, I wanted to replace the SAT solver library with a XOR SAT solver (and make the needed code changes) but have never found the time for that.
  7. Using memory pools would also contribute to the speed.

To sum up, in my opinion, the speed of the SAT parser may be improved to be faster than the classic parser. It can even be changed to work with nulls. In addition, constraints may be removed to allow it to make cross-links (possibly only in certain cases according to a definition file). However, because doing that is not in my LG code priorities just now (I have many other WIPs waiting to be converted to PRs), the things I mentioned may not support keeping this code.

ampli commented 1 year ago

Yet another thing on the SAT parser: It naturally can lead to parallel processing per sentence if used with a parallel (multi-thread) SAT library. On 40 cores 10x speedup may be possible, depending on the SAT clauses, but of course, I don't even have a guess regarding the LG SAT parser.


Here is the answer of ChatGPT when I asked it about such open-source libraries:

Several open-source SAT solvers support multi-threading for speedup on modern multi-core CPUs. Here are some of the most popular ones:

  1. CaDiCaL: CaDiCaL is a highly parallelized SAT solver that is designed to run on modern multi-core CPUs. It uses portfolio solving with multiple parallel solver instances, each with different parameter settings and heuristics. CaDiCaL also supports shared memory communication between threads for improved parallelization.

  2. Glucose: Glucose is a high-performance SAT solver that is based on MiniSat and uses several advanced techniques to improve performance, including clause deletion and restart strategies. Glucose supports multi-threading through a parallel version called Plingeling.

  3. Lingeling: Lingeling is a highly optimized SAT solver that is based on the "lingeling" algorithm. It is known for its speed and efficiency and supports multi-threading through a parallel version called Treengeling.

  4. MiniSat: MiniSat is a fast and widely used SAT solver that is written in C++. It supports multi-threading through a parallel version called MiniSat-Parallel.

  5. PicoSAT: PicoSAT is a lightweight SAT solver that is designed for use in embedded systems and other resource-constrained environments. It supports multi-threading through a parallel version called Parallel-PicoSAT.

These SAT solvers are all highly optimized for performance and can achieve significant speedup when running on multiple cores. However, the exact speedup will depend on the specific instance being solved, the hardware configuration being used, and the specific settings and parameters chosen for the solver.

linas commented 1 year ago

OK, Well then. I guess it can stay...

linas commented 1 year ago

I'm renaming this issue to fit the comments