Strange behaviour of control.add_clause in Propagator.check method

Glinzerow commented 2 years ago

I experienced some strange behaviour when adding a clause in the check method of the Propagator class. After the check-method rejected a total assignment by adding a clause, no new total assignment is calculated. The program does not terminate and no further process is achieved. In pycharm debugging mode the complete stack of python calls was empty after adding the conflict.

import sys
from typing import Iterable, Tuple, List, cast

from clingo import ast, SolveResult, Control, Application, PropagateInit
from clingo import PropagateControl, Backend, Propagator, clingo_main
from clingo.ast import ProgramBuilder, parse_files

class CheckPropagator(Propagator):
    def __init__(self):
        self.count = 0

    def check(self, control: PropagateControl):

        assignment = control.assignment
        self.count += 1
        print(f'Checker called: {self.count}')

        conflict = []
        for level in range(1, assignment.decision_level+1):
            conflict.append(-assignment.decision(level))

        control.add_clause(conflict)

class GACApp(Application):

    def main(self, ctl: Control, files: Iterable[str]):

        if not files:
            files = ["-"]

        check: List[ast.AST] = []
        with ProgramBuilder(ctl) as builder:
            parse_files(files, builder.add)

        ctl.register_propagator(CheckPropagator())

        ctl.ground([("base", [])])
        ctl.solve()

sys.exit(clingo_main(GACApp(), sys.argv[1:]))

The encoding simply rejects any total assignment, so that finally it should return unsat for every input. For the following input, it simply stopped after it checked the first 14 total assignments.

#const n=12.
cell((1..n, 1..n)).
start((2,2)). end((n-1,n-1)).
next((X,Y),(XX,YY)) :- cell((X,Y)), cell((XX,YY)), |X-XX|+|Y-YY|=1.
visit(X) :- start(X).
visit(X) :- go(Y,X).
{go(X,Y) : next(X,Y)} 1 :- visit(X), not end(X).
:- end(X), not visit(X).
#show visit/1.

The encoding does also not work for #const n=13, 14, but it works fine for others (2,3,4,5,..11, 16,...) (Encoded is a path-finding example. On a n by n grid, a path from (2,2) to (n-1,n-1) should be found.) I did not encounter this problem on other clingo encodings. I was working with clingo 5.5.0 and python 3.9.

rkaminsk commented 2 years ago

It is strange that the solver stops to find answer sets after 14 checks but this might not even be a bug because the search does not stop. Maybe it is just that the solver does not find a solution anymore. A formulation that avoids the loop through the choice rule does not show this behavior:

#const n=12.
cell((1..n, 1..n)).
start((2,2)). end((n-1,n-1)).
next((X,Y),(XX,YY)) :- cell((X,Y)), cell((XX,YY)), |X-XX|+|Y-YY|=1.
visit(X) :- start(X).
visit(Y) :- visit(X), go(X,Y).
{go(X,Y) : next(X,Y)} 1 :- next(X,_), not end(X).
:- go(X,Y), not visit(X).
:- end(X), not visit(X).

#show visit/1.

You can add more threads using option -t8 when solving your original encoding and the solver continues to find solutions.

Furthermore, I also managed to make the solver not find any solutions by using your encoding with option --trans-ext=all. This seems to confuse the heuristic. By adding more threads the solver finds solutions again.

I am not sure if this is really a bug. There are a lot of weird assignments a solver can get stuck on on a 12x12 grid. One could actually visualize here what the solver is doing. It would be a bit of work though.

Glinzerow commented 2 years ago

Thank you for your fast answer! Changing the clingo encoding is definitely a workaround. So far I did not notice the behaviour on other encodings. I do not exactly understand what it means, the solver does not find a solution anymore. The added conflict just invalidates the previously found solution. I thought, this mirrors clingos behaviour, when clingo is expected to find more than 1 solutions (it's the conflict added in the guess-and-check python encoding from your paper: How to build your own ASP- system). If I run it with 2 threads, the first thread (seems to) "stop" solving, and only the second goes on. I have been running the one-threaded version on the zuse-cluster for 2 hours, and no more solutions have been created. This seems really strange to me, since the problem has "only" about 1000 atoms? And there are a lot of solutions. Can there be given a time limit (maybe just experience), such that it can be guaranteed that the solver finds a solution?

rkaminsk commented 2 years ago

Thank you for your fast answer! Changing the clingo encoding is definitely a workaround. So far I did not notice the behaviour on other encodings. I do not exactly understand what it means, the solver does not find a solution anymore.

It does not stop. It continues solving. Just pass option --stats and you will see that there are many conflicts and choices.

The added conflict just invalidates the previously found solution. I thought, this mirrors clingos behaviour, when clingo is expected to find more than 1 solutions (it's the conflict added in the guess-and-check python encoding from your paper: How to build your own ASP- system).

Not quite. By default, the solver does not record conflict clauses but applies a form of chronological backtracking. What you do is more like clasp's enumeration based on solution recording, which can be enabled with --enum=bt. But even then, there can be small differences to what you are doing and what clasp is doing. One difference is for example that the clause you are learning can be deleted again by clasp's data base cleanup heuristic. There might be further small differences.

If I run it with 2 threads, the first thread (seems to) "stop" solving, and only the second goes on. I have been running the one-threaded version on the zuse-cluster for 2 hours, and no more solutions have been created. This seems really strange to me, since the problem has "only" about 1000 atoms? And there are a lot of solutions. Can there be given a time limit (maybe just experience), such that it can be guaranteed that the solver finds a solution?

Since we are talking about NP problems here, there can be no such universal time limit. For your benchmarks, you should still set one. We often used 600s and sometimes even 3600s in the past.

For your problem, I would use a better encoding. From what I know, for paths it works best to give conditions on both the incoming and the outgoing edges:

#const n=12.
cell((1..n, 1..n)).
delta(1,0;-1,0;0,1;0,-1).
edge((X,Y),(X+DX,Y+DY)) :- cell((X,Y)), cell((X+DX,Y+DY)), delta(DX,DY).
start((2,2)).
end((n-1,n-1)).

{path(X,Y) : edge(X,Y)} 1 :- edge(X,_), not end(X).
{path(X,Y) : edge(X,Y)} 1 :- edge(_,Y), not start(Y).

:- path(X,Y), start(Y).
:- path(X,Y), end(X).

visit(X) :- start(X).
visit(Y) :- visit(X), path(X,Y).

:- path(X,Y), not visit(X;Y).
:- end(X), not visit(X).

#show visit/1.
#show path/2.

Glinzerow commented 2 years ago

Thank you for answers. I understand that the clingo encoding must be created really carefully to avoid falling in such a solving trap. I was not aware that this could happen on such a "simple" example.

potassco / clingo

Strange behaviour of control.add_clause in Propagator.check method #364