potassco / clingcon

⛓️ Extension of clingo to handle constraints over integers
https://potassco.org/
MIT License
25 stars 4 forks source link

Potential bug in multi-shot solving with initial bound and assumptions #81

Closed tlyphed closed 3 years ago

tlyphed commented 3 years ago

The following python script

import clingo
import clingcon
from clingo.symbol import Function, Number

# test data plus rules

program = '''
bin(1..3).
capacity(1, 10).
capacity(2, 7).
capacity(3, 5).

item(1..5).
size(1,3).
size(2,5).
size(3,4).
size(4,3).
size(5,2).

{ packed(I,B) : bin(B) } = 1 :- item(I).

&sum{ load(I,B) } = S :- packed(I,B), size(I,S).
&sum{ load(I,B) } = 0 :- item(I), bin(B), not packed(I,B).

&sum{ load(I,B) : item(I) } <= C :- capacity(B,C). 

usedBins(U) :- U = #count{ B : packed(_,B) }.
#minimize{ U : usedBins(U) }.

'''

# initialize clingo and clingcon
ctl = clingo.control.Control([])
thy = clingcon.ClingconTheory()
thy.register(ctl)

# load program
with clingo.ast.ProgramBuilder(ctl) as pb:
    clingo.ast.parse_string(program, callback=lambda ast : thy.rewrite_ast(ast, pb.add))

# ground base
ctl.ground([("base", [])])

cost = None

# prints model, records cost
def on_model(model):
    global cost
    print(model)
    cost = model.cost[0]

# solve for the first time without assumptions (works)
result = ctl.solve(on_model=on_model)

# set bound
ctl.configuration.solve.opt_mode = 'opt, ' + str(cost - 1)

# solve with assumptions (still works)
ctl.solve(on_model=on_model, assumptions=[(Function('packed', [Number(4), Number(1)], True), True), (Function('packed', [Number(1), Number(1)], True), True)])

# set new bound
ctl.configuration.solve.opt_mode = 'opt, ' + str(cost - 1)

# solve again with different assumptions (fails with "Invalid literal")
ctl.solve(on_model=on_model, assumptions=[(Function('packed', [Number(2), Number(2)], True), True), (Function('packed', [Number(3), Number(1)], True), True), (Function('packed', [Number(4), Number(1)], True), True), (Function('packed', [Number(1), Number(1)], True), True)])

causes clingcon 5.0.1 to fail with:

terminate called after throwing an instance of '(anonymous namespace)::ClingoError'
  what():  Invalid literal

This seems to happen only if there is at least one linear constraint and particular assumptions are used. Furthermore, clingcon 5.0.0 produces the same error.

rkaminsk commented 3 years ago

This is definitely a bug. I'll have to investigate where this is coming from. This might take a while.

tlyphed commented 3 years ago

Thanks! The problem also seems to appear when one uses a clingcon minimization directive and adds the bounds with sum constraints.

rkaminsk commented 3 years ago

It looks like some internal assertion in clingcon is violated. I cannot tell exactly what the problem is yet. This is just for reference.

Starting program: /usr/bin/python test.py
[Thread debugging using libthread_db enabled]
Using host libthread_db library "/lib/x86_64-linux-gnu/libthread_db.so.1".
bin(1) bin(2) bin(3) item(1) item(2) item(3) item(4) item(5) packed(1,1) packed(3,1) packed(4,1) packed(2,2) packed(5,3) size(1,3) size(2,5) size(3,4) size(4,3) size(5,2) capacity(1,10) capacity(2,7) capacity(3,5) usedBins(3)
bin(1) bin(2) bin(3) item(1) item(2) item(3) item(4) item(5) packed(1,1) packed(3,1) packed(4,1) packed(2,2) packed(5,2) size(1,3) size(2,5) size(3,4) size(4,3) size(5,2) capacity(1,10) capacity(2,7) capacity(3,5) usedBins(2)
python: ../../libclingcon/src/constraints.cc:296: bool Clingcon::{anonymous}::SumConstraintStateImpl<tagged, T>::propagate(Clingcon::Solver&, Clingcon::AbstractClauseCreator&, bool) [with bool tagged = false; T = Clingcon::{anonymous}::SumConstraintState]: Assertion `ass.is_true(lit_r) || ass.decision_level() == 0' failed.

Program received signal SIGABRT, Aborted.
__GI_raise (sig=sig@entry=6) at ../sysdeps/unix/sysv/linux/raise.c:50
50      ../sysdeps/unix/sysv/linux/raise.c: No such file or directory.
(gdb) bt
#0  __GI_raise (sig=sig@entry=6) at ../sysdeps/unix/sysv/linux/raise.c:50
#1  0x00007ffff7dc5859 in __GI_abort () at abort.c:79
#2  0x00007ffff7dc5729 in __assert_fail_base (fmt=0x7ffff7f5b588 "%s%s%s:%u: %s%sAssertion `%s' failed.\n%n", assertion=0x7ffff6355fa0 "ass.is_true(lit_r) || ass.decision_level() == 0", file=0x7ffff63553c8 "../../libclingcon/src/constraints.cc", line=296, function=<optimized out>) at assert.c:92
#3  0x00007ffff7dd6f36 in __GI___assert_fail (assertion=0x7ffff6355fa0 "ass.is_true(lit_r) || ass.decision_level() == 0", file=0x7ffff63553c8 "../../libclingcon/src/constraints.cc", line=296, function=0x7ffff63560d0 "bool Clingcon::{anonymous}::SumConstraintStateImpl<tagged, T>::propagate(Clingcon::Solver&, Clingcon::AbstractClauseCreator&, bool) [with bool tagged = false; T = Clingcon::{anonymous}::SumConstraintS"...)
    at assert.c:101
#4  0x00007ffff62d830d in Clingcon::(anonymous namespace)::SumConstraintStateImpl<false, Clingcon::(anonymous namespace)::SumConstraintState>::propagate (this=0xab05c0, solver=..., cc=..., check_state=false) at ../../libclingcon/src/constraints.cc:296
#5  0x00007ffff6329d02 in Clingcon::Solver::check (this=0xaad640, cc=..., check_state=false) at ../../libclingcon/src/solver.cc:908
#6  0x00007ffff63184e7 in Clingcon::Propagator::check (this=0xa9d950, control=...) at ../../libclingcon/src/propagator.cc:521
#7  0x00007ffff62b7b49 in (anonymous namespace)::check (c_ctl=0x7fffffffbed0, data=0xa9d950) at ../../libclingcon/src/clingcon.cc:81
#8  0x00007ffff6d6f94e in (anonymous namespace)::ClingoPropagator::check (this=0xa1a970, solver=...) at /home/kaminski/Documents/git/potassco/clingo/libclingo/src/control.cc:1917
#9  0x00007ffff70cf868 in Clasp::ClingoPropagator::propagateFixpoint (this=0x981760, s=...) at /home/kaminski/Documents/git/potassco/clingo/clasp/src/clingo.cpp:282
#10 0x00007ffff715d4d2 in Clasp::Solver::postPropagate (this=0xa8b150, stop=0x0) at /home/kaminski/Documents/git/potassco/clingo/clasp/src/solver.cpp:892
#11 0x00007ffff715cbc1 in Clasp::Solver::propagate (this=0xa8b150) at /home/kaminski/Documents/git/potassco/clingo/clasp/src/solver.cpp:803
#12 0x00007ffff715ab12 in Clasp::Solver::pushRoot (this=0xa8b150, x=...) at /home/kaminski/Documents/git/potassco/clingo/clasp/src/solver.cpp:486
#13 0x00007ffff715a9a4 in Clasp::Solver::pushRoot (this=0xa8b150, path=...) at /home/kaminski/Documents/git/potassco/clingo/clasp/src/solver.cpp:473
#14 0x00007ffff70ebea7 in Clasp::EnumerationConstraint::start (this=0xab0af0, s=..., path=..., disjoint=false) at /home/kaminski/Documents/git/potassco/clingo/clasp/src/enumerator.cpp:97
#15 0x00007ffff70ecc49 in Clasp::Enumerator::start (this=0xabec30, s=..., path=..., disjointPath=false) at /home/kaminski/Documents/git/potassco/clingo/clasp/src/enumerator.cpp:220
#16 0x00007ffff7156e76 in Clasp::SequentialSolve::doSolve (this=0xabf240, ctx=..., gp=...) at /home/kaminski/Documents/git/potassco/clingo/clasp/src/solve_algorithms.cpp:449
#17 0x00007ffff7155ec6 in Clasp::SolveAlgorithm::Scoped::solve (this=0x7fffffffc150, assume=...) at /home/kaminski/Documents/git/potassco/clingo/clasp/src/solve_algorithms.cpp:326
#18 0x00007ffff7155f49 in Clasp::SolveAlgorithm::solve (this=0xabf240, ctx=..., assume=..., onModel=0xa9a5e0) at /home/kaminski/Documents/git/potassco/clingo/clasp/src/solve_algorithms.cpp:330
#19 0x00007ffff708aa1c in Clasp::ClaspFacade::SolveStrategy::startAlgo (this=0xabf0a0, m=...) at /home/kaminski/Documents/git/potassco/clingo/clasp/src/clasp_facade.cpp:326
#20 0x00007ffff7091b3f in Clasp::ClaspFacade::SolveStrategy::doStart (this=0xabf0a0) at /home/kaminski/Documents/git/potassco/clingo/clasp/src/clasp_facade.cpp:272
#21 0x00007ffff708a8a2 in Clasp::ClaspFacade::SolveStrategy::start (this=0xabf0a0, h=0x0, a=...) at /home/kaminski/Documents/git/potassco/clingo/clasp/src/clasp_facade.cpp:315
#22 0x00007ffff708e8aa in Clasp::ClaspFacade::solve (this=0xa9a5e0, p=..., a=..., eh=0x0) at /home/kaminski/Documents/git/potassco/clingo/clasp/src/clasp_facade.cpp:1019
#23 0x00007ffff6d42f0c in Gringo::ClingoSolveFuture::ClingoSolveFuture (this=0xab0970, ctl=..., mode=...) at /home/kaminski/Documents/git/potassco/clingo/libclingo/src/clingocontrol.cc:793
#24 0x00007ffff6d4b91a in Gringo::gringo_make_unique<Gringo::ClingoSolveFuture, Gringo::ClingoControl&, Clasp::SolveMode_t> () at /home/kaminski/Documents/git/potassco/clingo/libgringo/gringo/utility.hh:404
#25 0x00007ffff6d40096 in Gringo::ClingoControl::solve (this=0xa9a110, ass=..., mode=0, cb=std::unique_ptr<struct Gringo::SolveEventHandler> = {...}) at /home/kaminski/Documents/git/potassco/clingo/libclingo/src/clingocontrol.cc:376
#26 0x00007ffff6d6f396 in clingo_control_solve (control=0xa9a110, mode=0, assumptions=0xaa4d70, assumptions_size=2, notify=0x7ffff7584c4c <pyclingo_solve_event_callback>, data=0x7ffff63d1f00, handle=0xab7e70) at /home/kaminski/Documents/git/potassco/clingo/libclingo/src/control.cc:1857
rkaminsk commented 3 years ago

@BenKaufmann, I have a question regarding this backtrace. The previous assertion is triggered because adding a unit resulting clause neither makes the unit literal true nor does Propagator::add_clause return false. It seems like this is happening on the root level. Is this something that might happen here, which I have to consider?

gef➤  frame 4
gef➤  p reason
$1 = std::vector of length 4, capacity 8 = {0xffffffbe, 0x26, 0xffffffac, 0xffffff8c}
gef➤  p/d reason.size()
$2 = 4
gef➤  p/d ass.is_false(reason[0])
$3 = 1
gef➤  p/d ass.is_false(reason[1])
$4 = 1
gef➤  p/d ass.is_false(reason[2])
$5 = 1
gef➤  p/d ass.is_true(reason[3])
$6 = 0
gef➤  p/d ass.is_false(reason[3])
$7 = 0
gef➤  p/d ass.decision_level()
$8 = 2
gef➤  p/d ass.root_level()
$9 = 2
BenKaufmann commented 3 years ago

@rkaminsk This sounds like a bug to me because Propagator::add_clause() should either assign the variable on the root level (can't undo further) or return false and then the assignment should happen once control is back in Clasp::ClingoPropagator::propagateFixpoint(). I tried to reproduce your example in a test but there everything works as expected. The code also looks fine to me. Do you have more details?

rkaminsk commented 3 years ago

@rkaminsk This sounds like a bug to me because Propagator::add_clause() should either assign the variable on the root level (can't undo further) or return false and then the assignment should happen once control is back in Clasp::ClingoPropagator::propagateFixpoint(). I tried to reproduce your example in a test but there everything works as expected. The code also looks fine to me. Do you have more details?

There seems to go more wrong later on too. If I disable the assertion, then it crashes because an invalid literal is accessed. We had something similar once, when the propagator was called but was not yet given the chance to reinitialize.

Luckily, this example is relatively small and there is no multi-threading. I should be able to provide you with something that just uses the clingo library + some C++. I won't be able to do anything this weekend and (maybe also not the next weekend). I'll come back to you a little later. Thanks, for confirming that there is a problem.

rkaminsk commented 3 years ago

Hi @BenKaufmann, I prepared a branch that hopefully makes it easy to debug the problem. The issue appears when using assumptions together with an unsatisfiable optimization problem. The issue does not appear if no assumptions are used. Here are the steps to get started:

git clone --branch fix/reproduce git@github.com:potassco/clingcon.git
cd clingcon
cmake -DCMAKE_BUILD_TYPE=debug -H. -Bbuild
make -j8 -Cbuild
gdb --args ./build/bin/test_clingcon assumptions
(gdb) run
Starting program: test_clingcon assumptions
Filters: assumptions
test_clingcon: libclingcon/src/constraints.cc:296:
  bool Clingcon::{anonymous}::SumConstraintStateImpl<false, T>::propagate(
    Clingcon::Solver&, Clingcon::AbstractClauseCreator&, bool)
    [T = Clingcon::{anonymous}::SumConstraintState]:
      Assertion `ass.is_true(lit_r) || ass.decision_level() == 0' failed.

Program received signal SIGABRT, Aborted.
BenKaufmann commented 3 years ago

@rkaminsk I only had a brief look so far. The problem is that the clause is added while the root path is being established but before the step literal has been assigned (which is currently typically the last literal of the path). Since the clause to be added is a temporary clause (it contains a solver local variable), it is extended with the step literal thereby making it a non-unit clause at this point in time. Only once the step literal has been assigned, the clause becomes unit and lit_r is assigned (which is of course after the assertion failed).

I'm not yet sure how to best fix this issue but hopefully I can make some progress next week.

BenKaufmann commented 3 years ago

@rkaminsk The assertion failure because of the (yet) unassigned step literal is only a minor issue that actually covers a larger problem. Once the step literal is assigned first, one runs into accesses to already removed (aux) variables from propagate()/check() in libclingcon\src\constraints.cc.

This is due to a regression introduced with https://github.com/potassco/clasp/issues/59 (https://github.com/potassco/clasp/commit/c5391b53d1b035d764af496574ef45632e4492e7). The final unsat core is computed only after aux variables of the current step have been removed but (apparently) before clingo constraints had a chance to do corresponding cleanup. Hence, when original assumptions are propagated again in SolveAlgorithm::detach(), clingo constraints still operate under the assumption that any added literals are still there resulting in crashes when this is not the case. To fix this, clasp probably either has to do the final conflict computation earlier (before removing aux variables) or defer removing aux variables until the next step is started.

Finally, I also ran into a bug in libclingo\src\astv2_unpool.cc:83. More precisely, the comparison jt != je in the following loop is broken as a call to emplace_back() always invalidates the past-the-end iterator je.

for (auto &pool : pools) {
  for (auto jt = product.begin(), je = product.end(); jt != je; ++jt) {
    assert(!pool->empty());
    auto kt = pool->begin();
    auto ke = pool->end();
    for (++kt; kt != ke; ++kt) {
      product.emplace_back(*jt);
      product.back().emplace_back(*kt);
    }
    jt->emplace_back(*pool->begin());
}

A trivial fix would be to replace iterators jtand je with corresponding indices.

rkaminsk commented 3 years ago

@rkaminsk The assertion failure because of the (yet) unassigned step literal is only a minor issue that actually covers a larger problem. Once the step literal is assigned first, one runs into accesses to already removed (aux) variables from propagate()/check() in libclingcon\src\constraints.cc.

This is due to a regression introduced with potassco/clasp#59 (potassco/clasp@c5391b5). The final unsat core is computed only after aux variables of the current step have been removed but (apparently) before clingo constraints had a chance to do corresponding cleanup. Hence, when original assumptions are propagated again in SolveAlgorithm::detach(), clingo constraints still operate under the assumption that any added literals are still there resulting in crashes when this is not the case. To fix this, clasp probably either has to do the final conflict computation earlier (before removing aux variables) or defer removing aux variables until the next step is started.

Thanks for looking into this.

Delaying the removal until the next step is started sounds reasonable. (I like doing things as lazily as possible.) I cannot judge which version is easier to implement though.

Finally, I also ran into a bug in libclingo\src\astv2_unpool.cc:83. More precisely, the comparison jt != je in the following loop is broken as a call to emplace_back() always invalidates the past-the-end iterator je.

for (auto &pool : pools) {
  for (auto jt = product.begin(), je = product.end(); jt != je; ++jt) {
    assert(!pool->empty());
    auto kt = pool->begin();
    auto ke = pool->end();
    for (++kt; kt != ke; ++kt) {
      product.emplace_back(*jt);
      product.back().emplace_back(*kt);
    }
    jt->emplace_back(*pool->begin());
}

A trivial fix would be to replace iterators jtand je with corresponding indices.

I did not know this requirement. It seems quite unnecessary (and prevents some useful code)! I cannot imagine that any release code will ever break because of this. But what can I do? I'll adjust the code...

BenKaufmann commented 3 years ago

@rkaminsk I prepared a fix in the following branch: https://github.com/potassco/clasp/tree/clingcon-81 Could you please check whether this works for you?

rkaminsk commented 3 years ago

Great! Thanks @BenKaufmann! This fixes the bug and all unit tests still run. @tlyphed: I already deployed clingo development packages that fix the problem.