ultimate-pa / ultimate

The Ultimate program analysis framework.
https://ultimate-pa.org/
194 stars 40 forks source link

Small constant strings create huge SMT queries #404

Open danieldietsch opened 5 years ago

danieldietsch commented 5 years ago

One of our regression tests takes ~2,5h to complete.

Failed
de.uni_freiburg.informatik.ultimate.regressiontest.generic.RegressionTestSuite.AutomizerC.xml+AutomizerC-nestedInterpolants_const.epf: /storage/jenkins/jenkins_home/jobs/Ultimate Nightly/workspace/trunk/examples/programs/regression/c/TestIntegerDivision01.c

Failing for the past 2 builds (Since #1927 )
Took 2 hr 25 min.
Error Message
ExpectedResult: SAFE UltimateResult: EXCEPTION_OR_ERROR   [de.uni_freiburg.informatik.ultimate.plugins.generator.traceabstraction: SMTLIBException: Timeout exceeded: de.uni_freiburg.informatik.ultimate.smtinterpol.interpolate.Interpolator.interpolate(Interpolator.java:215)]

https://monteverdi.informatik.uni-freiburg.de/ci/job/Ultimate%20Nightly/1928/de.uni_freiburg.informatik.ultimate$de.uni_freiburg.informatik.ultimate.regressiontest/testReport/de.uni_freiburg.informatik.ultimate.regressiontest.generic/RegressionTestSuite/AutomizerC_xml_AutomizerC_nestedInterpolants_const_epf___storage_jenkins_jenkins_home_jobs_Ultimate_Nightly_workspace_trunk_examples_programs_regression_c_TestIntegerDivision01_c/

I will try and provide a script.

danieldietsch commented 5 years ago

It seems like this issue is caused by our encoding of small strings.

I build a scalable example program:

   int x = 0;
   printf("%d",x);
   //@ assert x == 0;

The resulting Boogie program looks like this

implementation ULTIMATE.init() returns (){
    #NULL := { base: 0, offset: 0 };
    #valid[0] := 0;
    call #t~string1 := #Ultimate.alloc(3);
    #memory_int[{ base: #t~string1!base, offset: #t~string1!offset }] := 37;
    #memory_int[{ base: #t~string1!base, offset: 1 + #t~string1!offset }] := 100;
    #memory_int[{ base: #t~string1!base, offset: 2 + #t~string1!offset }] := 0;
}

implementation ULTIMATE.start() returns (){
    var #t~ret2 : int;

    call ULTIMATE.init();
    call #t~ret2 := main();
}

implementation main() returns (#res : int){
    var #t~nondet0 : int;
    var ~x~0 : int;

    ~x~0 := 0;
    havoc #t~nondet0;
    assert 0 == ~x~0;
    #res := 0;
    return;
}

var #t~string1 : $Pointer$;
var #NULL : $Pointer$;
var #valid : [int]int;
var #length : [int]int;
var #memory_int : [$Pointer$]int;
type $Pointer$ = { base : int, offset : int };
...

And the single trace check for this example looks like this

...
(assert (! true :named ssa_precond))
(assert (! (not false) :named ssa_postcond))
(assert (! ... :named ssa_0_GlobVarAssigCall))
(assert (! true :named ssa_0_LocVarAssigCall))
(assert (! ... :named ssa_0_OldVarAssigCall))
(assert (! 
(and 
  (= |#t~string1.offset_1| 0) 
  (= |#NULL.offset_1| 0) 
  (= 0 |#NULL.base_1|) 
  (= (store |#length_0| |#t~string1.base_1| 3) |#length_1|) 
  (= (select (store |#valid_0| 0 0) |#t~string1.base_1|) 0) 
  (= 
    |#memory_int_1|
    (store 
      |#memory_int_0| 
      |#t~string1.base_1| 
      (store 
        (store 
          (store 
            (select |#memory_int_0| |#t~string1.base_1|) 
            |#t~string1.offset_1| 
            37) 
          (+ |#t~string1.offset_1| 1) 
          100) 
        (+ |#t~string1.offset_1| 2) 
        0))
  )
  (= (store (store |#valid_0| 0 0) |#t~string1.base_1| 1) |#valid_1|) 
  (not (= |#t~string1.base_1| 0))
) :named ssa_1))
...

So we can see that the main chunk of the script comes from nested store/select terms that occur during storage of the string in our memory array.

If I increase the number n of printf("%d",x); statements, the script size starts growing rather exponentially from n=7 (01 has 1 printf, 02 has 2, etc.).

7.4K StringPerformance01.c_AllErrorsAtOnce_Iteration1_TraceCheck.smt2
9.9K StringPerformance02.c_AllErrorsAtOnce_Iteration1_TraceCheck.smt2
 14K StringPerformance03.c_AllErrorsAtOnce_Iteration1_TraceCheck.smt2
 18K StringPerformance04.c_AllErrorsAtOnce_Iteration1_TraceCheck.smt2
 26K StringPerformance05.c_AllErrorsAtOnce_Iteration1_TraceCheck.smt2
 39K StringPerformance06.c_AllErrorsAtOnce_Iteration1_TraceCheck.smt2
 65K StringPerformance07.c_AllErrorsAtOnce_Iteration1_TraceCheck.smt2
113K StringPerformance08.c_AllErrorsAtOnce_Iteration1_TraceCheck.smt2
207K StringPerformance09.c_AllErrorsAtOnce_Iteration1_TraceCheck.smt2
394K StringPerformance10.c_AllErrorsAtOnce_Iteration1_TraceCheck.smt2
766K StringPerformance11.c_AllErrorsAtOnce_Iteration1_TraceCheck.smt2
1.5M StringPerformance12.c_AllErrorsAtOnce_Iteration1_TraceCheck.smt2
3.0M StringPerformance13.c_AllErrorsAtOnce_Iteration1_TraceCheck.smt2
5.9M StringPerformance14.c_AllErrorsAtOnce_Iteration1_TraceCheck.smt2
 12M StringPerformance15.c_AllErrorsAtOnce_Iteration1_TraceCheck.smt2
 24M StringPerformance16.c_AllErrorsAtOnce_Iteration1_TraceCheck.smt2
 47M StringPerformance17.c_AllErrorsAtOnce_Iteration1_TraceCheck.smt2
 93M StringPerformance18.c_AllErrorsAtOnce_Iteration1_TraceCheck.smt2
185M StringPerformance19.c_AllErrorsAtOnce_Iteration1_TraceCheck.smt2

      119       767      7538 dump/StringPerformance01.c_AllErrorsAtOnce_Iteration1_TraceCheck.smt2
      140       995     10044 dump/StringPerformance02.c_AllErrorsAtOnce_Iteration1_TraceCheck.smt2
      161      1301     13348 dump/StringPerformance03.c_AllErrorsAtOnce_Iteration1_TraceCheck.smt2
      182      1757     18188 dump/StringPerformance04.c_AllErrorsAtOnce_Iteration1_TraceCheck.smt2
      203      2507     26040 dump/StringPerformance05.c_AllErrorsAtOnce_Iteration1_TraceCheck.smt2
      224      3839     39913 dump/StringPerformance06.c_AllErrorsAtOnce_Iteration1_TraceCheck.smt2
      245      6329     65666 dump/StringPerformance07.c_AllErrorsAtOnce_Iteration1_TraceCheck.smt2
      266     11129    115117 dump/StringPerformance08.c_AllErrorsAtOnce_Iteration1_TraceCheck.smt2
      287     20543    211902 dump/StringPerformance09.c_AllErrorsAtOnce_Iteration1_TraceCheck.smt2
      308     39179    403293 dump/StringPerformance10.c_AllErrorsAtOnce_Iteration1_TraceCheck.smt2
      329     76253    783834 dump/StringPerformance11.c_AllErrorsAtOnce_Iteration1_TraceCheck.smt2
      350    150197   1542613 dump/StringPerformance12.c_AllErrorsAtOnce_Iteration1_TraceCheck.smt2
      371    297875   3057806 dump/StringPerformance13.c_AllErrorsAtOnce_Iteration1_TraceCheck.smt2
      392    593015   6085765 dump/StringPerformance14.c_AllErrorsAtOnce_Iteration1_TraceCheck.smt2
      413   1183073  12139194 dump/StringPerformance15.c_AllErrorsAtOnce_Iteration1_TraceCheck.smt2
      434   2362961  24243501 dump/StringPerformance16.c_AllErrorsAtOnce_Iteration1_TraceCheck.smt2
      455   4722503  48449502 dump/StringPerformance17.c_AllErrorsAtOnce_Iteration1_TraceCheck.smt2
      476   9441347  96858829 dump/StringPerformance18.c_AllErrorsAtOnce_Iteration1_TraceCheck.smt2
      497  18878789 193674746 dump/StringPerformance19.c_AllErrorsAtOnce_Iteration1_TraceCheck.smt2

The verification time for this single trace check is rather small. The TraceAbstraction plugin needs for the n=20 instance roughly 6 seconds.

The original program (TestIntegerDivision01.c) has n=32, 32 assert statements, and much more involved trace checks that possibly require case distinctions (reasoning with division and modulo). I would expect a single trace check to be around 740GB in size if printed. Obviously there is some sharing while the script is in memory. And it also seems to fit inside of monteverdi's memory (max. 32GB).

But the increase in complexity is still the same. In particular if you consider that the actual verification condition does not depend on the strings at all.

We should discuss possible solutions to this issue.