better array initialization in C to Boogie translation

alexandernutz commented 7 years ago

Improve the Boogie code that is generated from an array initialization in the C program (by default or via initializer).

Example: int a[10][10][10] = {1, 2}; will be translated to 1000 boogie statements each one initializing one array cell. 998 of them initialize to 0.

Plan for initializing an array:

Set all cells to the default value.
Update all cells that are explicitly mentioned by the initializer to the corresponding value

I believe we should improve step 1. Several approaches come to mind:

use nested while loops in Boogie
use a special initialization function in Boogie that has an adequate specification (will need universal quantification)
use SMT default arrays (means that the RCFGBuilder is involved, too)

(I did not find a corresponding issue, let me know if this is a duplicate)

alexandernutz commented 7 years ago

A related issue is that we do not support initialization of variable length arrays with our current method. This should be dealt with by all three proposed approaches.

alexandernutz commented 7 years ago

If no one intervenes I will start by introducing a special function for array initialization. The first implementation of that function will be approach 1 (with while loops).

danieldietsch commented 7 years ago

We discussed that switching between the three approaches should be easy. Therefore I am all for implementing the one that is easiest right now.

alexandernutz commented 7 years ago

Update after discussion with @Heizmann :

below a certain threshold ratio of explicitly/implicitly initialized array cells we want to stick with the "old" implementation, which generates a sequence of assignment statements (because the additional while loops may be detrimental to verification)
- to keep in mind: there are other C features that may use the SMT default arrays, too. Examples: memset, arraycopy ...

alexandernutz commented 6 years ago

I am currently looking into this ticket. I think I know a way but it requires some changes to memory model and solver infrastructure that I want to discuss it here.

Situation:

It seems that our main use case are on-heap arrays, so I concentrate on them, here.
Example: Assume the following global C declaration: (It is global, so everything has to be initialized to 0.) int a[3]; we will generate Boogie code along these lines for this:
```
#a := malloc(3 * sizeofInt);
writeInt(0, {#a!base, 0 * sizeofInt});
writeInt(0, {#a!base, 1 * sizeofInt});
writeInt(0, {#a!base, 2 * sizeofInt});
```
I would like to generate the following Boogie code instead:
```
#a := malloc(3 * sizeofInt);
#memory_int := #memory_int[#a!base := const-Array(0)]
```
(i.e. inside the two-dimensional array #memory_int, I would like to assign a constant array to position #a!base)

What needs to be done:

Add support for the function const-Array This was discussed in #329, so conceptually I know which Boogie code to insert. However currently there is only a hack. (See SolverBuilder#ENABLE_Z3_CONSTANT_ARRAYS) I will need to do changes to the C translation (probably during declaration of memory model infrastructure) and in SolverBuilder. I have rough ideas how to do this, but comments and caveats are welcome.
Our current declaration of #memory_int, namely var #memory_int : [$Pointer$]int; is not compatible with the assignment as written above. The effective type of #memory_int is now [int, int] int, but for the assignment to work, we need [int][int] int. Any good ideas how this could be tackled? Or am I missing something?

Heizmann commented 6 years ago

The effective type of #memory_int is now [int, int] int, but for the assignment to work, we need [int][int] int.

In my opinion, this is the main problem for the constant array optimization.

alexandernutz commented 6 years ago

I agree. Also, note that it gets even messier for #memory_Pointer, which is of type [Pointer] Pointer at the moment. I'm not even sure what the solution is for that. Two arrays, one for offset, one for base?

In any case, one difficulty are lines like requires value[ptr] == 0; This would have to be replaced by requires value[ptr!base][ptr!offset] == 0; --> perhaps an alternative would be to have the BoogiePreprocessor do that translation.

Heizmann commented 6 years ago

I think there is a misunderstanding. I do NOT propose to replace [int, int] int by [int][int] int. I just wanted to agree that the type [int, int] int prevents us from implementing a very simple solution.

Heizmann commented 6 years ago

I would propose the following.

Find a way to refer to constant arrays in Boogie. This should be done by using/extending our :builtin directive
Develop (if necessary) a similar directive that allows us to translate a single function to a complex SMT term. Use some of the following ideas. A. Hardcode the SMT term in Ultimate (bad) B. Wrap the complex Term in an SMT function C. Hardcode the SMT function in Ultimate (bad) D. Obtain the SMT function as translation result of a Boogie function E. Define SMT-LIB functions like store, select in Boogie, use a special directive to prevent their translation. F. Pass the SMT-LIB function definition as an attribute (bad)
Develop (if necessary) a similar directive that is understood by the StructExpander and that makes sure that the attributes survive the preprocessor.

danieldietsch commented 6 years ago

I do not understand why there is a hack that already enables this functionality with z3 but no solution for the incompatible signatures. How does the hack work?

alexandernutz commented 6 years ago

Ok, here is my proposal after some offline discussions with @danieldietsch and @jhoenicke : I am not totally sure, which of @Heizmann 's letters it corresponds to, if any. Perhaps B?

Solution for Problem 1:

I introduce a class called SMTPrelude.
SolverBuilder.buildAndInitializeSolver takes an SMTPrelude object as an optional argument.
SMTPrelude contains SMT commands of the type declare-fun and define-fun
SolverBuilder.buildAndInitializeSolver enters the commands from SMTPrelude in the newly created script (as the hack did before)
I do not know yet, how easy or hard it is to pass an object from the CTranslation to the SolverBuilder. Hints are appreciated.

Solution for Problem 2:

move the problem of multidimensional vs nested arrays to the SMT-level, where there are only nested arrays
for every memory array, for example mem_int : [Pointer] int do the following,
- add a Boogie function to the translated program function { :builtin initializeToZero_mem_int} (array : [Pointer] int, baseAddress : int) returns (res : [Pointer] : int);
- add the following command to SMTPrelude
```
(define-fun initializeToZero_mem_int 
((array (Array Int (Array Int Int))) (baseAddress Int)) (Array Int (Array Int Int)) 
(store array baseAddress ((as const (Array Int Int)) 0)))
```
- I am not sure how this transition from multidimensional Boogie array to nested SMT array and back works but am hoping it works both ways. If you see a problem here, let me know.

Overall this seems to be the simplest solution to me. One benefit is that the current builtin directive should suffice and there have to be no Boogie infrastructure or the memory model of the translation.

danieldietsch commented 6 years ago

You can easily pass the prelude object via the toolchain storage. I would try and ensure that there is only one prelude object for a toolchain, so that if other plugins also want to add something, they add it to this object.

One example for a toolchain storage is CodeBlockFactory. It

implements the IStorable interface
has the methods getFactory(IToolchainStorage) and storeFactory(IToolchainStorage)

You can just replicate this pattern in the SMTPrelude object. If you want to allow multiple SMTPrelude instances, I would instead build an SMTPreludeStorage object that can hold these, and make the SMTPreludeStorage an IStorable

alexandernutz commented 6 years ago

For the record: there is a certain ugly aspect of the current solution:

Say, we want to initialize an array of pointers. In Boogie, we will use this function:

function { :builtin "initToZero_Pointer" } init(a : [$Pointer$] $Pointer$, p : int) 
    returns (res : [$Pointer$] $Pointer$);

However, the BoogiePreprocessor will convert this function into the following two functions (because the return value is a struct):

function { :builtin "initToZero_Pointer" } init.base(a.base : [int,int]int, a.offset : [int,int]int, p : int) 
    returns (res : [int,int]int);
function { :builtin "initToZero_Pointer" } init.offset(a.base : [int,int]int, a.offset : [int,int]int, p : int) 
    returns (res : [int,int]int);

Now in our case this seems acceptable, because we can use the same smt function for both Boogie functions, but in general this does not seem very nice to me.

If anyone thinks we should use another solution, because of this or something else, let me know.

danieldietsch commented 6 years ago

Hm, I did not remember that. I thought that functions can also have multiple return parameters.

One fix could be to also allow pure procedures without implementation instead of the function. Also with the builtin attribute.

Then we would possibly get

procedure { :builtin "initToZero_Pointer" } init(a.base : [int,int]int, a.offset : [int,int]int, p : int) 
    returns (res.base : [int,int]int, res.offset : [int,int]int );
modifies()

alexandernutz commented 6 years ago

Had a discussion with @Heizmann :

We should keep the Boogie program self-contained. Thus relying on an SMTPrelude object that is stored separately from the Boogie program is problematic.
We can allow SMT text in Boogie attributes. We will need a new attribute (at least one). The above example could be handled like this:
```
function { :smtdefined "(store array baseAddress ((as const (Array Int Int)) 0))" }
  initializeToZero_mem_int(array : [Pointer] int, baseAddress : int) 
       returns (res : [Pointer] : int);
```
We would have a new attribute called "smtdefined", whose argument is an SMT definition that should be put where the Boogie function is used. Probably a good convention is that the argument names in the Boogie function and the SMT code must match. The SMT code could either be used to add a ``(define-fun ..)" command or perhaps it could be inlined directly when translating Boogie to SMT.

Note that both attributes, builtin and smtdefined might need an additional solution for dealing with Boogie functions that have a struct return type and thus are split up by the StructExpander.

Questions? Comments? Also note that, Boogie and the transition from Boogie to SMT is not really my expertise..

danieldietsch commented 6 years ago

I also think having self-contained Boogie programs is important. But I think the problem does not change by much if we move the SMT prelude to the Boogie program.

How do we create the term for :smtdefined ?

Just a string, hardcoded somewhere
We create a term and use toStringDirect() (do we create a new script only for this or do we move the creation of the auxiliary solver to an earlier point in the toolchain?)

How do we recover it from the Boogie code?

With the TermParseUtils we currently use in our unit tests
Optimization: During the translation we create an annotation for the function and if this annotation is present we use the term contained in it.

What is the solution for struct return types?

No one commented on my "lets use procedures without implementation instead" proposal. Is it a bad idea? Why?
The only other idea I have is extending Boogie s.t. functions can have multiple return arguments (semantically a tuple)

I can help with the Boogie and Boogie to SMT part.

alexandernutz commented 6 years ago

Regarding self-containedness: The benefit I see is that we have a chance of recovering the semantics when the SMT code is in the Boogie program. In my previous proposal it would have been gone after dumping the program to a text file.

Regarding SMT type checking:

During translation to SMT is late but somewhat natural, and probably requires the fewest changes.
Whether there is type checking during creation of the Boogie program is of course left to the creator. We are doing partial type checks in the C to Boogie translation. We could also have a script there for SMT type checking (however, checking that the SMT term is compatible with the Boogie function's signature would require additional code). Imo this is not urgent and could be done later.

Regarding recovering from Boogie:

I agree: In general we will need an SMT parser here. TermParseUtils sounds like a good idea. Note that some matching of arguments probably needs to happen. I suppose the arguments are parsed as constants but might need to be variables, maybe some renaming has to happen.

Regarding struct return types: The main problem seems to me that for a general solution we might want to have different SMT functions for each entry of the returned struct. So we would need an attribute that allows us to specify several SMT functions. Something like this:

function { :smtdefined "0" "p.offset" } resetbase(p : $Pointer$) 
    returns (res : $Pointer$);

(The example is just to get the idea across, I do not have a general solution. It might not be correct Boogie syntax either.) resetbase would be the function that takes a pointer and returns a pointer whose base hase been set to 0 while the offset is left as is. Our usecase looks even somewhat harder because the return type is [Pointer] Pointer. If anyone has a proposal or comments, please let me know.

alexandernutz commented 6 years ago

Regarding the struct return type problem, the following would be a more complete proposal:

the smtdefined and builtin directives take as many Strings as arguments as required by the return type (NamedAttribute takes an array of Strings)
when the StructExpander splits up a function into several, the builtin arguments are distributed over them

Example:

function { :smtdefined "(+ p!base 1)" "p!offset" } incrementBase(p : Pointer) 
    returns (res : Pointer)

expands to

function { :smtdefined "(+ p!base 1)" } incrementBase.base(Pointer p) returns (res : int)
function { :smtdefined "p!offset" } incrementBase.offset(Pointer p) returns (res : int)

If there is not the right number of arguments in the attribute, the Structexpander should probably crash. Typechecking at creation could check the argument count, too.

Heizmann commented 6 years ago

No one commented on my "lets use procedures without implementation instead" proposal. Is it a bad idea? Why?

I think that would work too, but I cannot see the problem that this solves. In doubt, Functions seem to be more natural since functions may occur in expressions and in SMT-LIB functions also have only one return value.

Regarding the struct problem. I think the proposed solution seems reasonable. An alternative might be to wrap the two annotations explicitly in an annotation that is a directive for the struct expander. Maybe only by using a different keyword that does explicitly mention the struct expander.

But I also have to say that I cannot overlook the whole situation at the moment. How many functions do we need?

One that initializes a one-dimensional array with a given value (at the moment only for future use cases)
One that initializes a two-dimensional array with a given value at a certain position

Do we need more functions?

alexandernutz commented 6 years ago

Functions that have use cases in the InitializationHandler:

initialize a heap cell at a base pointer:

probably the most relevant case by far
we need one function for each memory array (assuming all memory arrays have a different type)

example for #memory_int:

function { :smtdefined "(store in0 in1 ((as const (Array Int Int)) 0))" } 
~initToZeroAtPointerBaseAddress~int(in0 : [{ base : int, offset : int }]int, in1 : int) 
    returns (out : [{ base : int, offset : int }]int);

initialize an off heap array
- The Boogie function represents a constant array of the right dimension filled with 0s. This constant array is then assigned to the to-be-initialized array variable.
- (Note that we translate nested C arrays to nested Boogie arrays, not multidimensional Boogie arrays.)
- we need one function per to-be-initialized array type
- Example for an array of type int x int -> int
```
function { :smtdefined "((as const (Array Int (Array Int Int))) ((as const (Array Int Int)) 0))" } 
~const-array-int~int~int() returns (out : [int][int]int);
```

alexandernutz commented 6 years ago

I just commited changes that have code that generates Boogie with smtdefined as proposed above. Some cases might not work, yet. All of the new functionality should still be deactivated and will not work until changes in Boogie and Boogie to SMT infrastructure have been made. Changes to the use of attributes should still be easy to incorporate, so the discussion is still open..

danieldietsch commented 6 years ago

Ok, after our talk a short reminder for myself.

New attribute :expand_struct "<struct-field-id>"

Will be consumed (removed) by the struct expander.
Only allowed for function applications where the return type is a struct
All attributes following the :expand_struct "<struct-field-id>" attribute until the next :expand_structattribute or no more attributes are attached to the function that is created for the struct field specified by <struct-field-id>

New attribute :smtdefined "<Term>"

Only allowed for function applications
function { :smtdefined "<Term>" } <FunId> (...) returns(...)
Will be handled somewhere in the RcfgBuilder
Name of return parameter does not matter.
Signature (types and names of parameters, type of return value) must match
Creates an SMT function with name <FunId> that takes parameters with matching sorts as the Boogie function, is of sort matching the boogie type, and has <Term> as definition.
Example
A function function { :smtdefined "(ite (< x y) y x)" } max(x:int, y:int) returns(out:int) creates a new SMT function symbol (define-fun max ((x Int) (y Int)) Int (ite (< x y) y x))
Note that together with expand_struct, the final name of the SMT function is defined by the initial name and the field names of the return type, as the struct expander will create new functions with name <FunId>(.<FieldId>)+

danieldietsch commented 6 years ago

@alexandernutz the attributes should now work and transfer correctly. Have a look at examples\BoogiePL\structexpander\StructFunctionAttributesTest.bpl for examples.

alexandernutz commented 6 years ago

We have made some progress. Now, the C translation, the StructExpander, and Boogie2Smt should work correctly (according to a few manual tests).

Problem right now: Quantifier elimination for arrays crashes during verification. This might be a job for @Heizmann :-)..

The error message:

java.lang.AssertionError: var is still there: |v_#memory_int_15| term size 5 at de.uni_freiburg.informatik.ultimate.modelcheckerutils.smt.Elim1Store.elim1(Elim1Store.java:452)

The function ~initToZeroAtPointerBaseAddress~int is defined like this:

(store in0 in1 ((as const (Array Int Int)) 0))

The problem can be reproduced as follows:

set hook "use constant arrays" in CTranslation settings (at the very bottom), other than that standard (non-bitvector) settings should work
toolchain: AutomizerC.xml
program: examples/CToBoogieTranslation/regression/variableInitialization_array_onedim_fixedlength_01.c

alexandernutz commented 6 years ago

Update: to avoid the quantifiers from trace interpolation, I tried out settings using only SMTInterpol. In this case the assertion that checks inductiveness of interpolants fails. I don not know why, yet.

danieldietsch commented 6 years ago

@alexandernutz Can you dump a script for the SMTInterpol case? Perhaps this is a job for @jhoenicke and/or @tanjaschindler

Heizmann commented 6 years ago

@alexandernutz Could you add a test to the QuantifierEliminationTest class?

Proceed like the more recent tests and use TermParseUtils.parseTerm(...) instead of constructing the formula manually.

danieldietsch commented 6 years ago

You mean TermParseUtils.parseTerm(...) right?

alexandernutz commented 6 years ago

@Heizmann I added two tests as suggested. They are named constantArrayTest01 and constantArrayTest02. Not 100% sure if they make sense like this but they should... -- feel free to let me know..

alexandernutz commented 6 years ago

@danieldietsch @jhoenicke @tanjaschindler here you go, the below file produces (error "generated interpolants did not pass sanity check") when I run SMTInterpol (current master branch on github) on it debug.zip

Heizmann commented 6 years ago

@alexandernutz in this class is a special new method for quantifier elimination that does not make sense to me. You were using that method. I modified that call. https://github.com/ultimate-pa/ultimate/commit/c854ea4e6ba0faf6819f8844583af41d0f757886 Now the first test passes and the second test fails because the quantified variable is the argument of a function application. My partial quantifier elimination does not support that case. (And I am not willing to change that unless the change helps)

I think, if we would replace ~initToZeroAtPointerBaseAddress~int by the function body, my partial quantifier elimination would eliminate the quantifier. Question to @alexandernutz and @danieldietsch: Could we inline this (use defined) SMT function immediately?

danieldietsch commented 6 years ago

Currently, we call defineFunction for a new script instance and define all smtdefined function symbols that way.

You can access all functions defined that way via CfgSmtToolkit and its SmtSymbols getSmtSymbols() method.

I do not know how feasible it would be to require an SmtSymbols instance for the quantifier elimination, but if this is no issue you could substitute all the function applications with their bodies. Although, I am not sure if we have some kind of SMT function inliner. The normal substitution will probably not work, or at least be somewhat inefficient.

If you think that this would be a good way forward, I could look into it.

Heizmann commented 6 years ago

If possible, we should inline the function immediately.

danieldietsch commented 6 years ago

We could do it during the construction of the Icfg. But the backtranslation will be difficult. Do you have an idea how we could do it there? Then you could explain it to me and I can implement it.

Heizmann commented 6 years ago

I see, that is a problem. But I have no idea for the backtranslation. I think we have to look at individual expressions that occur in practise and think about solutions.

danieldietsch commented 5 years ago

I updated SMTInterpol in 5d08986750682c75e39ad1f92180c9582722463d -- this should fix the error you reported @alexandernutz

danieldietsch commented 5 years ago

I think this feature is now implemented and all follow-ups should get their own ticket.

ultimate-pa / ultimate

better array initialization in C to Boogie translation #259