ultimate-pa / ultimate

The Ultimate program analysis framework.
https://ultimate-pa.org/
198 stars 41 forks source link

better array initialization in C to Boogie translation #259

Closed alexandernutz closed 5 years ago

alexandernutz commented 7 years ago

Improve the Boogie code that is generated from an array initialization in the C program (by default or via initializer).

Example: int a[10][10][10] = {1, 2}; will be translated to 1000 boogie statements each one initializing one array cell. 998 of them initialize to 0.

Plan for initializing an array:

  1. Set all cells to the default value.
  2. Update all cells that are explicitly mentioned by the initializer to the corresponding value

I believe we should improve step 1. Several approaches come to mind:

(I did not find a corresponding issue, let me know if this is a duplicate)

alexandernutz commented 7 years ago

A related issue is that we do not support initialization of variable length arrays with our current method. This should be dealt with by all three proposed approaches.

alexandernutz commented 7 years ago

If no one intervenes I will start by introducing a special function for array initialization. The first implementation of that function will be approach 1 (with while loops).

danieldietsch commented 7 years ago

We discussed that switching between the three approaches should be easy. Therefore I am all for implementing the one that is easiest right now.

alexandernutz commented 7 years ago

Update after discussion with @Heizmann :

alexandernutz commented 6 years ago

I am currently looking into this ticket. I think I know a way but it requires some changes to memory model and solver infrastructure that I want to discuss it here.

Situation:

What needs to be done:

  1. Add support for the function const-Array This was discussed in #329, so conceptually I know which Boogie code to insert. However currently there is only a hack. (See SolverBuilder#ENABLE_Z3_CONSTANT_ARRAYS) I will need to do changes to the C translation (probably during declaration of memory model infrastructure) and in SolverBuilder. I have rough ideas how to do this, but comments and caveats are welcome.
  2. Our current declaration of #memory_int, namely var #memory_int : [$Pointer$]int; is not compatible with the assignment as written above. The effective type of #memory_int is now [int, int] int, but for the assignment to work, we need [int][int] int. Any good ideas how this could be tackled? Or am I missing something?
Heizmann commented 6 years ago

The effective type of #memory_int is now [int, int] int, but for the assignment to work, we need [int][int] int.

In my opinion, this is the main problem for the constant array optimization.

alexandernutz commented 6 years ago

I agree. Also, note that it gets even messier for #memory_Pointer, which is of type [Pointer] Pointer at the moment. I'm not even sure what the solution is for that. Two arrays, one for offset, one for base?

In any case, one difficulty are lines like requires value[ptr] == 0; This would have to be replaced by requires value[ptr!base][ptr!offset] == 0; --> perhaps an alternative would be to have the BoogiePreprocessor do that translation.

Heizmann commented 6 years ago

I think there is a misunderstanding. I do NOT propose to replace [int, int] int by [int][int] int. I just wanted to agree that the type [int, int] int prevents us from implementing a very simple solution.

Heizmann commented 6 years ago

I would propose the following.

  1. Find a way to refer to constant arrays in Boogie. This should be done by using/extending our :builtin directive
  2. Develop (if necessary) a similar directive that allows us to translate a single function to a complex SMT term. Use some of the following ideas. A. Hardcode the SMT term in Ultimate (bad) B. Wrap the complex Term in an SMT function C. Hardcode the SMT function in Ultimate (bad) D. Obtain the SMT function as translation result of a Boogie function E. Define SMT-LIB functions like store, select in Boogie, use a special directive to prevent their translation. F. Pass the SMT-LIB function definition as an attribute (bad)
  3. Develop (if necessary) a similar directive that is understood by the StructExpander and that makes sure that the attributes survive the preprocessor.
danieldietsch commented 6 years ago

I do not understand why there is a hack that already enables this functionality with z3 but no solution for the incompatible signatures. How does the hack work?

alexandernutz commented 6 years ago

Ok, here is my proposal after some offline discussions with @danieldietsch and @jhoenicke : I am not totally sure, which of @Heizmann 's letters it corresponds to, if any. Perhaps B?

Solution for Problem 1:

Solution for Problem 2:

Overall this seems to be the simplest solution to me. One benefit is that the current builtin directive should suffice and there have to be no Boogie infrastructure or the memory model of the translation.

danieldietsch commented 6 years ago

You can easily pass the prelude object via the toolchain storage. I would try and ensure that there is only one prelude object for a toolchain, so that if other plugins also want to add something, they add it to this object.

One example for a toolchain storage is CodeBlockFactory. It

You can just replicate this pattern in the SMTPrelude object. If you want to allow multiple SMTPrelude instances, I would instead build an SMTPreludeStorage object that can hold these, and make the SMTPreludeStorage an IStorable

alexandernutz commented 6 years ago

For the record: there is a certain ugly aspect of the current solution:

Say, we want to initialize an array of pointers. In Boogie, we will use this function:

function { :builtin "initToZero_Pointer" } init(a : [$Pointer$] $Pointer$, p : int) 
    returns (res : [$Pointer$] $Pointer$);

However, the BoogiePreprocessor will convert this function into the following two functions (because the return value is a struct):

function { :builtin "initToZero_Pointer" } init.base(a.base : [int,int]int, a.offset : [int,int]int, p : int) 
    returns (res : [int,int]int);
function { :builtin "initToZero_Pointer" } init.offset(a.base : [int,int]int, a.offset : [int,int]int, p : int) 
    returns (res : [int,int]int);

Now in our case this seems acceptable, because we can use the same smt function for both Boogie functions, but in general this does not seem very nice to me.

If anyone thinks we should use another solution, because of this or something else, let me know.

danieldietsch commented 6 years ago

Hm, I did not remember that. I thought that functions can also have multiple return parameters.

One fix could be to also allow pure procedures without implementation instead of the function. Also with the builtin attribute.

Then we would possibly get

procedure { :builtin "initToZero_Pointer" } init(a.base : [int,int]int, a.offset : [int,int]int, p : int) 
    returns (res.base : [int,int]int, res.offset : [int,int]int );
modifies()
alexandernutz commented 6 years ago

Had a discussion with @Heizmann :

Note that both attributes, builtin and smtdefined might need an additional solution for dealing with Boogie functions that have a struct return type and thus are split up by the StructExpander.

Questions? Comments? Also note that, Boogie and the transition from Boogie to SMT is not really my expertise..

danieldietsch commented 6 years ago

I also think having self-contained Boogie programs is important. But I think the problem does not change by much if we move the SMT prelude to the Boogie program.

How do we create the term for :smtdefined ?

How do we recover it from the Boogie code?

What is the solution for struct return types?

alexandernutz commented 6 years ago

Regarding self-containedness: The benefit I see is that we have a chance of recovering the semantics when the SMT code is in the Boogie program. In my previous proposal it would have been gone after dumping the program to a text file.

Regarding SMT type checking:

Regarding recovering from Boogie:

Regarding struct return types: The main problem seems to me that for a general solution we might want to have different SMT functions for each entry of the returned struct. So we would need an attribute that allows us to specify several SMT functions. Something like this:

function { :smtdefined "0" "p.offset" } resetbase(p : $Pointer$) 
    returns (res : $Pointer$);

(The example is just to get the idea across, I do not have a general solution. It might not be correct Boogie syntax either.) resetbase would be the function that takes a pointer and returns a pointer whose base hase been set to 0 while the offset is left as is. Our usecase looks even somewhat harder because the return type is [Pointer] Pointer. If anyone has a proposal or comments, please let me know.

alexandernutz commented 6 years ago

Regarding the struct return type problem, the following would be a more complete proposal:

Example:

function { :smtdefined "(+ p!base 1)" "p!offset" } incrementBase(p : Pointer) 
    returns (res : Pointer)

expands to

function { :smtdefined "(+ p!base 1)" } incrementBase.base(Pointer p) returns (res : int)
function { :smtdefined "p!offset" } incrementBase.offset(Pointer p) returns (res : int)

If there is not the right number of arguments in the attribute, the Structexpander should probably crash. Typechecking at creation could check the argument count, too.

Heizmann commented 6 years ago

No one commented on my "lets use procedures without implementation instead" proposal. Is it a bad idea? Why?

I think that would work too, but I cannot see the problem that this solves. In doubt, Functions seem to be more natural since functions may occur in expressions and in SMT-LIB functions also have only one return value.

Regarding the struct problem. I think the proposed solution seems reasonable. An alternative might be to wrap the two annotations explicitly in an annotation that is a directive for the struct expander. Maybe only by using a different keyword that does explicitly mention the struct expander.

But I also have to say that I cannot overlook the whole situation at the moment. How many functions do we need?

Do we need more functions?

alexandernutz commented 6 years ago

Functions that have use cases in the InitializationHandler:

alexandernutz commented 6 years ago

I just commited changes that have code that generates Boogie with smtdefined as proposed above. Some cases might not work, yet. All of the new functionality should still be deactivated and will not work until changes in Boogie and Boogie to SMT infrastructure have been made. Changes to the use of attributes should still be easy to incorporate, so the discussion is still open..

danieldietsch commented 6 years ago

Ok, after our talk a short reminder for myself.

New attribute :expand_struct "<struct-field-id>"

New attribute :smtdefined "<Term>"

danieldietsch commented 6 years ago

@alexandernutz the attributes should now work and transfer correctly. Have a look at examples\BoogiePL\structexpander\StructFunctionAttributesTest.bpl for examples.

alexandernutz commented 6 years ago

We have made some progress. Now, the C translation, the StructExpander, and Boogie2Smt should work correctly (according to a few manual tests).

Problem right now: Quantifier elimination for arrays crashes during verification. This might be a job for @Heizmann :-)..

The error message:

java.lang.AssertionError: var is still there: |v_#memory_int_15| term size 5 at de.uni_freiburg.informatik.ultimate.modelcheckerutils.smt.Elim1Store.elim1(Elim1Store.java:452)

The term which it is trying to eliminate from: (= |#memory_int| (~initToZeroAtPointerBaseAddress~int |v_#memory_int_15| |main_~#a~0.base|))

The function ~initToZeroAtPointerBaseAddress~int is defined like this:

(store in0 in1 ((as const (Array Int Int)) 0))

The problem can be reproduced as follows:

alexandernutz commented 6 years ago

Update: to avoid the quantifiers from trace interpolation, I tried out settings using only SMTInterpol. In this case the assertion that checks inductiveness of interpolants fails. I don not know why, yet.

danieldietsch commented 6 years ago

@alexandernutz Can you dump a script for the SMTInterpol case? Perhaps this is a job for @jhoenicke and/or @tanjaschindler

Heizmann commented 6 years ago

@alexandernutz Could you add a test to the QuantifierEliminationTest class?

Proceed like the more recent tests and use TermParseUtils.parseTerm(...) instead of constructing the formula manually.

danieldietsch commented 6 years ago

You mean TermParseUtils.parseTerm(...) right?

alexandernutz commented 6 years ago

@Heizmann I added two tests as suggested. They are named constantArrayTest01 and constantArrayTest02. Not 100% sure if they make sense like this but they should... -- feel free to let me know..

alexandernutz commented 6 years ago

@danieldietsch @jhoenicke @tanjaschindler here you go, the below file produces (error "generated interpolants did not pass sanity check") when I run SMTInterpol (current master branch on github) on it debug.zip

Heizmann commented 6 years ago

@alexandernutz in this class is a special new method for quantifier elimination that does not make sense to me. You were using that method. I modified that call. https://github.com/ultimate-pa/ultimate/commit/c854ea4e6ba0faf6819f8844583af41d0f757886 Now the first test passes and the second test fails because the quantified variable is the argument of a function application. My partial quantifier elimination does not support that case. (And I am not willing to change that unless the change helps)

I think, if we would replace ~initToZeroAtPointerBaseAddress~int by the function body, my partial quantifier elimination would eliminate the quantifier. Question to @alexandernutz and @danieldietsch: Could we inline this (use defined) SMT function immediately?

danieldietsch commented 6 years ago

Currently, we call defineFunction for a new script instance and define all smtdefined function symbols that way.

You can access all functions defined that way via CfgSmtToolkit and its SmtSymbols getSmtSymbols() method.

I do not know how feasible it would be to require an SmtSymbols instance for the quantifier elimination, but if this is no issue you could substitute all the function applications with their bodies. Although, I am not sure if we have some kind of SMT function inliner. The normal substitution will probably not work, or at least be somewhat inefficient.

If you think that this would be a good way forward, I could look into it.

Heizmann commented 6 years ago

If possible, we should inline the function immediately.

danieldietsch commented 6 years ago

We could do it during the construction of the Icfg. But the backtranslation will be difficult. Do you have an idea how we could do it there? Then you could explain it to me and I can implement it.

Heizmann commented 6 years ago

I see, that is a problem. But I have no idea for the backtranslation. I think we have to look at individual expressions that occur in practise and think about solutions.

danieldietsch commented 5 years ago

I updated SMTInterpol in 5d08986750682c75e39ad1f92180c9582722463d -- this should fix the error you reported @alexandernutz

danieldietsch commented 5 years ago

I think this feature is now implemented and all follow-ups should get their own ticket.