smackers / smack

SMACK Software Verifier and Verification Toolchain
http://smackers.github.io
Other
427 stars 82 forks source link

Implement context-sensitive memory model #755

Open shaobo-he opened 3 years ago

shaobo-he commented 3 years ago

Motivation

Small functions that modify/access pointers such as strlen, strcmp, and xmalloc cause DSA to merge nodes unnecessarily. For example,

#include <string.h>

char* x = "hello";
char* y = "world";

__attribute__((always_inline))
int mystrlen(char* p) {
  int z = 0;
  while(*p++)
    z += 1;
  return z;
}
int main(void) {
  return strlen(x) + strlen(y);
  //return mystrlen(x) + mystrlen(y);
}

SMACK only generates one region for both x and y whereas if we choose to use mystrlen, SMACK produces two regions. If we adopt a context-sensitive version of sea-dsa, we will get a region for each call site of strlen.

shaobo-he commented 3 years ago

Some background information about sea-dsa:

sea-dsa's context-sensitive analysis produces a graph for each function, where each pointer of the function is associated with a cell. Take one of sea-dsa's regression tests as an example,

extern int nd(void);

void f ( int *x , int *y ) {
  *x = 1 ; *y = 2 ;
}

void g (int *p , int *q , int *r , int *s) {
  f (p,q) ;
  f (r,s);
}

int main (int argc, char**argv){

  int x;
  int y;
  int w;
  int z;

  int *a = &x;
  int *b = &y;
  if (nd ()) a = b;

  g(a, b, &w, &z);
  return x + y + w + z;
}

Compiling it with O1 and invoking the context-sensitive version of sea-dsa produces the following three graphs.

image

image

image

So a rudimentary attempt to add the context-sensitive memory model is to encode the functions in store-passing style. Namely, each Boogie procedure takes input as the regions which its pointers are associated with and returns the updated regions.

For example, for f, g, and main, their signatures are,

procedure f(x: ref, y: ref, M.in.1: ref[int]) returns (M.out.1: ref[int]);
procedure g(p, q, r, s, M.in.1: ref[int], M.in.2:ref[int]) returns (M.out.1: ref[int], M.out.2: ref[int])
procedure main(M.in.1: ref[int], M.in.2:ref[int]) returns (M.out.1: ref[int], M.out.2: ref[int])

For each procedure, we assign the in version of a region into its out version because parameters are immutable in Boogie. So, for example, we do the following in g:

entry:
M.out.1 := M.in.1;
M.out.2 := M.in.2;

In this way, we always use the out version of a region when encoding load/store instructions.