smackers / smack

SMACK Software Verifier and Verification Toolchain
http://smackers.github.io
Other
427 stars 82 forks source link

How to keep consistency between verified code and generated code #397

Open bitcalc opened 5 years ago

bitcalc commented 5 years ago

For the following snippet of code:

int main() {
  int *a = malloc(10 * sizeof(int));
  int c = a[100000];
  free(a);
  return 0;
}

When compiled with clang -O0 file.c, it generates an executable that causes the segmentation fault. However, if I verify its memory safety property with SMACK, I got no error trace.

The reason is that in the smacked version, the c variable in the final LLVM bitcode is dead, and its entire assignment statement is optimized away. While in the non-optimized code, the statement exists. This could bring about fatal security issues because the verified version is not the version that is finally executed.

Is there a way in the SMACK toolchain to solve the problem? Or does the toolchain provide the guarantee that the verified version is semantically identical to the -O2, or some level of an optimized version of clang?

bitcalc commented 5 years ago

I've looked into the llvm2bpl.cpp file and noticed that the current implementation uses existing LLVM passes and SMACK passes to transform the initial LLVM bitcode. It is hard for me to tell what clang compilation options generate the executable code that is semantically equivalent to the final bitcode used by SMACK for verification.

Could someone explain a little if the current implementation is equivalent to some optimization level of Clang? If not, what work needs to be done in order to achieve this ability? Based on my current knowledge of SMACK and Clang/LLVM optimization, I cannot tell if it's an easy or hard work for SMACK.

Thank you very mcuh!

zvonimir commented 5 years ago

Yes, this is a good observation. In the llvm2bpl you can find all the LLVM passes that SMACK invokes, as you already noticed. One could try to remove some of these passes to get something equivalent to O0 maybe, which should not be hard at all. Now, having said that, it would be quite hard to have some kind of more formal guarantees that what SMACK is verifying is exactly equivalent to O0 optimization level - I guess one could maybe compare LLVM IR analyzed by SMACK with the O0 optimization level. There is a chance they would maybe be identical. I think that something similar could be done for O1, O2, etc. I hope this helps. We could try and take a look at what we could disable in llvm2bpl. Finally, note that most LLVM-based verifiers that I am aware of would have similar problems since we all try to leverage LLVM passes to speed up verification (at the cost of verifying LLVM IR that is maybe not identical to any optimization levels).

bitcalc commented 5 years ago

After playing more with LLVM-based versifiers, I've observed the point that they have the same problem. So I'm with you. For practical purpose, an easier alternative (than comparing the LLVM IR analyzed by SMACK with that of clang with some optimization level) is generating LLVM IR from SMACK that could be compiled into object code, which may not correspond to any predefined optimization level of Clang, but the code verified can be made the same as that compiled at the LLVM level.

So I would request adding a new command option that, when turned on, outputs the LLVM IR of SMACK that can be compiled by the LLVM toolchain into an executable.

michael-emmi commented 5 years ago

Agreed, this should definitely be done!