plast-lab / cclyzer

A tool for analyzing LLVM bitcode using Datalog.
MIT License

Points-to analysis does not detect dereferences in optimized LLVM IR #8

Open 1stl0ve opened 7 years ago

1stl0ve commented 7 years ago

Hello,

I have been using cclyzer for running points-to analyses on some C programs. I have run into a potential issue. I have been looking at the results in pointer-dereferences.tsv for the following C program:

#include <stdlib.h>

int execute(double *b) {
    double k = *b;
    return (int)k;
}

int main(int argc, char *argv[])
{
    double *t = (double *)NULL;
    execute(t);
    return 0;
}

When I run cclyzer, it tells me that %t in main and %1 in execute are both pointers to *null*, which is what I expected.

When I apply the LLVM -mem2reg optimization to the IR generated from this program, I get the following code:

; Function Attrs: nounwind uwtable
define i32 @execute(double* %b) #0 {
  %1 = load double, double* %b, align 8
  %2 = fptosi double %1 to i32
  ret i32 %2
}

; Function Attrs: nounwind uwtable
define i32 @main(i32 %argc, i8** %argv) #0 {
  %1 = call i32 @execute(double* null)
  ret i32 0
}

In this code snippet, %1 in execute is the result of dereferencing %b, which is null at the call site in main. However, pointer-dereferences.tsv does not contain any dereferences after analyzing this code with cclyzer. Is it possible to extend the points-to analysis to account for loads through a pointer that has no associated alloca instruction (e.g. after mem2reg has promoted memory operations to register operations)?

Thanks,

Leo

gbalats commented 7 years ago

I'm not sure I follow your point. The code is dereferencing a null pointer, which is undefined behavior. So, what should variable %1 point to?

Why would you expect it to point to null?

Right now, there is a constraint that states that the special null location object cannot point to anything. This is important because we do not want our analysis to treat null as an ordinary pointer into which other memory addresses can be stored.

The only sensible thing to add, maybe, is a rule that returns the special unknown location object when null is dereferenced (i.e. by a load instruction). However, this would not signal undefined behavior, since that is not what unknown location means per se.
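
To make the constraint and the proposed rule concrete, here is a minimal Python sketch of a flow-insensitive points-to fixpoint. It is not cclyzer's implementation (cclyzer expresses its rules in Datalog). The function `points_to`, the fact tuples, the flag `null_load_yields_unknown`, and the `derefs` relation are all hypothetical names chosen for illustration. The sketch ignores types, so it would also tag a non-pointer load result such as the `double` in `execute`.

```python
NULL = "*null*"        # special null location object
UNKNOWN = "*unknown*"  # special unknown location object


def points_to(copies, loads, stores, seeds, null_load_yields_unknown=False):
    """Toy points-to fixpoint over hypothetical fact tuples.

    copies: (dst_var, src_var)  dst = src, e.g. actual argument -> formal
    loads:  (dst_var, ptr_var)  dst = load ptr
    stores: (ptr_var, src_var)  store src, ptr
    seeds:  {var: locations}    initial facts, e.g. a constant null operand
    """
    var_pts = {v: set(locs) for v, locs in seeds.items()}
    loc_pts = {}  # contents of abstract memory locations
    changed = True

    def add(table, key, vals):
        nonlocal changed
        new = set(vals) - table.setdefault(key, set())
        if new:
            table[key] |= new
            changed = True

    while changed:
        changed = False
        for dst, src in copies:
            add(var_pts, dst, var_pts.get(src, ()))
        for ptr, src in stores:
            for loc in list(var_pts.get(ptr, ())):
                if loc == NULL:
                    continue  # constraint: null cannot point to anything
                add(loc_pts, loc, var_pts.get(src, ()))
        for dst, ptr in loads:
            for loc in list(var_pts.get(ptr, ())):
                if loc == NULL:
                    # proposed rule (assumption): loading through null
                    # yields the unknown location instead of nothing
                    if null_load_yields_unknown:
                        add(var_pts, dst, {UNKNOWN})
                    continue
                add(var_pts, dst, loc_pts.get(loc, ()))

    # hypothetical dereference relation: (pointer operand of a load, pointee)
    derefs = {(ptr, loc) for _, ptr in loads for loc in var_pts.get(ptr, ())}
    return var_pts, derefs


# Facts modelled on the mem2reg output quoted in this issue.
seeds = {"main:null": {NULL}}                # the constant null argument
copies = [("execute:%b", "main:null")]       # call @execute(double* null)
loads = [("execute:%1", "execute:%b")]       # %1 = load double, double* %b

var_pts, derefs = points_to(copies, loads, [], seeds)
print(var_pts.get("execute:%1", set()), derefs)

var_pts_u, _ = points_to(copies, loads, [], seeds, null_load_yields_unknown=True)
print(var_pts_u["execute:%1"])
```

In this model the null constant reaches `%b` through argument passing, so a relation built from the pointer operands of loads would contain `%b` pointing to null even without an alloca. The flag only changes what the load result points to, which matches the caveat that unknown location by itself says nothing about undefined behavior.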

1stl0ve commented 7 years ago

My point is that when I run cclyzer on the un-optimized code, it tells me that there is a source pointer (%1) that points to *null*, but when I run it on the optimized code, there is no report of a null pointer being dereferenced. However, there is clearly a dereference of a null pointer in the code. Ultimately, I see a pointer dereference in the code that is not reflected in pointer-dereferences.tsv.

Does that clarify my question?