gopalshankar / address-sanitizer

Automatically exported from code.google.com/p/address-sanitizer
0 stars 0 forks source link

comparison and difference on unrelated pointers #269

Open GoogleCodeExporter opened 9 years ago

GoogleCodeExporter commented 9 years ago
From Dominique Pellé:
====================================================================
First of all, thank you for the address sanitizer.
This one exciting new tool for c++ developers.

A long time ago, I was using another dynamic
analyzer called Insure++ which was good too
(but not free software) and which found things that
Valgrind could not find. I don't have access to it
anymore. Insure++ also works at compilation time,
like ASAN. I remember a kind of bugs which Insure++
found and that ASAN does not find: comparison and
differences on unrelated pointers. See some simple
examples of such bugs detected by Insure++ here:

http://vasc.ri.cmu.edu/media/manuals/insure.5.x/ref/err_expp.htm
http://vasc.ri.cmu.edu/media/manuals/insure.5.x/ref/err_expf.htm

The obvious question: could ASAN also detect such bugs?

I remember finding a few such bugs with this check. Here
is a patch in Vim which I fixed with such a bug...

ftp://ftp.vim.org/pub/vim/patches/7.0/7.0.144

I don't remember the problem exactly, but I think
Vim compared a pointer in the heap with a another
pointer that could sometimes be set to a constant
empty string "" (so in .text section, or .rodata?)
and so comparing pointers did not make any sense.

I thought afterwards: maybe that'd be more
relevant for UBSAN than for ASAN.
====================================================================

This is a good idea, although the performance worries me. 
We could implement 
  __sanitizer_pointer_ge(void *a, void *b);
  __sanitizer_pointer_gt(void *a, void *b);
  __sanitizer_pointer_le(void *a, void *b);
  __sanitizer_pointer_lt(void *a, void *b);
  __sanitizer_pointer_sub(void *a, void *b);
and instrument the IR instructions like this:

  tail call void @__sanitizer_pointer_lt(i8* %a, i8* %b)
  %cmp = icmp ult i8* %a, %b

We may want to add some extra information from the frontend (e.g. names of 
variables).
There is also a risk that the operations on pointers will be transformed into 
operations on integers by the time IR reaches asan instrumentation.
If we insert the callbacks in the frontend it may inhibit many optimizations 
downstream,
so an alternative could be to assign a metadata to the appropriate 
instructions. 

The implementation of these functions could be along these lines:

__sanitizer_pointer_ge(void *a, void *b) {
   uptr flag = common_flags()->detect_unrelated_pointers;
   if (!flag) return;
   if (flag) {   // cheap part
      // This part can be computed almost instantly with asan's allocator
      bool in_small_heap1 = asan_addr_is_in_small_heap(a);  // i.e. heap allocation of <128K
      bool in_small_heap2 = asan_addr_is_in_small_heap(b);  // i.e. heap allocation of <128K
      if (in_small_heap1 != in_small_heap2) BARK()
      if (in_small_heap1 && in_small_heap2 && 
         asan_addresses_belong_to_different_allocations(a, b)) BARK()
      // neither 'a' nor 'b' are in the small heap
   }
   if (flag < 2) return;
   // can't check cheaply, do the more expensive check
   // The pointers are either in large heap, stack or globals
   // We can tell which class 'a' and 'b' is with reasonable overhead, 
   // but not as cheap as for the small heap.
   // Then, if the pointers are in the same class, we'll need even more work
   // to tell if the objects are different.
}

The check sounds more like ubsan-ish at first, but I don't see how it can be 
implemented w/o controlling the allocator.
We can implement the cheap part in msan and tsan in addition to asan because 
they share the allocator, 
but the expensive part will require the tool to know the boundaries of stack- 
and global objects, 
which is only available in asan. 

The problem is performance. Even if we hide this under a run-time flag like we 
do with stack-use-after-return detection,
the overhead will be several instructions per pointer comparison when the 
checking is off.
When the checking is on, the overhead will depend on where the pointers are.
If at least one is in the small heap (<128K, most common case) the overhead may 
be reasonable. 
If not, the overhead may be monstrous.

This is worth experimenting anyway.

Original issue reported on code.google.com by konstant...@gmail.com on 26 Feb 2014 at 8:16

GoogleCodeExporter commented 9 years ago
Of course, even greater concern is the amount of true positives and whether 
users will be willing to fix them. We'll see :) 

Original comment by konstant...@gmail.com on 26 Feb 2014 at 9:29

GoogleCodeExporter commented 9 years ago
Probably we can figure out some strictness levels for this feature. E.g. do not 
report, report more, report even more, report all.
I do not have ideas what patterns have higher chances of being "false/true 
positives" (in the sense of being harmful and users are willing to fix them) 
for now.

Original comment by dvyu...@google.com on 26 Feb 2014 at 9:49

GoogleCodeExporter commented 9 years ago
[deleted comment]
GoogleCodeExporter commented 9 years ago
When I used that feature with Insure++ (a long time ago), I almost never
saw any false positives. The only false positive that I remember, what
a program that was comparing 2 unrelated pointers both in the stack, to
find out in which direction the stack was growing. But that's a fairly
odd and rare thing to do in the first place.  I think it was in the
only false positive I ever saw was in this function in Vim source code
(in vim/src/os_unix.c):

 696 static int stack_grows_downwards; 
 697
 698 /*
 699  * Find out if the stack grows upwards or downwards.
 700  * "p" points to a variable on the stack of the caller.
 701  */
 702     static void
 703 check_stack_growth(p)
 704     char        *p;
 705 {
 706     int         i;
 707 
 708     stack_grows_downwards = (p > (char *)&i);
 709 }

...

3053     int                 i;
3054 
3055     check_stack_growth((char *)&i);

The comparison at line 708 is undefined as it is true
if the stack is growing in one direction or false if
the stack is growing in the other direction. That's
precisely what that function is checking, so in that
case it was a false positive.

Thinking about it, it should not complain when checking in both
directions as the pointer comparison is then valid.  Example:

int *p;
int a[10];
...
if (p >= &a[0] && p < &a[10])
{
  /* This always OK */
  ...
}

if (p >= &a[0])
{
  /* This has undefined behaviour if p and &a[0] are unrelated pointers */
}

Original comment by dominiqu...@gmail.com on 26 Feb 2014 at 10:31

GoogleCodeExporter commented 9 years ago
The false positives you describe are equally bad under asan's 
stack-use-after-return
checker and I am not afraid of them. They are rare, and could be suppressed by 
__attribute__((no_sanitize_address)).

What I am afraid of is *true* positives that users will not want to fix,
e.g. std::set<void *> or some such. 

Original comment by konstant...@gmail.com on 26 Feb 2014 at 10:41

GoogleCodeExporter commented 9 years ago
In my previous comment, I wrote about the example function 
check_stack_growth(...):

> The comparison at line 708 is undefined as it is true
> if the stack is growing in one direction or false if
> the stack is growing in the other direction. That's
> precisely what that function is checking, so in that
> case it was a false positive.

Thinking about it, I wonder whether it's a false positive
or not.  What happens if the compiler inlines function
check_stack_growth(...)? Then I'm not sure any more whether
the function will returns the expected result.  So I'm
not sure how portable this function is and warning about
it does not look bad to me.

Original comment by dominiqu...@gmail.com on 26 Feb 2014 at 10:42

GoogleCodeExporter commented 9 years ago
First kill: http://llvm.org/viewvc/llvm-project?view=revision&revision=202266
This is my own asan test, 
it was putting malloc-ed pointers into a vector and then sorting them 
(on purpose! the test is very asan-specific)

Original comment by konstant...@gmail.com on 26 Feb 2014 at 2:02

GoogleCodeExporter commented 9 years ago
Filed a related bug for clang/llvm:
http://llvm.org/bugs/show_bug.cgi?id=18989

Original comment by konstant...@gmail.com on 27 Feb 2014 at 12:53