sosy-lab / sv-benchmarks

Collection of Verification Tasks (MOVED, please follow the link)
https://gitlab.com/sosy-lab/benchmarking/sv-benchmarks
184 stars 169 forks source link

Undefined Behavior in c/loops/linear_search.c #1241

Closed michael-schwarz closed 3 years ago

michael-schwarz commented 3 years ago

While looking at some of the test runs for Goblint, I stumbled across the c/loops/linear_search.c test case:

In this example an array is allocated on the heap, and the value at SIZE/2 is initialized to 3, the rest of it remains at indeterminate value.

https://github.com/sosy-lab/sv-benchmarks/blob/5a8eba01b1eec77c899907ac17cf96bfc4158616/c/loops/linear_search.c#L28-L30

linear_search then accesses the array at multiple locations where it has indeterminate value (e.g. 0). https://github.com/sosy-lab/sv-benchmarks/blob/5a8eba01b1eec77c899907ac17cf96bfc4158616/c/loops/linear_search.c#L15-L17

I guess a way to fix this test while preserving the intent would be adding:

   unsigned int i =0 ; 
   while(i < SIZE) {
      a[i] = __VERIFIER_nondet_int();
      i++;
    }

Some background on why I believe this is Undefined Behavior:

MartinSpiessl commented 3 years ago

This is a very interesting question!

"C99 3.17.2 (1) either an unspecified value or a trap representation"

Typo here, I guess it should be 3.19.2 (1). A nice overview of undefined behavior cases is always found in Annex J.2:

The behavior is undefined in the following circumstances:
[...] The value of an object with automatic storage duration is used while it is indeterminate (6.2.4, 6.7.9, 6.8). [...] A trap representation is read by an lvalue expression that does not have character type (6.2.6.1). [...]

So for objects with automatic storage duration this would be clearly undefined behavior. For the other way to get undefined behvior your raised I am not so sure. I never encountered this trap representation consideration before.

Seems like you follow mainly https://stackoverflow.com/questions/9219971/read-before-write-is-undefined-with-malloced-memory

Interesting read: https://queue.acm.org/detail.cfm?id=3041020

Trap representation are not always well understood, even by expert C programmers and compiler writers.6 A trap representation is an object representation that need not represent a value of the object type. Fetching a trap representation might perform a trap but is not required to. Performing a trap in C interrupts execution of the program to the extent that no further operations are performed.

Trap representations were introduced into the C language to help in debugging. Uninitialized objects can be assigned a trap representation so that an uninitialized read would trap and consequently be detected by the programmer during development. Some compiler writers would prefer to eliminate trap representations altogether and simply make any uninitialized read undefined behavior—the theory being, why prevent compiler optimizations because of obviously broken code? The counter argument is, why optimize obviously broken code and not simply issue a fatal diagnostic?

The text marked by me in bold would indicate that it is not generally accepted to be undefined behavior. Towards the end I also spotted this:

Conclusions The behavior associated with uninitialized reads is an unsettled issue that the C Standards Committee needs to address in the next revision of the standard (C2X) [...]

The C21 standard is not yet out there, so for me I cannot definitely answer the question whether this is undefined behavior. I tried valgrind and gcc with various sanitizer options but none reported this case as problematic access/undefined behavior.

I would argue that whether there is a trap representation is actually implementation-defined via Annex J.3:

J.3 Implementation-defined behavior A conforming implementation is required to document its choice of behavior in each of the areas listed in this subclause. The following are implementation-de f ned: [...] J.3.5 1Integers — Any extended integer types that exist in the implementation (6.2.5). — Whether signed integer types are represented using sign and magnitude, two’s complement, or ones’ complement, and whether the extraordinary value is a trap representation or an ordinary value (6.2.6.2).

As such, the behavior is only undefined if the compiler (in SV-COMP we orient towards GCC /GNU C) actually uses trap representations here, which according to my testing does not seem to be the case. This should also be documented by the compiler as stated in J.3. And for sure, I found just that for gcc: https://gcc.gnu.org/onlinedocs/gcc/Integers-implementation.html

Whether signed integer types are represented using sign and magnitude, two’s complement, or one’s complement, and whether the extraordinary value is a trap representation or an ordinary value (C99 and C11 6.2.6.2).

GCC supports only two’s complement integer types, and all bit patterns are ordinary values

So to me this reads like the implementation-defined behavior is not to have trap representations here, so I would conclude that this is not undefined behavior in GNU C, while for other compilers it could be.

Maybe I am completely wrong, I am sure other people might be able to give some input here as well.

michael-schwarz commented 3 years ago

Typo here, I guess it should be 3.19.2 (1)

I think it is 3.17.2 (1) in C99, and 3.19.2 (1) in C11.

https://queue.acm.org/detail.cfm?id=3041020

Yes, I found this too. While it sure is an interesting read, I wasn't able to draw a definite conclusion from it either: For example, on top of the things you quoted above, it also includes this snippet:

According to the current WG14 Convener, David Keaton, reading an indeterminate value of any storage duration is implicit undefined behavior in C, and the description in Annex J.2 (which is non-normative) is incomplete. This revised definition of the undefined behavior might be stated as "The value of an object is read while it is indeterminate."

Regarding GCC not exhibiting undefined behavior, I agree:

GCC supports only two’s complement integer types, and all bit patterns are ordinary values

Given a compiler that has this implementation-defined behavior for signed integers, there are no bit patterns that are trap representations, and, if one follows a literal reading of what the standard says (as opposed to the more restrictive one cited above), hence also no Undefined Behavior.

So a related question then is whether GNU C in the context of sv-comp is supposed to also mean that all Implementation-defined behavior is to mimic GCC.

So, at least to me, it seems like one can argue both ways here, and in the end it will boil down to making either one choice or the other. I'm looking forward to hearing what the general opinion is!

MartinSpiessl commented 3 years ago

I think it is 3.17.2 (1) in C99, and 3.19.2 (1) in C11.

You are right of course, I got fooled once again by the ISO/IEC 9899 name which has 99 in it also for newer version (my PDF for C11 is apparently just named ISOIEC9899.pdf, so I mistook it for the C99 one).

So a related question then is whether GNU C in the context of sv-comp is supposed to also mean that all Implementation-defined behavior is to mimic GCC.

Well I guess the question is what is GNU C, and how does it relate to GCC? https://www.gnu.org/software/gnu-c-manual/

The GNU C Reference Manual is a reference for the C programming language, as implemented by the GNU C Compiler.

So for me there is really no doubt that GNU C is really defined by what the GNU C Compiler does (and vice versa).

So, at least to me, it seems like one can argue both ways here, and in the end it will boil down to making either one choice or the other. I'm looking forward to hearing what the general opinion is!

I am looking forward to this as well! I cannot remember ever having discussed this particular problem (uninitialized memory from malloc), but there should have been precedences in the repo before i joined the project. I looked through old issues and e.g. found this:

https://github.com/sosy-lab/sv-benchmarks/pull/527

So here this seems to have been considered undefined behavior. It is always safe to remove occurrences of this and avoid the problem. On the other hand we want verifiers to be able to handle real-world programs, so if this is allowed in GCC in practice and I write a program that contains this, then the verifiers should be able to handle these semantics correctly.

But I also found this old issue: https://github.com/sosy-lab/sv-benchmarks/issues/274
Here in particular the comment by @tautschnig: https://github.com/sosy-lab/sv-benchmarks/issues/274#issuecomment-263494764

This benchmark does not have any undefined behaviour: reading the contents of memory allocated by malloc (with the SV-COMP guarantee that the call did not return a null pointer) yields an indeterminate value. Reading such a value only has undefined behaviour when the object has automatic storage duration. [...]

He is more in line with my chain of reasoning, i.e., this is not undefined behavior. This would also be in line with sanitizer options of gcc, and valgrind. I would find it hard to believe that there is no sanitizer for this if it really were undefined behavior. Maybe there is one and I just didn't find it yet though (even if it isn't considered undefined behavior, it can lead to buggy programs, so there really should be ways to check for this).

For the task at hand (linear_search.c) it of course makes sense to initialize the memory in any case. The fact that the search could terminate in every step because of arbitrary values in the array cannot be the intention of the original author. Speaking of which, I just looked up the commit that introduced the malloc to the task: https://github.com/sosy-lab/sv-benchmarks/commit/a1173ad1efc3333fb4b61a0b4d85da647da22ccd

Here many other files where changed in a similar fashion, I would expect that they potentially suffer from the same problem. Most of them look however as if they assign values first to all elements of the array before reading the elements, but there is another one in the loops folder: linear_sea.ch.c [sic!] that also accesses elements before assignment. In the version before that commit, we obviously had undefined behavior in case of linear_search.c and linear_sea.ch.cbecause there is an array with automatic storage duration whose elements are accessed when the value is still indeterminate.

mchalupa commented 3 years ago

I would find it hard to believe that there is no sanitizer for this if it really were undefined behavior. Maybe there is one and I just didn't find it yet though (even if it isn't considered undefined behavior, it can lead to buggy programs, so there really should be ways to check for this).

@MartinSpiessl There is the MemorySanitizer (-fsanitize=memory) that checks for uninitialized reads (AddressSanitizer does not do that). It is available with clang only, though.

==3979==WARNING: MemorySanitizer: use-of-uninitialized-value
    #0 0x55f876205bae in linear_search /home/marek/src/sv-benchmarks/c/loops/linear_sea.ch.c:17:3
    #1 0x55f876205faf in main /home/marek/src/sv-benchmarks/c/loops/linear_sea.ch.c:29:23
    #2 0x7f7553350151 in __libc_start_main (/usr/lib/libc.so.6+0x28151)
    #3 0x55f87618719d in _start (/home/marek/src/sv-benchmarks/a.out+0x2019d)

SUMMARY: MemorySanitizer: use-of-uninitialized-value /home/marek/src/sv-benchmarks/c/loops/linear_sea.ch.c:17:3 in linear_search
Exiting