sosy-lab / sv-benchmarks

Collection of Verification Tasks (MOVED, please follow the link)
https://gitlab.com/sosy-lab/benchmarking/sv-benchmarks
184 stars 169 forks source link

Incorrect pre-processing of source files leads to incompatible C library redeclarations #121

Open delcypher opened 8 years ago

delcypher commented 8 years ago

I've been investigating an issue that Clang has been warning about all over the place (-Wincompatible-library-redeclaration) and I'd like to draw attention to it because it implies that people who have been preprocessing benchmarks have been a bit careless (or I have incorrect knowledge of which architecture a benchmark is intended for).

Here's an example (with -Werror=incompatible-library-redeclaration passed to Clang)

$ cd heap-manipulation
$ make CC=clang bubble_sort_linux_false-unreach-call.oi
clang building heap-manipulation/bubble_sort_linux_false-unreach-call.i
bubble_sort_linux_false-unreach-call.i:478:14: error: incompatible
redeclaration of library function 'malloc'
[-Werror,-Wincompatible-library-redeclaration]
extern void *malloc (size_t __size) __attribute__ ((__nothrow__ ,
__leaf__)) __attribute__ ((__malloc__)) ;
             ^
bubble_sort_linux_false-unreach-call.i:478:14: note: 'malloc' is a
builtin with type 'void *(unsigned int)'
bubble_sort_linux_false-unreach-call.i:480:14: error: incompatible
redeclaration of library function 'calloc'
[-Werror,-Wincompatible-library-redeclaration]
extern void *calloc (size_t __nmemb, size_t __size)
             ^
bubble_sort_linux_false-unreach-call.i:480:14: note: 'calloc' is a
builtin with type 'void *(unsigned int, unsigned int)'
bubble_sort_linux_false-unreach-call.i:492:14: error: incompatible
redeclaration of library function 'realloc'
[-Werror,-Wincompatible-library-redeclaration]
extern void *realloc (void *__ptr, size_t __size)
             ^
bubble_sort_linux_false-unreach-call.i:492:14: note: 'realloc' is a
builtin with type 'void *(void *, unsigned int)'
bubble_sort_linux_false-unreach-call.i:787:12: error: incompatible
redeclaration of library function 'snprintf'
[-Werror,-Wincompatible-library-redeclaration]
extern int snprintf (char *__restrict __s, size_t __maxlen,
           ^
bubble_sort_linux_false-unreach-call.i:787:12: note: 'snprintf' is a
builtin with type 'int (char *, unsigned int, const char *, ...)'
bubble_sort_linux_false-unreach-call.i:790:12: error: incompatible
redeclaration of library function 'vsnprintf'
[-Werror,-Wincompatible-library-redeclaration]
extern int vsnprintf (char *__restrict __s, size_t __maxlen,
           ^
bubble_sort_linux_false-unreach-call.i:790:12: note: 'vsnprintf' is a
builtin with type 'int (char *, unsigned int, const char *,
__builtin_va_list)'
5 errors generated.

AFAIK this benchmark (bubble_sort_linux_false-unreach-call.c) is supposed to be a 32-bit (I look at the Makefile in the directory for this, if CC.Arch is not set I assume 32-bit) program but the warnings Clang show indicate that the pre-processed file (bubble_sort_linux_false-unreach-call.i) was preprocessed for 64-bit compilation.

For this particular benchmark the problem is that the typedef for size_t is different depending on the architecture, as demonstrated below

$ clang -m32 -xc <(echo "#include <stdio.h>") -E -o - | grep -E
"^typedef.+ size_t;$"
typedef unsigned int size_t;
$ clang -m64 -xc <(echo "#include <stdio.h>") -E -o - | grep -E
"^typedef.+ size_t;$"
typedef long unsigned int size_t;

In bubble_sort_linux_false-unreach-call.i near the top you can see

typedef long unsigned int size_t;

which implies that the benchmark was preprocessed with -m64 (possibly implicitly) even though the benchmark is supposed to 32-bit. If I change the typedef to the one that should be used with -m32 then the Clang warnings go away.

In this particular case on 32-bit unsigned int and long unsigned int are the same size but I worry that people incorrectly pre-processing source files will lead to more problems that I haven't thought about.

delcypher commented 8 years ago

The simplest (and best) fix for this is to remove the *.i (preprocessed files) files completely and have the build system just build the *.c source files.

The purported reasons for having preprocessed files is that

Having worked a little bit with the *.i files I can list quite a number of disadvantages that I have come across:

dbeyer commented 8 years ago

For each verification task, it is specified whether it is meant for 32 bit or for 64 bit. If you find verification tasks that were incorrectly preprocessed, please fix those. Removing the preprocessed files is not an option.

PhilippWendler commented 8 years ago

Which directories contain the incorrectly preprocessed files that you have found? Is it only heap-manipulation?

delcypher commented 8 years ago

@PhilippWendler I'm afraid this is too long ago for me to remember. Using a recent clang with -Werror=incompatible-library-redeclaration will probably allow you to pick up which directories have at least some of the problems.

PhilippWendler commented 8 years ago

I did this, the list of affected directories is this:

array-examples
array-memsafety
bitvector-regression
ldv-commit-tester
ldv-consumption
ldv-linux-3.12-rc1
ldv-linux-3.16-rc1
ldv-linux-3.4-simple
ldv-linux-4.2-rc1
ldv-memsafety
ldv-races
ldv-regression
ldv-validator-v0.6
ldv-validator-v0.8
list-ext-properties
list-properties
loops
memory-alloca
memsafety
memsafety-ext
ntdrivers
product-lines
pthread
pthread-lit
seq-pthread
ssh
termination-numeric

The full list of files is in incompatible-library-redeclaration.txt.

Maybe this is also the cause for some of the undefined behavior reported in https://github.com/runtimeverification/evaluation/tree/master/svcomp-benchmark

In some files, the incompatible library definition is not that bad, because it only affects irrelevant calls to snprintf. In other files, however, it is clearly a sign of incompatible preprocessings.

As there are so many files, we should discuss a general approach to fixing this problem. Do we simply regenerate the preprocessed files with -m32? Or do we prefer to relabel some of these categories as 64bit?

delcypher commented 8 years ago

In some files, the incompatible library definition is not that bad, because it only affects irrelevant calls to snprintf. In other files, however, it is clearly a sign of incompatible preprocessings.

Oh wow. It's more serious than I thought. I would consider this yet another reason to avoid preprocessed source files or at the very least avoid letting humans do the preprocessing.

As there are so many files, we should discuss a general approach to fixing this problem. Do we simply regenerate the preprocessed files with -m32? Or do we prefer to relabel some of these categories as 64bit?

As this effects so many benchmarks I would recommend this is discussion is moved to the mailing list as this potentially affects many benchmarks.

zvonimir commented 7 years ago

Hi all. Are there any plans for this to be fixed? We are experiencing these issues in the memory safety category. We could create a patch that will fix just the benchmarks we are observing problems with, but that is clearly not the best solution. Having said that, if SVCOMP-wide solution is not feasible at this point, then we will work on submitting targeted fixes.

Oh, one more thing. I am pretty sure that I have seen patches to benchmarks being issues where only the .i file was fixed. So one has to be careful when regenerating benchmarks to make sure that these patches will not be overridden. Alternatively, we can just hope for the best and patch again if/when needed :).

PhilippWendler commented 7 years ago

@zvonimir Unfortunately, only very few people seem to care about this because there have been few contributions towards this, and the people that work on this have only limited man power. So if you want to get this fixed, please help and encourage others to help.

Since e2a1165 the build system shows which benchmarks potentially have problems with incorrect pre-processing because clang issues a warning about incompatible-library-redeclaration. You can see a list of directories where this warning currently occurs by grepping for this string in */Makefile. The current list of potentially affected directories (excluding ldv-*) is

Note that not all warnings about incompatible-library-redeclaration come from incorrect pre-processings, sometimes people just manually re-declare some standard function like log or memcpy with a different signature.

Because as you noticed there might have been changes to only the .i files somewhere, it is important that any changes with updated pre-processing need to be reviewable, despite their potentially large size. When I fixed the pre-processing of the directory heap-manipulation, I separated all trivial (https://github.com/sosy-lab/sv-benchmarks/commit/82ce8c6e632a0ef3e94562cbffcb9d85fac58d28) and whitespace-only (https://github.com/sosy-lab/sv-benchmarks/commit/81ce220b3264828f3aa59cdd83827bf4de6de8e9, https://github.com/sosy-lab/sv-benchmarks/commit/bf24e973d26d7ccfd5d234409c0ecf0987735446) changes from the rest of the changes (https://github.com/sosy-lab/sv-benchmarks/commit/743bb49c9c725599c9e3d7450435037de2bb7899). This made the latter change much smaller and easier to review. I suggest do follow the same procedure when fixing the pre-processings for other directories.

PhilippWendler commented 7 years ago

I created proposed fixes for the directories that I know: bitvector-regression (#208), loops (#206), ntdrivers and ssh (#207), and product-lines (#205). In all these cases there was a better solution than doing a new pre-processing: I fixed manually-written declarations manually, and replaced a few preprocessed files with their non-preprocessed sources.

In ntdrivers, there are actually a few remaining cases that implement malloc in some weird way. I am not sure whether these should be replaced with an extern malloc declaration or have their malloc renamed.

tautschnig commented 7 years ago

@PhilippWendler could we please document (in the README file in the long run, but as first step just in here) what the system is we should be preprocessing on, and what the exact command to use is? At present, the README just says use "cpp -P or something similar". I'd be very happy to help, but frankly I don't know what system to use so as to avoid massive changes, or having to review massive changes before git add -p-ing just those few that we actually want.

Though, see also my comments on the mailing list today, which would mean we could just close this one.

PhilippWendler commented 7 years ago

Currently there are no such requirements, except that no line markers are generated and that the architecture matches. In practice almost all benchmarks have been preprocessed with GCC and its standard library so far. I'm not in the position to impose any requirements, but I would suggest to use gcc -P -E -m<arch> on a recent Ubuntu LTS.

I am afraid there is no way to avoid generating massive changes when re-preprocessing old files (at least I do no know any way). I guess one could attempt to replicate the old preprocessing environment as closely as possible (sometimes this was documented in the commit message, or try the latest Ubuntu LTS at that time), but I guess that if you change the architecture compared to the old preprocessing, you will get massive changes even with the same environment. But maybe you find some way? What I did was to try making the changes as easily reviewable as possible (cf. description here).

tautschnig commented 5 years ago

If anyone has Ubuntu 16.04 at hand, running the below script should hopefully yield the most reasonable diff. Neither 14.04 nor 18.04 seemed like the best candidates.

#!/bin/bash

set -e

for d in \
 array-memsafety \
 ldv-races \
 ldv-regression \
 list-ext-properties \
 list-properties \
 memory-alloca \
 memsafety-ext \
 memsafety \
 pthread-atomic \
 pthread-ext \
 pthread-lit \
 pthread-nondet \
 pthread-wmm \
 pthread \
 seq-pthread \
 termination-libowfat
do
  echo $d
  cd $d
  for f in *.c
  do
    ff=$(basename $f .c)
    if [ ! -s $ff.i ]
    then
      if [ ! -s $f.i ]
      then
        echo "preproc file $ff.i not found"
        continue
      else
        ff=$f
      fi
    fi
    gcc -m32 -E -P $f -o $ff.i
  done
  cd ..
done

cd ldv-memsafety/memleaks-notpreprocessed/
for f in *.c
do
  gcc -E -P -m32 $f -o ../$(basename $f .c).i
done

Or maybe we should just finally get rid of preprocessed files altogether.

tautschnig commented 5 years ago

Just for the record: Ubuntu 16.04 doesn't help much either. @kfriedberger What OS have you been working on when preparing e76bcd95? I would then re-try processing on that OS.

kfriedberger commented 5 years ago

The commit e76bcd9 is quite old (over 4 years). At that time I worked with Ubuntu 12.04 or 14.04 or Arch Linux (rolling release, no version number), all of them in x64_86. None of the machines has survived ;-)

tautschnig commented 5 years ago

@kfriedberger Thanks, I'll try to set up a machine with one of those.

tautschnig commented 5 years ago

I have tried both Ubuntu 12.04 and Arch Linux (though a very recent one for the latter), and neither provide particularly good results. I'll now simply go for Ubuntu 18.04 as it is no worse than the others and matches the competition environment.

tautschnig commented 5 years ago

I believe #765 finally fixes this problem for all remaining folders.

MartinSpiessl commented 3 years ago

765 is merged, does this mean this issue can be closed?