jkmcnk / sx-gcc

The GNU Compiler Collection port to NEC SX CPU architecture.
GNU General Public License v2.0
0 stars 2 forks source link

gcc: memcpy-chk, memmove-chk fail badly #25

Closed GoogleCodeExporter closed 8 years ago

GoogleCodeExporter commented 8 years ago
Before they get out of sight: memcpy-chk, memmove-chk, memcpy-2 and other
similar tests fail badly. There are three types of failures:

 * abort : check code, possibly goes wrong path
 * illegal instruction: jumping to wrong location?
 * memory fault

The failures vary with the compiler options.

The three classes of bugs need to be checked in dbx to see what happens,
and possibly separate bugs need to be opened.

Erich

Original issue reported on code.google.com by efo...@gmail.com on 14 Oct 2008 at 4:13

GoogleCodeExporter commented 8 years ago
With the new includes (original SuperUX) these pass when built with -O0, but 
time out
otherwise. Needs reproducing inside dbx, maybe we have some unaligned variables 
that
generate loads of exceptions...?

Original comment by efo...@gmail.com on 15 Oct 2008 at 11:51

GoogleCodeExporter commented 8 years ago
Correction to previous comment: the failed jobs don't time out, they dump core 
(at
least on the sx8).

Testing memcpy-chk.c and debugging (compiling with -g -O1, running under dbx) 
shows
that it fails in test3() just at the return:

{{{
   400007dd0:   19 34 00 a0      : 193400a0 00000000    be>     0,0x0(,$s32)
}}}

At that point the stack is destroyed, "where" returns just the test3 function, 
and
$s32 has the value 0x4.

The code sequence where the stack is destroyed is between the second and the 
third
call of memcpy() in the following snippet:
{{{
test3 (void)
{
  struct A { char buf1[10]; char buf2[10]; } a;
  char *r = l1 == 1 ? &a.buf1[5] : &a.buf2[4];
  char buf3[20];
  int i;
  size_t l;

  /* The following calls should do runtime checking
     - length is not known, but destination is.  */
  chk_calls = 0;
  memcpy (a.buf1 + 2, s3, l1);
  memcpy (r, s3, l1 + 1);
  r = l1 == 1 ? __builtin_alloca (4) : &a.buf2[7];
  memcpy (r, s2, l1 + 2);
  memcpy (r + 2, s3, l1);
  r = buf3;
  for (i = 0; i < 4; ++i)
    {
      if (i == l1 - 1)
        r = &a.buf1[1];
      else if (i == l1)
        r = &a.buf2[7];
      else if (i == l1 + 1)
        r = &buf3[5];
      else if (i == l1 + 2)
        r = &a.buf1[9];
    }
...
}}}
The debugger shows me that l1=0, but doesn't know about the variable a.

Original comment by efo...@gmail.com on 16 Oct 2008 at 3:30

GoogleCodeExporter commented 8 years ago
With the -O0 compilation the buggy line including the following memcpy() call
translate to:
{{{
 #stabd68,0,262
    lds $s35,.LC106 #movdi case3
    lds $s35,0(,$s35)   #movdi case3
    cpx $s35,$s35,(63)0 # signed DI cmp, =r,r,M
    bne $s35,.L120
    lea $s35,312(,$s1)
    sts $s35,400(,$s1)
    adx $s1,24,$s1                     <---??????
    lds $s36,400(,$s1)  #movdi case3
    adx $s35,$s36,(60)0
    srl $s35,$s35,4
    slax    $s35,$s35,4
    sts $s35,400(,$s1)
    lds $s35,400(,$s1)  #movdi case3
    sts $s35,408(,$s1)
    be> 0,.L122
.L120:
    lea $s35,336(,$s1)
    adx $s35,17,$s35
    sts $s35,408(,$s1)
.L122:
    lds $s36,408(,$s1)  #movdi case3
    sts $s36,328(,$s1)
 #stabd68,0,263
    lds $s35,.LC110 #movdi case3
    lds $s38,0(,$s35)   #movdi case3
    lds $s35,.LC106 #movdi case3
    lds $s35,0(,$s35)   #movdi case3
    adx $s36,2,$s35
    or  $s37,-1,(0)1
    lds $s35,328(,$s1)  #movdi case3
    sts $s35,280(,$s1)
    sts $s38,288(,$s1)
    sts $s36,296(,$s1)
    sts $s37,304(,$s1)
    lds $s35,.LC109 #movdi case3
    lea $s36,4
    sts $s36,272(,$s1)
    lea $s34,272(,$s1)
    or  $s33,0,$s35
    bsic    $s32,($s33)
}}}
The marked line is definitely bad, as it changes the frame pointer. 
Interestingly, it
doesn't lead to a failure for -O0. With -O1 something similar happens.

I switched experimentally the STACK_POINTER_REGNUM to 2 (instead of 1) and the
produced code is then incrementing $s2. But that code fails badly in many more 
tests.

Next step: find out why it increments the frame/stack pointer???

Erich

Original comment by efo...@gmail.com on 16 Oct 2008 at 5:14

GoogleCodeExporter commented 8 years ago
Just a further hint to the origin of the problem:
{{{
(insn 65 64 66 7 (set (reg/f:DI 186 [ D.1921 ])
        (reg/f:DI 130 virtual-stack-dynamic)) -1 (nil)
    (nil))

(insn 66 65 67 7 (set (reg/f:DI 1 s1)
        (plus:DI (reg/f:DI 1 s1)
            (const_int 24 [0x18]))) -1 (nil)
    (nil))

(insn 67 66 68 7 (set (reg:DI 217)
        (plus:DI (reg/f:DI 186 [ D.1921 ])
            (const_int 15 [0xf]))) -1 (nil)
    (nil))
}}}

Things to look at: STATIC_CHAIN, TRAMPOLINE (trampoline is not really 
implemented).

Original comment by efo...@gmail.com on 17 Oct 2008 at 11:10

GoogleCodeExporter commented 8 years ago
With svn r98 (changes related to FRAME_POINTER_REGNUM and STACK_POINTER_REGNUM) 
the
test now fails in test4() instead of test3(). This is progress, though slow.

Original comment by efo...@gmail.com on 17 Oct 2008 at 5:04

GoogleCodeExporter commented 8 years ago
also, when building with -O1, gcc warns us:

jaka@wall:~/dev/sx/test/gcc/builtin$ gcc -o memcpy-chk.o1 memcpy-chk.c
memcpy-chk-lib.c lib/main.c -O1
memcpy-chk.c: In function 'test4':
memcpy-chk.c:335: warning: call to __builtin___memcpy_chk will always overflow
destination buffer

which does not happen with -O0 that builds and executes ok.

Original comment by jmoc...@gmail.com on 20 Oct 2008 at 10:29

GoogleCodeExporter commented 8 years ago
this probably also affects the following testcases in 
gcc.c-torture/execute/builtins/:

pr23484-chk.c
s[n]printf-chk.c
stpcpy-chk.c
strcat-chk.c
strcpy-chk.c
strncat-chk.c
strncpy-chk.c
vsnprintf-chk.c
vsprintf-chk.c

Original comment by jmoc...@gmail.com on 20 Oct 2008 at 12:02

GoogleCodeExporter commented 8 years ago
I cannot see the warning mentioned by Jaka in comment 6 (warning: call to
__builtin___memcpy_chk will always overflow destination buffer)

Compiled with:
/home/focht/Projects/Compiler/SXtree/libexec/gcc/sx8-nec-superux/4.2.2/cc1
-fpreprocessed memcpy-chk.i -dumpbase memcpy-chk.c -dP -auxbase memcpy-chk 
-Wall -g
-O1 -w -version -fno-show-column -fdump-rtl-all -o memcpy-chk.s
but no warning.

Either this is gone or I'm doing something wrong...

Original comment by efo...@gmail.com on 24 Nov 2008 at 11:23

GoogleCodeExporter commented 8 years ago
This bug can now pretty clearly be attributed to wrong __builtin_setjmp and
__builtin_longjmp expansion.

setjmp stores:
 * start of local variables (i.e. $s1 + 272)
 * jump target
 * stack pointer $s2
 * base pointer $s4

longjmp restores $s1 correctly if compiled with -O0, i.e. it subtracts 272 from 
the
first value and stores it into $s1, but this isn't done when compiled with -O1 
and
higher.

The storing of $s4 can actually only come from the save_stack_nonlocal insn 
which is
needed for nonlocal gotos.

Wondering whether we should actually save/restore all registers $s1-$s34 for
setjmp/longjmp... And if yes, isn't this actually needed for 
save_stack_nonlocal, too?

Erich

Original comment by efo...@gmail.com on 25 Nov 2008 at 2:26

GoogleCodeExporter commented 8 years ago
While stripping the test I found that for -O0 (__OPTIMIZE__ is not defined the 
test
returns early and isn't actually done. Which means, the fact that it passes for 
-O0
means NOTHING.

Original comment by efo...@gmail.com on 26 Nov 2008 at 9:52

GoogleCodeExporter commented 8 years ago
Uploading a very much stripped version of the test. Right now it looks like the 
code
has no reason to use the longjmp path:

  printf("n=%d size=%d\n", n, size);
  if (n > size)
    __chk_fail ();

prints: "n=2 size=5".

So either the object size detection is wrong, or it is an artifact of alignment 
(of
the mother structure of the object), in any case this is not testing 
setjmp/longjmp
currently.

Original comment by efo...@gmail.com on 26 Nov 2008 at 10:15

Attachments:

GoogleCodeExporter commented 8 years ago
Fred found that the wrong size is indeed due to the padding, i.e. the object

struct A { char buf1[10]; char buf2[10]; } a;

is allocated in three 8 byte words, and the size _after_ &a.buf2[9] is 1 + 4 
bytes
(from padding). Which means there is no overflow, actually, and the test is 
aborting
with good reason. Is that actually called padding? I'd think padding is at the
beginning, so what is the filling at the end of an object called?

Original comment by efo...@gmail.com on 26 Nov 2008 at 1:40

GoogleCodeExporter commented 8 years ago
it's called padding, regardless of whether it's done at the beginning of the 
struct,
in between the member or at the end. ;)

btw, did you try running the thing through dbx?

Original comment by jmoc...@gmail.com on 26 Nov 2008 at 1:45

GoogleCodeExporter commented 8 years ago
Yes, ran through dbx. Seeing line numbers and code lines (!) although the 
source code
is not on the SX. So is it embedded in the executable? The setjmp/longjmp thing,
though, is a pure assembler thing.

setjmp/longjmp now works fine on -O0. With -O1 it works for the first 2 tests, 
then
seems to land in nirvana, not yet sure what this is...

Original comment by efo...@gmail.com on 26 Nov 2008 at 2:21

GoogleCodeExporter commented 8 years ago
After commit svn r145 the IF branches look okay. The test still fails because 
of the
4 bytes padding behind the structure. Not sure how to get rid of that, 
__attribute__
((packed)) didn't help.

Erich

Original comment by efo...@gmail.com on 29 Nov 2008 at 11:56

GoogleCodeExporter commented 8 years ago
when building with O[1-4|s], I still get

memcpy-chk.c: In function 'test4':
memcpy-chk.c:335: warning: call to __builtin___memcpy_chk will always overflow
destination buffer

perhaps this is specific to x86 32-bit host.

Original comment by jmoc...@gmail.com on 8 Dec 2008 at 10:02

GoogleCodeExporter commented 8 years ago
Interesting, I don't see this warning on 64 bits. The warning is correct, and 
the
whole point of this test is that memcpy_chk will detect the overwrite and use 
longjmp
to jump back. This way the test will not reach the abort() call.

The only remaining problem is that some test will erroneously succeed, because 
of the
padding at the end of the structure. The structure should have length 20, but
actually rounds it up to 24 (3*8byte), so there is plenty of room behind the 
last
char and there is actually no overwrite. No idea how to get rid of the padding, 
OTOH
I'm not sure we really want and need to...

Original comment by efo...@gmail.com on 8 Dec 2008 at 11:59

GoogleCodeExporter commented 8 years ago
r165 should fix the last remaining issue. there really is nothing about our 
target
requiring structure sizes to be aligned to a multiple of 8.

Original comment by jmoc...@gmail.com on 11 Dec 2008 at 6:58

GoogleCodeExporter commented 8 years ago
btw, fix needs to be tested properly.

Original comment by jmoc...@gmail.com on 11 Dec 2008 at 6:59