jkmcnk / sx-gcc

The GNU Compiler Collection port to NEC SX CPU architecture.
GNU General Public License v2.0
0 stars 2 forks source link

no optimizations produce corrupt code #35

Closed GoogleCodeExporter closed 8 years ago

GoogleCodeExporter commented 8 years ago
building with -O0 produces corrupt code (segfault). building with -Ox, x !=
0 produces running code.

this affects the following testcases from gcc.c-torture/execute/builtins/:

strlen-2.c
20000205-1.c

Original issue reported on code.google.com by jmoc...@gmail.com on 20 Oct 2008 at 12:52

GoogleCodeExporter commented 8 years ago
argh, strlen-2 testacse is actually built from three files ... the following 
stems
from a permutation of optimization switches on all three source files:

1. it does not really matter, what optimization switches are used for lib/main.c
2. if strlen-2.c is compiled with -O0 and strlen-2-lib.c with -O1, the thing 
aborts.
3. if both strlen-2.c and strlen-2-lib.c are compiled with -O0, the thing 
segfaults.
4. if strlen-2.c is compiled with -O1 and strlen-2-lib.c is compiled with -O0, 
the
thing runs ok.

Original comment by jmoc...@gmail.com on 20 Oct 2008 at 1:01

GoogleCodeExporter commented 8 years ago
case (4) is OK and even the abort in (2) is actually OK (strlen-2.c calls
non-built-in strlen() which should abort in this case.

however, (3) is just plain strange. I walked the main_test() func with dbx on 
SX8 and
it turns out that segfault is done in strlen(), where we try to load something 
in
memory just a few steps in the func:

ld1b    $s35,0(0,$s35)

where $s35 = 0x000000007d1b9b40

this is a pointless address as our memory layout is afaik:
text/data >= 0x0000000400000000
stack     >= 0x0000000800000000

(NOTE: in -O0 case, this is the strlen() we provide in strlen-2-lib.c)

the strange thing is, this is the first (of two?!) parameters to our strlen() 
call;
dbx shows the following stack trace:

(dbx) where
*  0 strlen(0x7d1b9b40, 0x86400000) at 0x400001400
   1 main_test() at 0x400000cf4
   2 main(0x1, 0x8000000150, 0x8000000160) at 0x4000016bc
   3 _start(0x1, 0x8000000150, 0x8000000160) at 0x4000005a0

now, I notice that main() has three params as well and it seems that the first 
two
are "real" params, argc == 0x1 after all.

anyhow, the value of the first strlen() param is strange. 

I'm especially puzzled by the following (based on assembly source and dbg 
tracing):

1. prior to strlen() call, we put the only argument in memory twice:

first time via $s1 (hard frame ptr):
    sts $s36,320(,$s1)
this one stores at 0x8...1f70

and immediately afterwards once again via $s2 (stack ptr):
    lds $s37,320(,$s1)
    sts $s37,-296(,$s2)
this one stores at 0x8...1f68

the destination addresses for both sts instructions are exactly 8 bytes apart.

then, we load strlen() address to $s35.

now, we put 0x2 in $s36 and store it via stack ptr:

    sts $36,-336(,$s2)
the destination address is 0x8...1f40

this address is lea-ed into $s34:
    lea $s34,-336(,$s2)

NOTE: here lies, imho, the root of all problems. $s34 should be arg ptr, right? 
well,
args begin at 08x...1f70 (or 1f68, I don't know why it's put on stack twice ...)

finally, we copy $s35 to $s33 and bsic to strlen():
    bsic $s32,($s33)

in strlen(), 8 regs are stored from 0x8...1f80 onwards. $s1 is set to point to 
this
address. $s2 points to 0x8...21c0.

after initial check (should we abort()), strlen() goes on to
    lds $s35,280(,$s1)
which loads from 0x8...2034 (pointless, surely our params are not there, and 
indeed,
$s35 contains 0 after this)

then we seem to want to pop the argument
    lds     $s36,8(0,$s3)
but since arg ptr is WRONG (look above), this fetches thing from 0x8...1f48 and 
there
we have 0x000000007d1b9b40 in $s36, which is clearly an invalid address (also 
the
string we passed as param is at 0x4...a90)

so the code created by gcc to call strlen() is WRONG. erich?

Original comment by jmoc...@gmail.com on 20 Oct 2008 at 4:14

GoogleCodeExporter commented 8 years ago
First of all, this is the second if block in the main_test. The first one is
optimized away. You can see this with -fdump-tree-all.

From the end of your report: lea $s34,-336(,$s2) seems ok, that is the point 
where
the arguments (first the number of arguments) are supposed to be. The prologue 
code
confirms that $s2 is set right. The number of arguments seems wrong. When 
compiling
with -dP in the assembler you see that the number of bytes passed are 16 (in the
const_int), so converting that to 2 arguments is right. But 16 bytes is wrong 
in the
first place. I tried with only calling strlen() and passing it a char*, the 
number of
arguments was set to 2, again(!). Maybe that is one of the first places to look 
for.

The reason why the pointer is placed in a completely wrong place, I don't know. 
One
could search for reasons why the wrong args are passed (some macros in sx.h 
decide
about how and how many arguments are passed). Also it is possible that the 
call_value
insn or expansion are somehow broken or too limited. Look into contrib/b for a
completely different approach (that one lead to more errors).

Original comment by efo...@gmail.com on 20 Oct 2008 at 10:58

GoogleCodeExporter commented 8 years ago
argh. still no luck. however, it seems the problem is not related to overriding
builtin strlen (just tried it with mystrlen() in its stead). so it must be in 
the
complex expression passed as the parameter to strlen (and/or faulty 
optimization of
always-false-conditions). if not, then it must be an act of god.

Original comment by jmoc...@gmail.com on 21 Oct 2008 at 11:32

GoogleCodeExporter commented 8 years ago
btw, it's the third if in strlen-2.c; both first and second are optimized away 
even
with -O0, which is just plain strange to me - how dares gcc speculate about the
return value of strlen(), which it needs to if it wants to optimize those two 
ifs away?

Original comment by jmoc...@gmail.com on 21 Oct 2008 at 11:51

GoogleCodeExporter commented 8 years ago
ok. I think I got to the minimal testcase. the problem seems to be triggered by 
using
the ternary if-then-else operator ?: in place of a function argument. compile 
the
attached testcase to assembly with -O0, and inspect it: arg count (1) for 
blah() call
is stored -304 off SP ($s2), while the only arg is stored at -280 off SP. 
there's
"whatever's in memory" in between and blah() in turn refers to that (8 off the 
arg
ptr ($s3), which is -304 off old SP.

the other conditionally compiled branch (not enabled by default) of the testcase
compiles OK, as the argument is calculated separately using ?: and stored in a 
temp var.

Original comment by jmoc...@gmail.com on 21 Oct 2008 at 2:12

Attachments:

GoogleCodeExporter commented 8 years ago
Very interesting! This is exactly the case with the still failing cases in bug 
#34.
Probably will set that bug as duplicate now...

Original comment by efo...@gmail.com on 21 Oct 2008 at 2:27

GoogleCodeExporter commented 8 years ago
this is a rather wild speculation, but as we consistently put the argument on 
the
stack 16 bytes off, I'd say that we somehow think that the ?: construct takes 24
bytes (3x 8bytes = 1x size_t + 2x ptr). perhaps a type-related problem?

Original comment by jmoc...@gmail.com on 21 Oct 2008 at 2:40

GoogleCodeExporter commented 8 years ago
This happens also if you call a function with integer arguments and have inside 
the
arguments the ?: . Look at the -fdump-tree-all (last printed tree dump) and you 
will
see that the internal code generated looks very much like the thing that's 
working
for you. I.e. there are intermediate variables.

Can you check what you really have in the missing place? maybe the intermediate
variables are placed there? From -fdump-rtl-all I can't say things look wrong. 
The
arguments are virtual-outgoing-args until the step 110r.vregs (instantiate 
virtual
registers), and their datatypes look reasonably (DI). This does not look as if 
the
data stored as argument is wrong. It rather looks like the 
virtual-outgoing-args are
pointing to the wrong place, as if someone else has lifted the arg pointer or
allocated a stack slot (or two) in the wrong place.

Original comment by efo...@gmail.com on 21 Oct 2008 at 3:01

GoogleCodeExporter commented 8 years ago
there is *nothing* in the missing place (between arg at wrong address and arg 
count).
if you read the assembler, you can see that nothing is ever put there by the 
caller.
callee accesses this place thinking that the arg is there (it should be, but 
isn't).

Original comment by jmoc...@gmail.com on 21 Oct 2008 at 3:17

GoogleCodeExporter commented 8 years ago
Fancy... in the code I produce I see that the intermediate result of the ?: is 
stored
at 288(,$s1). This is two slots above 272(,$s1). As maximum number of arguments 
is 1
(blah()), and one slot is reserved for the number of arguments, the place where 
the
?: result is stored is the first available slot for automatic variables. So 
this part
has worked correctly.

On the other hand, when calling the function blah() arguments are stored 
relative to
the stack pointer $s2. The only argument is stored at -280(,$s2) which is 
actually
equal to 296(,$s1). The correct place would be 280(,$s1)...

The decision for the place to store the arguments is done very early (.vreg 
step),
while the place for storing the local variable is really there only after .greg.
Maybe this means we need to delay storing the arguments until the other 
registers are
done? In other implementations there are several places where such delays need 
to be
done...

Original comment by efo...@gmail.com on 21 Oct 2008 at 3:39

GoogleCodeExporter commented 8 years ago
grrrrrr ... I'm getting really annoyed by this. a fix, a fix, a kingdom for a 
fix!

the main difference between -O0 and -O1 (latter works OK) is in the fact that 
-O1
code does not put intermediate value resulting from ?: operator (iftmp0 in rtl) 
on
the stack.

-O0 (fails miserably):
1. ?:
2. sts result,288(,$s1) (one or the other result)
3. lds reg,288(,$s1)    (reloads it result of ?:)
4. sts reg,-280(,$s2)
5. sts argcount,-304(,$s2)
6. lea argptr,-304(,$s2)
7. bsic

-O1 (works like a charm):
1. ?: (result is in reg)
2. sts reg,-280(,$s2)
3. sts argcount,-288(,$s2)
4. lea argptr,-288(,$s2)
5. bsic

thus, it seems that it's the placing of tmp result of expression used for arg 
on the
stack that confuses gcc's idea of what is where on the stack.

btw, -fverbose-asm is your friend ... ;)

Original comment by jmoc...@gmail.com on 22 Oct 2008 at 2:33

GoogleCodeExporter commented 8 years ago
one more thing: I was wrong about nothing being put between arg count and the 
only
arg (see comment 10); actually, the stack looks like this with -O0 just before 
doing
the bsic to blah():

SP ($s2) -------------> callee RSA (272b)
                        ...
SP - 280 -------------> arg        (8b, == iftmp0 == result of ?!)
SP - 288 -------------> iftmp0     (8b, result of ?:)
SP - 296 -------------> ????       (8b, nothing stored here)                    

ARGPTR = SP - 304 ----> arg count  (8b, == 1 in our case)
                        caller RSA (272b)
                        ...

1. argcount and argptr are assigned just after caller RSA at SP-304: correct, as
outgoing args should be just after caller RSA, before the arguments

2. iftmp0, a (auto) local, comes after args space at SP-288: correct.

3. the only arg should be put at SP-296 (empty slot) after argcount in the args
space, however, gcc somehow decides that it should go at SP-280.

4. there is also 8 bytes too much on the stack, the only (auto) local is 
iftmp0, we
have 16 bytes allocated for locals.

looking at rtls is beginning to give me a slight headache. I expect it to 
progress as
the evening draws nearer ... ;)

Original comment by jmoc...@gmail.com on 22 Oct 2008 at 3:46

GoogleCodeExporter commented 8 years ago
moving args after locals helps: now the thing works with all optimization 
levels,
including optimizing for size.

however, we still leave one slot unused, according to my reverse engineering of 
stack
from assembly for s.c:

SP ($s2) -->
            RSA      (callee, 272b)
SP - 280 -->
            arg      (8b)
SP - 288 -->
            argcount (8b)
            ???????? (8b, wtf does this come from?!)
SP - 304 -->
            local0   (8b, iftmp0)
            RSA      (current, 272b)
HFP ($s1) ->

being a pedantic sob, I want this empty slot eliminated: the fight is not over 
yet! ;)

Original comment by jmoc...@gmail.com on 22 Oct 2008 at 6:05

GoogleCodeExporter commented 8 years ago
empty space is due to making frame size a multiple of 16 bytes (see 
sx_compute_frame_size()).

Original comment by jmoc...@gmail.com on 23 Oct 2008 at 8:25