jkmcnk / sx-gcc

The GNU Compiler Collection port to NEC SX CPU architecture.
GNU General Public License v2.0

-O0 execution failure: recursive calls? #47

Closed GoogleCodeExporter closed 8 years ago

GoogleCodeExporter commented 8 years ago
Test case execute.exp=20000402-1.c calls umoddi3 recursively,
which is why it hangs on the SX.

But umoddi3, as defined in libgcc2, is not recursive!

#ifdef L_umoddi3
UDWtype
__umoddi3 (UDWtype u, UDWtype v)
{
  UDWtype w;

  (void) __udivmoddi4 (u, v, &w);

  return w;
}
#endif

dbx output (stack where) from executable 20000402-1.x0.13424
1000 __umoddi3(0x0, 0x8000) at 0x400003098
1001 __umoddi3(0x0, 0x8000) at 0x400003098
1002 __umoddi3(0x0, 0x8000) at 0x400003098
1003 __umoddi3(0x0, 0x8000) at 0x400003098
1004 __umoddi3(0x0, 0x8000) at 0x400003098
1005 __umoddi3(0x0, 0x8000) at 0x400003098

Original issue reported on code.google.com by fred.tre...@googlemail.com on 29 Oct 2008 at 4:11

GoogleCodeExporter commented 8 years ago
__udivmoddi4 takes the incoming UDWtype values and splits them into UWtype values.
After several steps it invokes __udiv_qrnnd_c, which contains the following code:
UWtype __d1, __d0, __q1, __q0;
__r1 = (n1) % __d1; 

It seems that this mod generates a umoddi3 call, although modsi3 would be the
proper call? UNITS_PER_WORD and MIN_UNITS_PER_WORD, defined in sx.h, play a
role in this.

The type descriptions in longlong.h are important here:

   UWtype -- An unsigned type, default type for operations (typically a "word")
   UHWtype -- An unsigned type, at least half the size of UWtype.
   UDWtype -- An unsigned type, at least twice as large a UWtype
   W_TYPE_SIZE -- size in bits of UWtype

   UQItype -- Unsigned 8 bit type.
   SItype, USItype -- Signed and unsigned 32 bit types.
   DItype, UDItype -- Signed and unsigned 64 bit types.

   On a 32 bit machine UWtype should typically be USItype;
   on a 64 bit machine, UWtype should typically be UDItype.  */
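
To make the role of the word-sized modulo concrete, here is a minimal C sketch of the half-word division step that __udiv_qrnnd_c performs, written as a function instead of a macro. It assumes a 32-bit UWtype (as the sizeof output reported in this thread suggests); the names udiv_qrnnd_sketch, LL_B, LL_LOW and LL_HIGH are made up for illustration and are not the libgcc identifiers. Note that the arguments in the dbx trace, (0x0, 0x8000), are consistent with n1 = 0 and a divisor whose high half is 0x8000.

```c
#include <assert.h>
#include <stdint.h>

/* Assumed widths for illustration only: a 32-bit word. */
typedef uint32_t UWtype;   /* one word */
typedef uint64_t UDWtype;  /* double word, used here only for checking */
#define W_TYPE_SIZE 32
#define LL_B        ((UWtype) 1 << (W_TYPE_SIZE / 2))
#define LL_LOW(t)   ((UWtype) (t) & (LL_B - 1))
#define LL_HIGH(t)  ((UWtype) (t) >> (W_TYPE_SIZE / 2))

/* Divide the two-word value n1:n0 by d, returning the one-word quotient
   and storing the remainder in *rem.
   Preconditions: d has its top bit set, and n1 < d. */
UWtype udiv_qrnnd_sketch(UWtype *rem, UWtype n1, UWtype n0, UWtype d)
{
    UWtype d1 = LL_HIGH(d), d0 = LL_LOW(d);
    UWtype q1, q0, r1, r0, m;

    r1 = n1 % d1;  /* word-sized modulo: the operation discussed here */
    q1 = n1 / d1;
    m  = q1 * d0;
    r1 = r1 * LL_B | LL_HIGH(n0);
    if (r1 < m) {
        q1--, r1 += d;
        if (r1 >= d && r1 < m)  /* no carry from the add, still too small */
            q1--, r1 += d;
    }
    r1 -= m;

    r0 = r1 % d1;
    q0 = r1 / d1;
    m  = q0 * d0;
    r0 = r0 * LL_B | LL_LOW(n0);
    if (r0 < m) {
        q0--, r0 += d;
        if (r0 >= d && r0 < m)
            q0--, r0 += d;
    }
    r0 -= m;

    *rem = r0;
    return q1 * LL_B | q0;
}
```

Every `%` and `/` in this sketch works on word-sized operands. If the compiler widens them and lowers the widened operation to a call to __umoddi3, the routine that implements __umoddi3 ends up calling itself, which would explain the stack shown in the first comment.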

Original comment by fred.tre...@googlemail.com on 12 Nov 2008 at 10:40

GoogleCodeExporter commented 8 years ago
__udiv_qrnnd_c is a #define macro, and inside it there is a mod on operands of type UWtype.

    printf("mod 1 %d %d %d\n",sizeof(__r1),sizeof((n1)),sizeof(__d1));  
    __r1 = (n1) % __d1; 
    printf("mod 1 - Done\n");

This generates the following output:
udiv_qrnnd call
mod 1 4 4 4
umoddi3  ...   <- umoddi called instead of umodsi

The RTL expansion of the routine looks like this:

;; D.5259 = n1 % __d1
(insn 203 202 204 (set (reg:DI 341)
        (zero_extend:DI (mem/c/i:SI (plus:DI (reg/f:DI 129 virtual-stack-vars)
                    (const_int 12 [0xc])) [0 n1+0 S4 A32]))) -1 (nil)
    (nil))

(insn 204 203 207 (set (reg:DI 342)
        (zero_extend:DI (mem/c/i:SI (plus:DI (reg/f:DI 129 virtual-stack-vars)
                    (const_int 44 [0x2c])) [0 __d1+0 S4 A32]))) -1 (nil)
    (nil))

(insn 207 204 205 (set (reg/f:DI 347)
        (mem/u/c/i:DI (symbol_ref/u:DI (".LC13") [flags 0x2]) [0 S8 A64])) -1 (nil)
    (expr_list:REG_EQUAL (symbol_ref:DI ("__umoddi3") [flags 0x41])
        (nil)))
.....

I tried implementing umodsi3 in sx.md as:
(define_insn "umodsi3"
  [(set (match_operand:SI 0 "register_operand" "=r")
        (umod:SI (match_operand:SI 1 "register_operand" "r")
                 (match_operand:SI 2 "register_operand" "r")))
   (clobber (match_scratch:DF 3 "=&r"))
   (clobber (match_scratch:DF 4 "=&r"))
   (clobber (match_scratch:SI 5 "=&r"))]
  ""
  "flt\\t%3,%1\\n\\
\\tflt\\t%4,%2\\n\\
\\tfdv\\t%3,%3,%4\\n\\
\\tfix\\t%5,%3,1\\n\\
\\tmps\\t%5,%5,%2\\n\\
\\tsbs\\t%0,%1,%5"
  [(set_attr "type" "fp")
   (set_attr "mode" "SI")
   (set_attr "length" "6")])

But it is not selected when expanding SItype = SItype % SItype.
Still investigating why.

Original comment by fred.tre...@googlemail.com on 12 Nov 2008 at 1:51

GoogleCodeExporter commented 8 years ago
Added divsi3 to sx.md via define_expand, along with modsi3 and the unsigned
versions, which were also missing.
They are not mathematically correct at the moment but should work well enough.

gen_fix_truncdfdi2 seems to cause problems when compiling some applications,
though not gcc itself. We need to find out which construct fails.

The failure says:
internal compiler error: in trunc_int_for_mode, at explow.c:55
It fails in this assert:
  /* You want to truncate to a _what_?  */
  gcc_assert (SCALAR_INT_MODE_P (mode));

The test case still does not pass; it now fails with an illegal instruction.
But the new patterns must have taken effect somehow, as there is no longer any recursion.
More investigation is needed.

The current change to the md file is: remove divdi3 and add:
(define_expand "divsi3"
  [(set (match_operand:SI 0 "register_operand" "=r")
        (div:SI (match_operand:SI 1 "register_operand" "r")
                (match_operand:SI 2 "register_operand" "r")))]
  ""
{
  rtx op1_df, op2_df, op0_df, op0_di;

  op0_df = gen_reg_rtx (DFmode);
  op0_di = gen_reg_rtx (DImode);

  op1_df = gen_reg_rtx (DFmode);
  expand_float (op1_df, operands[1], 0);

  op2_df = gen_reg_rtx (DFmode);
  expand_float (op2_df, operands[2], 0);

  emit_insn (gen_divdf3 (op0_df, op1_df, op2_df));

  emit_insn (gen_fix_truncdfdi2 (op0_di, op0_df));
  emit_move_insn (operands[0], gen_lowpart (SImode, op0_di));
  DONE;
})

(define_expand "udivsi3"
  [(set (match_operand:SI 0 "register_operand" "=r")
        (udiv:SI (match_operand:SI 1 "register_operand" "r")
                (match_operand:SI 2 "register_operand" "r")))]
  ""
{
  emit_insn (gen_divsi3 (operands[0], operands[1], operands[2]));
  DONE;
})

(define_expand "modsi3"
  [(set (match_operand:SI 0 "register_operand" "=r")
        (mod:SI (match_operand:SI 1 "register_operand" "r")
                (match_operand:SI 2 "register_operand" "r")))]
  ""
{
  rtx div_floor, prod;

  div_floor = gen_reg_rtx (SImode);
  prod = gen_reg_rtx (SImode);

  emit_insn (gen_divsi3 (div_floor, operands[1], operands[2]));
  emit_insn (gen_mulsi3 (prod, operands[2], div_floor));
  emit_insn (gen_subsi3 (operands[0], operands[1], prod));

  DONE;
})

(define_expand "umodsi3"
  [(set (match_operand:SI 0 "register_operand" "=r")
        (umod:SI (match_operand:SI 1 "register_operand" "r")
                (match_operand:SI 2 "register_operand" "r")))]
  ""
{
  emit_insn (gen_modsi3 (operands[0], operands[1], operands[2]));
  DONE;
})

The code probably breaks the build for many tests, so it will not be checked in
until it has been properly adapted and tested.
But ideas are welcome. The divsi3 example comes from ia64.md, which has other
constructs as well. More define_insns might be needed.
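
As a rough model of what the expanders above compute, the following C sketch mirrors them step by step: convert both operands to double, divide, truncate to a 64-bit integer, keep the low 32 bits, and derive the remainder as a - b * q. The function names are hypothetical, the C cast stands in for fix_truncdfdi2 (what the SX instruction really does on rounding is not established here), and division by zero is not handled.

```c
#include <assert.h>
#include <stdint.h>

/* Model of the divsi3 expander: float both operands, divide in double,
   truncate to 64 bits, take the low 32 bits. */
int32_t divsi3_sketch(int32_t a, int32_t b)
{
    int64_t q = (int64_t) ((double) a / (double) b);
    return (int32_t) (uint32_t) q;  /* low part, like gen_lowpart (SImode, ...) */
}

/* Model of the modsi3 expander: a - b * (a / b).
   Unsigned arithmetic avoids signed-overflow undefined behaviour in C. */
int32_t modsi3_sketch(int32_t a, int32_t b)
{
    uint32_t q = (uint32_t) divsi3_sketch(a, b);
    return (int32_t) ((uint32_t) a - (uint32_t) b * q);
}

/* Models of udivsi3/umodsi3 as written above: they simply forward to the
   signed expanders, so the operands are treated as signed values. */
uint32_t udivsi3_sketch(uint32_t a, uint32_t b)
{
    return (uint32_t) divsi3_sketch((int32_t) a, (int32_t) b);
}

uint32_t umodsi3_sketch(uint32_t a, uint32_t b)
{
    return (uint32_t) modsi3_sketch((int32_t) a, (int32_t) b);
}
```

For signed 32-bit operands a double holds both values exactly, so the truncated quotient should match integer division. The unsigned wrappers are a plausible place for the "not mathematically correct" behaviour: an operand of 0x80000000 or above is converted as a negative number, so for example udivsi3_sketch(0x80000000u, 3u) does not equal 0x80000000u / 3u.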

Original comment by fred.tre...@googlemail.com on 18 Nov 2008 at 5:56

GoogleCodeExporter commented 8 years ago
The current md generates the following insns, as opposed to the ones shown
above that were generated before (in divdi3.o).

It should be correct, but we need to check the calculations in more detail.
The float-to-fixed-point (fix) conversion should actually be a floor, so that
negative numbers also work.

;; D.3698 = n1 % __d1
(insn 172 171 173 (set (reg:SI 324)
        (mem/c/i:SI (plus:DI (reg/f:DI 129 virtual-stack-vars)
                (const_int 12 [0xc])) [0 n1+0 S4 A32])) -1 (nil)
    (nil))

(insn 173 172 174 (set (reg:SI 325)
        (mem/c/i:SI (plus:DI (reg/f:DI 129 virtual-stack-vars)
                (const_int 44 [0x2c])) [0 __d1+0 S4 A32])) -1 (nil)
    (nil))

(insn 174 173 175 (set (reg:DF 330)
        (float:DF (reg:SI 324))) -1 (nil)
    (nil))

(insn 175 174 176 (set (reg:DF 331)
        (float:DF (reg:SI 325))) -1 (nil)
    (nil))

(insn 176 175 177 (set (reg:DF 328)
        (div:DF (reg:DF 330)
            (reg:DF 331))) -1 (nil)
    (nil))

(insn 177 176 178 (set (reg:DI 329)
        (fix:DI (reg:DF 328))) -1 (nil)
    (nil))

(insn 178 177 179 (set (reg:SI 326)
        (subreg:SI (reg:DI 329) 4)) -1 (nil)
    (nil))

(insn 179 178 180 (set (reg:SI 327)
        (mult:SI (reg:SI 325)
            (reg:SI 326))) -1 (nil)
    (nil))

(insn 180 179 0 (set (reg:SI 157 [ D.3698 ])
        (minus:SI (reg:SI 324)
            (reg:SI 327))) -1 (nil)
    (expr_list:REG_EQUAL (umod:SI (reg:SI 324)
            (reg:SI 325))
        (nil)))
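
When checking the calculations, it may help to compare the two candidate rounding modes for the quotient against the C operators directly. The sketch below is only an illustration with made-up function names; it assumes C99 semantics, where `/` truncates toward zero and `%` takes the sign of the dividend, and it avoids the INT_MIN / -1 case, which overflows.

```c
#include <assert.h>
#include <stdint.h>

/* Quotient rounded toward zero: a C cast from double truncates. */
int32_t quot_trunc(int32_t a, int32_t b)
{
    return (int32_t) ((double) a / (double) b);
}

/* Quotient rounded toward minus infinity, without needing libm. */
int32_t quot_floor(int32_t a, int32_t b)
{
    double x = (double) a / (double) b;
    int32_t q = (int32_t) x;   /* truncate first */
    if ((double) q > x)        /* truncation rounded a negative value up */
        q--;
    return q;
}

/* Remainder derived from a given quotient: a - b * q. */
int32_t rem_from(int32_t a, int32_t b, int32_t q)
{
    return a - b * q;
}
```

For -7 and 2, the truncated quotient is -3 with remainder -1, which is what `-7 / 2` and `-7 % 2` give in C99, while the floored quotient is -4 with remainder 1. Under that assumption, a floor would change the results for negative operands, so it is worth confirming which behaviour the test suite expects before switching. For the unsigned umod in the dump above the two modes coincide, since no negative values occur.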

Original comment by fred.tre...@googlemail.com on 18 Nov 2008 at 8:50

GoogleCodeExporter commented 8 years ago
With the sx.md routines added for divsi3 and the fixes in FP rounding, the
test case now passes.

RUNTESTFLAGS="--target_board=sx6i execute.exp=20000402-1.c" make check

                === gcc Summary ===
# of expected passes            12

PASS: gcc.c-torture/execute/20000402-1.c compilation,  -O0 
PASS: gcc.c-torture/execute/20000402-1.c execution,  -O0 
PASS: gcc.c-torture/execute/20000402-1.c compilation,  -O1 
PASS: gcc.c-torture/execute/20000402-1.c execution,  -O1 
PASS: gcc.c-torture/execute/20000402-1.c compilation,  -O2 
PASS: gcc.c-torture/execute/20000402-1.c execution,  -O2 
PASS: gcc.c-torture/execute/20000402-1.c compilation,  -O3 -fomit-frame-pointer 
PASS: gcc.c-torture/execute/20000402-1.c execution,  -O3 -fomit-frame-pointer 
PASS: gcc.c-torture/execute/20000402-1.c compilation,  -O3 -g 
PASS: gcc.c-torture/execute/20000402-1.c execution,  -O3 -g 
PASS: gcc.c-torture/execute/20000402-1.c compilation,  -Os 
PASS: gcc.c-torture/execute/20000402-1.c execution,  -Os 

Will add a FIXME label as a reminder to verify that divsi and modsi behave
mathematically correctly.

Original comment by fred.tre...@googlemail.com on 19 Nov 2008 at 12:17