jkmcnk / sx-gcc

The GNU Compiler Collection port to NEC SX CPU architecture.
GNU General Public License v2.0
0 stars 2 forks source link

problem with extracting floats from vectors #63

Closed GoogleCodeExporter closed 8 years ago

GoogleCodeExporter commented 8 years ago
this probably has to do with bit field handling.

testcase gcc.c-torture/execute/20050604-1.c

on SX, single-prec FP is stored in the upper 32 bits of the register. (yes,
bizarre)

however, there are times when gcc decides to load SFs from a vector of them
two at a time using lds. however, when extracting a single SF from those 64
bits, it does the following (initial expansion, .104r.expand):

;; D.1517 = BIT_FIELD_REF <D.1506, 32, 0>
# this is load from memory. there are two floats at this location.
(insn 75 74 76 (set (reg:DI 177)
        (mem/c/i:DI (plus:DI (reg/f:DI 129 virtual-stack-vars)
                (const_int 40 [0x28])) [0+0 S8 A64])) -1 (nil)
    (nil))

# we want the first float, so we shift right in order to have it in
# lower 32 bits
(insn 76 75 77 (set (reg:DI 178)
        (ashiftrt:DI (reg:DI 177)
            (const_int 32 [0x20]))) -1 (nil)
    (nil))

# move SI subreg to a new register
(insn 77 76 78 (set (reg:SI 179)
        (subreg:SI (reg:DI 178) 4)) -1 (nil)
    (nil))

# make a float out of it
(insn 78 77 0 (set (reg:SF 144 [ D.1517 ])
        (subreg:SF (reg:SI 179) 0)) -1 (nil)
    (nil))

this finally gets converted to the following assembler

#(insn 75 74 76 (set (reg:DI 35 s35 [177])
#        (mem/c/i:DI (plus:DI (reg/f:DI 1 s1)
#                (const_int 312 [0x138])) [0+0 S8 A64])) 62
{*movdi_general} (nil)
#    (nil))
    lds $s35,312(,$s1)  #movdi case3    # 75    *movdi_general/4    [length = 20]
#(insn 76 75 78 (set (reg:DI 35 s35 [178])
#        (ashiftrt:DI (reg:DI 35 s35 [177])
#            (const_int 32 [0x20]))) 29 {ashrdi3} (nil)
#    (nil))
    srax    $s35,$s35,32    # 76    ashrdi3/2   [length = 1]
#(insn 78 76 79 (set (reg:SF 36 s36 [orig:144 D.1517 ] [144])
#        (reg:SF 35 s35 [179])) 67 {movsf_general} (nil)
#    (nil))
    and $s36,$s35,(32)0 # EF: sure? # 78    movsf_general/1 [length = 1]

at first it seems that it is the shift right that is wrong, however, this
is not quite so. the shift right is OK (imho it stems from the fact that we
are accessing the first 32 bits of a bit field), but should immediately be
followed by an inverse shift to the left, and then the optimizer would
hopefully anihilate both shifts.

imho, it's the 

(subreg:SF (reg:SI 179) 0)

in insn 78

that should be expanded to a left shift of reg 179 for 32 bits.

Original issue reported on code.google.com by jmoc...@gmail.com on 2 Dec 2008 at 2:34

GoogleCodeExporter commented 8 years ago
argh.

actually, the last assembler line should be

#(insn 78 76 79 (set (reg:SF 36 s36 [orig:144 D.1517 ] [144])
#        (reg:SF 35 s35 [179])) 67 {movsf_general} (nil)
#    (nil))
    and $s36,$s35,(32)1 # EF: sure? # 78    movsf_general/1 [length = 1]

with (32)1 instead of (32)0 mask. I turned 1 to 0 during some experimenting and
forgot to turn it back. however, everything said above holds for the correct 
mask, (32)1.

Original comment by jmoc...@gmail.com on 2 Dec 2008 at 2:43

GoogleCodeExporter commented 8 years ago
ok. the way I see it, we have a problem since SFmode regs will actually have its
value in upper 32 bits, while SImode regs will have value in lower 32 bits.

the above problem is simply (not really elegantly, though) solved by adding an 
insn
for pattern

  (set (reg:SF ... (subreg:SF (reg:SI ...) 0)))

that generates a 32 bit shift left in its stead. such a fix is in the attached 
patch.
it is important that the insn comes before other insns with (set ...) pattern 
as I
fear some other could match.

however, I am not quite sure, whether this is the right way to fix this problem.
perhaps I am only fixing the superficial symptoms here, while we should take 
care of
something more at the root of the problem. don't know. erich, can you review 
this and
tell me what you think ...

Original comment by jmoc...@gmail.com on 2 Dec 2008 at 4:23

Attachments:

GoogleCodeExporter commented 8 years ago
also, optimizer won't be able to do s*t about right shift immediately followed 
by an
inverse left shift: I don't really like this fact.

Original comment by jmoc...@gmail.com on 2 Dec 2008 at 4:25

GoogleCodeExporter commented 8 years ago
Try a define_expand which generates the shift insn. That at least would be seen 
by
the optimizer. This, BTW, needs to be done with most of the extend insns.

Original comment by efo...@gmail.com on 2 Dec 2008 at 4:30

GoogleCodeExporter commented 8 years ago
indeed, that's what I wanted to do first, but: which expand is this? how do I 
find
out which name to use for it?

otoh, any opinion whether this is a proper fix, or is there something more
fundamental that we should be fixing in order to remedy this behaviour?

Original comment by jmoc...@gmail.com on 2 Dec 2008 at 4:39

GoogleCodeExporter commented 8 years ago
Issue 62 has been merged into this issue.

Original comment by jmoc...@gmail.com on 2 Dec 2008 at 4:40

GoogleCodeExporter commented 8 years ago
this is just bizzare. I think we are the only machine that stores SF and SI 
values in
different parts of DI-sized registers.

we are not dealing with SI->SF conversion here, although registers are used in
[D|S]Imode, they contain 32-bit floats. we just want to switch register mode 
from SI
to SF and retain the same value. gcc wants to do this with the movsf expand 
that by
default produces the following insn

(insn 78 77 0 (set (reg:SF 144 [ D.1517 ])
        (subreg:SF (reg:SI 179) 0)) -1 (nil)
    (nil))

however, this is wrong, as our much desired FP datum remains in lower 32 bits 
of reg 179.

imho, the solution is to fix movsf expand so that when above input is detected
(source is subreg of SI register) movsf does a DImode ASHIFT of reg 179 for 32 
bits
(moving 32-bit FP in upper 32-bits of the register), with reg 144 as the 
destination.
or move the shift result to reg 144 later. how to code this and keep gcc happy 
is
another problem altogether.

Original comment by jmoc...@gmail.com on 3 Dec 2008 at 4:14

GoogleCodeExporter commented 8 years ago
been trying to make gcc expand movsf correctly for this case most of today. I 
feel
like killing a compiler or two.

the following expand is the best I got to

(define_expand "movsf"
  [(set (match_operand:SF 0 "nonimmediate_operand" "")
        (match_operand:SF 1 "general_operand" ""))]
  ""
  "
{
  if ((reload_in_progress | reload_completed) == 0)
    {
      if(!register_operand (operands[0], SFmode) &&
         !register_operand (operands[1], SFmode))
        {
          rtx temp = force_reg (SFmode, operands[1]);
          temp = emit_move_insn (operands[0], temp);
          DONE;
        }
#if 1
      if (GET_CODE(operands[1]) == SUBREG)
        {
          rtx sreg = SUBREG_REG(operands[1]);
          uint bn = SUBREG_BYTE(operands[1]);
          if((GET_MODE(sreg) == SImode) && (bn == 0))
            {
              rtx tmp, nr;

              /* TODO: requires a bit more action */
              fprintf(stderr, \"starting magic\\n\");
              print_inline_rtx(stderr, operands[0], 0); fprintf(stderr, \"\\n\");
              print_inline_rtx(stderr, operands[1], 0); fprintf(stderr, \"\\n\");
              print_inline_rtx(stderr, sreg, 0);  fprintf(stderr, \"\\n\");
              fprintf(stderr, \"WTF11?!\\n\");
              tmp = gen_rtx_REG(DImode, REGNO(operands[0]));
              print_inline_rtx(stderr, tmp, 0); fprintf(stderr, \"\\n\");
              tmp = expand_simple_binop(DImode, ASHIFT,
                                        sreg, gen_rtx_CONST_INT(DImode, 32),
                                        tmp,
                                        0, OPTAB_WIDEN);
              print_inline_rtx(stderr, tmp, 0); fprintf(stderr, \"\\n\");
              DONE;
            }
         }
#endif
     }
}")

actually, it emits just the RTL I want:

(insn 75 74 76 (set (reg:DI 177)
        (mem/c/i:DI (plus:DI (reg/f:DI 129 virtual-stack-vars)
                (const_int 40 [0x28])) [0+0 S8 A64])) -1 (nil)
    (nil))

(insn 76 75 77 (set (reg:DI 178)
        (ashiftrt:DI (reg:DI 177)
            (const_int 32 [0x20]))) -1 (nil)
    (nil))

(insn 77 76 78 (set (reg:SI 179)
        (subreg:SI (reg:DI 178) 4)) -1 (nil)
    (nil))

(insn 78 77 79 (set (reg:DI 180)
        (sign_extend:DI (reg:SI 179))) -1 (nil)
    (nil))

(insn 79 78 0 (set (reg:DI 144)
        (ashift:DI (reg:DI 180)
            (const_int 32 [0x20]))) -1 (nil)
    (nil))

later on, reg 144 will be used as SF, but who cares ...

unfortunately, a few steps later gcc goes to compiler heaven with

20050604-1.c: In function 'foo':
20050604-1.c:28: internal compiler error: in refers_to_regno_for_reload_p, at
reload.c:6391

step 137r.lreg is the last one that succeeds. error occurs during 138r.greg, 
and this
assert in reload.c is the cause:

      /* If this is a pseudo, a hard register must not have been allocated.
         X must therefore either be a constant or be in memory.  */
      if (r >= FIRST_PSEUDO_REGISTER)
        {
          fprintf(stderr, "woooooooaaaaaaaaah: %d\n", r);
          if (reg_equiv_memory_loc[r])
            return refers_to_regno_for_reload_p (regno, endregno,
                                                 reg_equiv_memory_loc[r],
                                                 (rtx*) 0);

          gcc_assert (reg_equiv_constant[r] || reg_equiv_invariant[r]);
          return 0;
        }

I don't really understand, what's happening here, and why we seem to have failed
allocating a hard reg for 144.

Original comment by jmoc...@gmail.com on 3 Dec 2008 at 4:46

GoogleCodeExporter commented 8 years ago
oh, btw, output of 137r.lreg step seems to note that 144 was allocated to s36:

;; Register 144 in 36.

or what?

Original comment by jmoc...@gmail.com on 3 Dec 2008 at 4:52

GoogleCodeExporter commented 8 years ago
nah, can't use pseudo reg in different modes without using subreg. won't work 
this way.

Original comment by jmoc...@gmail.com on 4 Dec 2008 at 2:07

GoogleCodeExporter commented 8 years ago
for the record, this problem is solved with the following movsf expansion:

  if ((reload_in_progress | reload_completed) == 0)
    {
      if(!register_operand (operands[0], SFmode) &&
         !register_operand (operands[1], SFmode))
        {
          rtx temp = force_reg (SFmode, operands[1]);
          temp = emit_move_insn (operands[0], temp);
          DONE;
        }
      /* when moving an SI pseudo reg to SF pseudo reg by using
         a subreg expression, we need to take care, since SX
         keeps floats in upper 32 bits of register, while ints
         are in lower 32 bits. */
      if (GET_CODE(operands[1]) == SUBREG)
        {
          rtx sreg = SUBREG_REG(operands[1]);
          uint bn = SUBREG_BYTE(operands[1]);
          if((GET_MODE(sreg) == SImode) && (bn == 0))
            {
              rtx tmp, s1;

              /* TODO: requires a bit more action */
              fprintf(stderr, \"starting magic\\n\");
              print_inline_rtx(stderr, operands[0], 0); fprintf(stderr, \"\\n\");
              print_inline_rtx(stderr, operands[1], 0); fprintf(stderr, \"\\n\");
              print_inline_rtx(stderr, sreg, 0);  fprintf(stderr, \"\\n\");
              fprintf(stderr, \"WTF11?!\\n\");

              tmp = expand_simple_binop(DImode, ASHIFT,
                                        sreg, gen_rtx_CONST_INT(DImode, 32),
                                        NULL,
                                        0, OPTAB_WIDEN);
              print_inline_rtx(stderr, tmp, 0); fprintf(stderr, \"\\n\");

              s1 = gen_rtx_SUBREG(DImode, operands[0], 0);
              tmp = gen_rtx_SET(DImode, s1, tmp);
              emit_insn(tmp);
              /* (set (subreg:SI (reg:SF  */
              fprintf(stderr, \"magic done!\\n\");

              DONE;
            }
        }
    }

however, this just does not cut it, and works only without optimizations (-O0).
higher optimization levels attempt to reuse the inital register and insert the
(subreg:SF ...) rtx in the (set (reg:SF ...) (plus:SF ... )) insn, thus 
omitting the
movsf that we were trying to intercept altogether.

Original comment by jmoc...@gmail.com on 4 Dec 2008 at 3:48

GoogleCodeExporter commented 8 years ago
according to the above and as initially suspected, we should eliminate the root 
of
this problem, which is in gcc thinking that non-SFmode regs can easily change 
mode to
SFmode.

as all SX hardware regs are equal, nothing seemed wrong with this.

of course, since SF and SI data reside in different parts of a register, this 
is not
so, and #defining

CANNOT_CHANGE_MODE_CLASS(from, to, class)

to return 1 iff exactly one (but not both!) of from and to is SFmode fixes the
problem at its root.

there's no such thing as a free lunch, though. the price we pay is rather 
suboptimal
code: each such case will generate an extra stl/ldu pair.

Original comment by jmoc...@gmail.com on 4 Dec 2008 at 4:01

GoogleCodeExporter commented 8 years ago
oh, fyi, r153 fixes this.

Original comment by jmoc...@gmail.com on 4 Dec 2008 at 4:02