open-watcom / open-watcom-v2

Open Watcom V2.0 - Source code repository, Wiki, Latest Binary build, Archived builds including all installers for download.

Use short jumps when possible with inline assembly #630

Open negge opened 3 years ago

negge commented 3 years ago

The inline assembly code generator always uses 32-bit relative jumps even when the target is within a short jump range.

Consider this test_abs.c example:

static int abs(int x);
#pragma aux abs parm [ eax ] = \
  "test eax, eax" \
  "jge @pos" \
  "neg eax" \
"@pos:" \
  value [ eax ]

#pragma aux main modify [ eax ebx ecx edx edi esi ebp ]
int main() {
  return abs(-1);
}

Building with wcc386 -s test_abs.c and disassembling gives:

Segment: _TEXT BYTE USE32 00000010 bytes
0000                          main_:
0000  B8 FF FF FF FF              mov     eax,0xffffffff
0005  85 C0                       test    eax,eax
0007  0F 8D 02 00 00 00           jge     L$1
000D  F7 D8                       neg     eax
000F                          L$1:
000F  C3                          ret

Routine Size: 16 bytes,    Routine Base: _TEXT + 0000

which is 4 bytes bigger than using a short jump:

Segment: _TEXT BYTE USE32 0000000C bytes
0000                          main_:
0000  B8 FF FF FF FF              mov     eax,0xffffffff
0005  85 C0                       test    eax,eax
0007  7D 02                       jge     L$1
0009  F7 D8                       neg     eax
000B                          L$1:
000B  C3                          ret

Routine Size: 12 bytes,    Routine Base: _TEXT + 0000

This is something that the assembler should be able to automatically figure out.

Example above was generated with Version 2.0 beta Sep 15 2020 00:23:46 (32-bit).
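For reference, the size difference follows directly from the two x86 encodings of a conditional jump in 32-bit code: the short form is a one-byte opcode plus an 8-bit displacement (2 bytes), and the near form is a two-byte opcode plus a 32-bit displacement (6 bytes). The displacement is counted from the end of the jump instruction. A minimal sketch of that arithmetic in plain C follows; the helper names jump_disp and fits_rel8 are made up for illustration and are not part of Open Watcom:

```c
#include <stdint.h>

/* Lengths of the two conditional-jump encodings in 32-bit code. */
enum {
    JCC_SHORT_LEN = 2,  /* 7x cb    : 1-byte opcode + rel8  */
    JCC_NEAR_LEN  = 6   /* 0F 8x cd : 2-byte opcode + rel32 */
};

/* The displacement is measured from the end of the jump instruction. */
static int32_t jump_disp(uint32_t insn_addr, uint32_t insn_len, uint32_t target)
{
    return (int32_t)(target - (insn_addr + insn_len));
}

/* A short jump is only possible if the displacement fits in a signed byte. */
static int fits_rel8(int32_t disp)
{
    return disp >= -128 && disp <= 127;
}
```

Plugging in the addresses from the two listings, both jumps have a displacement of 2 (the "02 00 00 00" and "02" bytes), so the target is well within short range.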

jmalak commented 3 years ago

The in-line assembler is single-pass and cannot back-patch, because the instructions are generated by the assembler, not by the code generator. If you jump backward, the in-line assembler can optimize the jump, because the target is already known. For a forward reference it cannot, so the best solution is to put the "short" keyword before the target label, for example:

static int abs(int x);
#pragma aux abs = \
"L1: test eax, eax" \
  "jl L1" \
  "jge short L2" \
  "neg eax" \
"L2:" \
   parm [ eax ] value [ eax ]

#pragma aux main modify [ eax ebx ecx edx edi esi ebp ]
int main() {
  return abs(-1);
}

Disassembled code:

Segment: _TEXT BYTE USE32 0000000E bytes
0000                            main_:
0000  B8 FF FF FF FF                    mov             eax,0xffffffff
0005                            L$1:
0005  85 C0                             test            eax,eax
0007  7C FC                             jl              L$1
0009  7D 02                             jge             L$2
000B  F7 D8                             neg             eax
000D                            L$2:
000D  C3                                ret
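The backward/forward difference described above can be sketched in plain C. This is only an illustration of how a one-pass encoder has to decide, not Open Watcom's actual code; the function name jcc_len_one_pass and the LABEL_UNKNOWN marker are hypothetical:

```c
#include <stdint.h>

/* Hypothetical marker meaning "this label has not been defined yet". */
#define LABEL_UNKNOWN UINT32_MAX

/*
 * Length in bytes of the conditional jump a one-pass encoder would emit.
 * insn_addr  : address of the jump instruction
 * target     : address of the label, or LABEL_UNKNOWN for a forward reference
 * short_hint : nonzero if the programmer wrote "short" before the label
 */
static unsigned jcc_len_one_pass(uint32_t insn_addr, uint32_t target, int short_hint)
{
    if (target == LABEL_UNKNOWN) {
        /* Forward reference: the size must be fixed now, before the
           distance is known, so only an explicit hint allows rel8. */
        return short_hint ? 2u : 6u;
    }
    /* Backward reference: the target is known, so try the 2-byte form. */
    int32_t disp = (int32_t)(target - (insn_addr + 2u));
    if (disp >= -128 && disp <= 127)
        return 2u;
    return 6u;
}
```

In the listing above, "jl L1" at 0007 jumps back to 0005 and gets the short form automatically, while "jge short L2" at 0009 is a forward reference and is short only because of the keyword.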

negge commented 3 years ago

Thanks, this is the work-around I am currently using.

Is there a reason why the in-line assembler cannot be two pass?

jmalak commented 3 years ago

Take into account that all labels in C/C++/Fortran code are handled by the code generator. The in-line assembler can handle only labels local to the in-line code (which are invisible to the code generator) or global labels created by the code generator. In-line assembly code is inserted as a byte sequence created by the in-line assembler, so there is very limited interaction between the code generator and the in-line assembler. Two-pass assembly is absolutely useless for the in-line assembler, which is only an instruction encoder. The purpose of the in-line assembler is to handle simple code, not complex assembly code, so its functionality is limited. Using the "short" keyword is not a big complication. We could add some kind of back-patching, but the gain would be very small for a lot of work.
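To give an idea of why this is more work than patching one byte: shrinking or widening a jump moves all the code after it, which changes other displacements, so the sizes have to be recomputed until nothing changes (usually called branch relaxation). A minimal sketch in plain C follows, assuming a hypothetical struct insn record and function names that have nothing to do with the real Open Watcom sources:

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical instruction record, for illustration only. */
struct insn {
    unsigned len;     /* current encoded length in bytes */
    int      is_jcc;  /* nonzero for a conditional jump */
    size_t   target;  /* index of the instruction the jump lands on */
};

/* Address of instruction i = sum of the lengths before it (i may be n). */
static uint32_t addr_of(const struct insn *code, size_t i)
{
    uint32_t a = 0;
    for (size_t k = 0; k < i; k++)
        a += code[k].len;
    return a;
}

/*
 * Start with every conditional jump short, widen the ones whose
 * displacement does not fit in rel8, and repeat until stable.
 * Lengths only ever grow, so the loop terminates.
 * Returns the total code size in bytes.
 */
static uint32_t relax(struct insn *code, size_t n)
{
    int changed;

    for (size_t i = 0; i < n; i++)
        if (code[i].is_jcc)
            code[i].len = 2;
    do {
        changed = 0;
        for (size_t i = 0; i < n; i++) {
            if (!code[i].is_jcc || code[i].len == 6)
                continue;
            /* Displacement is relative to the end of the jump. */
            int32_t disp = (int32_t)(addr_of(code, code[i].target)
                                     - addr_of(code, i + 1));
            if (disp < -128 || disp > 127) {
                code[i].len = 6;
                changed = 1;
            }
        }
    } while (changed);
    return addr_of(code, n);
}
```

Fed the five instructions of the original abs example (mov, test, jge to the ret, neg, ret), this settles on a 2-byte jge and a total of 12 bytes, the same size as the hand-written short version.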