uxmal / reko

Reko is a binary decompiler.
https://uxmal.github.io/reko
GNU General Public License v2.0

Reko should aggressively fuse segment:offset pairs where possible #318

Open uxmal opened 8 years ago

uxmal commented 8 years ago

The x86 16-bit code fragment

    les bx,ds:[0x100]
    head:
        mov byte ptr es:[bx],0
        inc bx
        dec cx
        jnz head

is decompiled into

    segptr32 es_bx_2 = ds->dw0100;
    seg16 es_3 = SLICE(es_bx_2, word16, 16);
    byte Eq_14::* bx_4 = (byte Eq_14::*) es_bx_2;
    do {
        es_3->*bx_4 = 0x00;
        bx_4 = bx_4 + 0x0001;
        cx = cx - 0x0001;
    } while (cx != 0x0000);
    return;

which, although correct, could be improved. MS-DOS and Win16 compilers traditionally compiled arithmetic on far pointers (as opposed to huge pointers) as arithmetic on the offset part only, leaving the segment untouched. It would be great if Reko could recognize that es_3 and bx_4 "belong together", since both were loaded from memory by the same instruction, and replace them with the original es_bx_2 wherever both are used in the same expression. After this analysis, the resulting code would be:

    byte * es_bx_2 = ds->dw0100;
    do {
        *es_bx_2 = 0x00;
        es_bx_2 = es_bx_2 + 0x0001;
        cx = cx - 0x0001;
    } while (cx != 0x0000);
    return;

which is a lot more pleasant to the human reader.
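
To make the proposed analysis concrete, here is a minimal sketch in Python. It is not Reko's IR or API: the tuple-based statements, the opcode names (`load`, `slice_seg`, `cast_off`, `store`, `store_ptr`) and the function `fuse_far_pointers` are all hypothetical, and the sketch ignores control flow. It only shows the core idea: remember which 32-bit pointer each half came from, and fuse the halves back wherever they are used together.

```python
def fuse_far_pointers(stmts):
    """Fuse segment/offset halves of one far pointer (toy IR, hypothetical names)."""
    seg_of, off_of = {}, {}          # half variable -> pointer it was split from
    for op, *args in stmts:
        if op == "slice_seg":        # dst = SLICE(src, word16, 16)
            seg_of[args[0]] = args[1]
        elif op == "cast_off":       # dst = (offset) src
            off_of[args[0]] = args[1]
    halves = set(seg_of) | set(off_of)

    fused = []
    for stmt in stmts:
        op, *args = stmt
        if op in ("slice_seg", "cast_off"):
            continue                 # the halves disappear once fused
        if (op == "store" and args[0] in seg_of
                and seg_of[args[0]] == off_of.get(args[1])):
            # seg->*off = value  becomes  *ptr = value
            fused.append(("store_ptr", seg_of[args[0]], args[2]))
        elif op == "add" and args[0] == args[1] and args[0] in off_of:
            # off = off + c  becomes  ptr = ptr + c
            ptr = off_of[args[0]]
            fused.append(("add", ptr, ptr, args[2]))
        elif halves.intersection(args):
            return list(stmts)       # a half is used on its own: do not fuse
        else:
            fused.append(stmt)
    return fused


before = [
    ("load", "es_bx_2", "ds->dw0100"),
    ("slice_seg", "es_3", "es_bx_2"),
    ("cast_off", "bx_4", "es_bx_2"),
    ("store", "es_3", "bx_4", 0),
    ("add", "bx_4", "bx_4", 1),
    ("sub", "cx", "cx", 1),
]
after = fuse_far_pointers(before)
for stmt in after:
    print(stmt)
```

Note that the rewrite of `off = off + c` into `ptr = ptr + c` is only faithful to the machine code if the fused pointer keeps far-pointer semantics, that is, the addition wraps within the 16-bit offset and never carries into the segment. The sketch also bails out as soon as either half is used on its own, which is the conservative choice.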

uxmal commented 8 years ago

Similarly, when a procedure returns a far pointer in the dx:ax register pair, Reko should realize that the registers are being used as a pair and treat them as a unit. For example, decompiling:

    les ax,ds:[0x0124]
    mov dx,es
    ret

which today results in something like

    es_ax_2 = Mem0[ds:0x0124:word32]
    dx_3 = SLICE(es_ax_2, word16, 16);
    axOut = (word16) es_ax_2
    return dx_3

should instead result in:

    es_ax_2 = Mem0[ds:0x0124:word32]
    dx_ax_3 = es_ax_2
    return dx_ax_3

which simplifies to

    return Mem0[ds:0x0124:word32]
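
The two steps above (recognizing the dx:ax pair, then simplifying) can be sketched in Python as follows. Again this is a toy model, not Reko's implementation: the opcodes `slice_hi`, `cast_lo`, `return_pair` and both function names are hypothetical, and the sketch assumes that the two halves and the loaded temporary have no other uses.

```python
def fuse_return_pair(stmts):
    """Replace a return of (high half, low half) of one value by that value."""
    if not stmts or stmts[-1][0] != "return_pair":
        return list(stmts)
    hi_of, lo_of = {}, {}
    for op, *args in stmts:
        if op == "slice_hi":         # dst = SLICE(src, word16, 16)
            hi_of[args[0]] = args[1]
        elif op == "cast_lo":        # dst = (word16) src
            lo_of[args[0]] = args[1]
    _, hi, lo = stmts[-1]
    src = hi_of.get(hi)
    if src is None or src != lo_of.get(lo):
        return list(stmts)           # halves come from different values
    # Drop only the definitions of the two fused halves (assumed single-use).
    kept = [s for s in stmts[:-1]
            if not (s[0] in ("slice_hi", "cast_lo") and s[1] in (hi, lo))]
    return kept + [("return", src)]


def inline_returned_load(stmts):
    """Turn 'x = load e; return x' into 'return e' (x assumed unused elsewhere)."""
    if (len(stmts) >= 2 and stmts[-1][0] == "return"
            and stmts[-2][0] == "load" and stmts[-2][1] == stmts[-1][1]):
        return list(stmts[:-2]) + [("return", stmts[-2][2])]
    return list(stmts)


body = [
    ("load", "es_ax_2", "Mem0[ds:0x0124:word32]"),
    ("slice_hi", "dx_3", "es_ax_2"),
    ("cast_lo", "ax_4", "es_ax_2"),
    ("return_pair", "dx_3", "ax_4"),
]
fused = fuse_return_pair(body)
simplified = inline_returned_load(fused)
print(fused)
print(simplified)
```

A real implementation would need use-def information to confirm that the halves and the temporary are not used elsewhere before dropping or inlining them.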