uxmal / reko

Reko is a binary decompiler.
https://uxmal.github.io/reko
GNU General Public License v2.0
2.16k stars 254 forks source link

C-Output: return expressions directly #901

Open rfalke opened 4 years ago

rfalke commented 4 years ago

Subject: https://github.com/rfalke/decompiler-subjects/tree/master/from_holdec/dmi/exp/ia32_elf

Output

word32 advanced_1_boolean_minization(uint32 dwArg04, ui32 dwArg08, word32 dwArg0C)
{
    word32 eax_25;
    if (dwArg04 == 0x00 && (dwArg08 == 0x00 && dwArg0C != 0x00) || (dwArg04 == 0x00 && (dwArg08 != 0x00 && dwArg0C != 0x00) || dwArg04 != 0x00 && dwArg08 == 0x00))
        eax_25 = 0x01;
    else
        eax_25 = 0x00;
    return eax_25;
}

Expected:

word32 advanced_1_boolean_minization(uint32 dwArg04, ui32 dwArg08, word32 dwArg0C)
{
    return (dwArg04 == 0x00 && (dwArg08 == 0x00 && dwArg0C != 0x00) || (dwArg04 == 0x00 && (dwArg08 != 0x00 && dwArg0C != 0x00) || dwArg04 != 0x00 && dwArg08 == 0x00));
}

Maybe it is also an improvement to replace "0x00" with "0" to reduce the output size.

ptomin commented 4 years ago

I prefer

word32 advanced_1_boolean_minization(uint32 dwArg04, ui32 dwArg08, word32 dwArg0C)
{
    if (dwArg04 == 0x00 && (dwArg08 == 0x00 && dwArg0C != 0x00) || (dwArg04 == 0x00 && (dwArg08 != 0x00 && dwArg0C != 0x00) || dwArg04 != 0x00 && dwArg08 == 0x00))
        return 1;
    else
        return 0;
}

in this case.

rfalke commented 4 years ago

Because this case has a larger expression or in general?

So what about a simpler case?

Current output:

word32 intermediate_1_ternary_blocks(Eq_11 dwArg04)
{
        word32 eax_10;
        if (dwArg04 == 0x2A)
                eax_10 = 0x00;
        else
                eax_10 = 0x01;
        return eax_10;
}

I would prefer:

word32 intermediate_1_ternary_blocks(Eq_11 dwArg04)
{
        return dwArg04 != 0x2A;
}

or

word32 intermediate_1_ternary_blocks(Eq_11 dwArg04)
{
        return dwArg04 != 0x2A ? 1 : 0;
}

or

word32 intermediate_1_ternary_blocks(Eq_11 dwArg04)
{
        return _bool_to_int(dwArg04 != 0x2A);
}

The main improvement IMHO is that you can see that both branches of the if-else assign to the same variable. There may also be a language which allows something like the following which also would capture this (but it is still longer than the others above and leaves the well known C paths):

word32 intermediate_1_ternary_blocks(Eq_11 dwArg04)
{
        word32 eax_10 = (if (dwArg04 == 0x2A) 0x00 else 0x01);
        return eax_10;
}
ptomin commented 4 years ago

I think decompiled code should be as close to original as possible. So I would prefer

word32 intermediate_1_ternary_blocks(Eq_11 dwArg04)
{
        if (dwArg04 == 0x2A)
                retutn 0;
        else
                return 1;
}

if there is conditional jump in assembly code and

word32 intermediate_1_ternary_blocks(Eq_11 dwArg04)
{
        return dwArg04 != 0x2A;
}

if there is no jumps.

rfalke commented 4 years ago

So for you "original" means the assembler. I would argue that different compilers translate the same source code to different assembler instructions and I would like to read an output code which doesn't have this variants. This would make the output easier to read. So something like "same semantics should result in the same output code".

rfalke commented 1 year ago

Issue is still in version 0.11.4.0-931ca7d.