matt-kempster / m2c

A MIPS and PowerPC decompiler.
GNU General Public License v3.0
394 stars 48 forks source link

Better heuristics for modulo and division in gcc 2.x #282

Open joshlory opened 1 month ago

joshlory commented 1 month ago

Hello! @bismurphy and I have noticed a number of division and modulo patterns that aren't picked up automatically.

Division

var_a0_2 = temp_v1_4;
if (temp_v1_4 < 0) {
    var_a0_2 = temp_v1_4 + 0xFF;
}
temp_a0 = var_a0_2 >> 8;
// matches as temp_a0 = var_a0_2 / 256;
 4d4:    bgez    v0,4e0 ~>
 4d8:    nop     
 4dc:    addiu   v0,v0,0xff
 4e0: ~> sra     v0,v0,0x8
 4e4:    sb      v0,4(s1)
 4e8:    sll     v0,v1,0x5

Modulo

var_a0 = D_146000_8017ABD0[a0 - ((a0 / 0x10) * 0x10)];
// matches as D_146000_8017AB90[temp_v1_6 % 16];
 544:    move    v0,a0
 560:    bgez    a0,56c ~>
 564:    sb      zero,0x28(s1)
 568:    addiu   v0,a1,0x10
 56c: ~> sra     v0,v0,0x4
 570:    sll     v0,v0,0x4
 574:    subu    v0,a0,v0
 578:    sll     v0,v0,0x2
 57c:    lui     at,%hi(D_146000_8017ABD0)
 580:    addu    at,at,v0
 584:    lw      a0,%lo(D_146000_8017ABD0)(at)

I see tests for https://github.com/matt-kempster/m2c/tree/master/tests/end_to_end/division-by-power-of-two and https://github.com/matt-kempster/m2c/tree/master/tests/end_to_end/modulo-by-power-of-two but not for gcc. Does it make sense for me to start there and import some of the reduced repros as test cases?

I'm new to m2c so happy to leave this up to the end user to identify, if it's not something we can heuristically determine. But if it's something we can pattern-match in more cases I'd love to take a crack at it!

ethteck commented 1 month ago

I think any tests / patterns would be appreciated! What we've done in the past after changing decompiler behavior is to run it against an entire project and then diff the result, just to make sure nothing looks crazy.

simonlindholm commented 1 month ago

There are some more tests in https://github.com/matt-kempster/m2c/tree/master/tests/end_to_end/gcc-division and https://github.com/matt-kempster/m2c/tree/master/tests/end_to_end/gcc-division-by-two (test organization is somewhat haphazard). Not sure whether these patterns are in there already, if not, adding the asm examples from this issue as e.g. new .s files in division-by-power-of-two/ and modulo-by-power-of-two/ sounds great.

Fixing this in m2c should be fairly simple, just a matter of adding more pattern-matching code like https://github.com/matt-kempster/m2c/blob/76808d564c79b9859798430644847e4baa71ae96/m2c/arch_mips.py#L197 (+ https://github.com/matt-kempster/m2c/blob/76808d564c79b9859798430644847e4baa71ae96/m2c/arch_mips.py#L1293).

simonlindholm commented 1 month ago

What we've done in the past after changing decompiler behavior is to run it against an entire project and then diff the result, just to make sure nothing looks crazy.

There are instructions in the README for setting this up, and it can be a great way to iterate on more complex m2c changes and seeing how they affect output in practice. I would say in this case it's not needed though, it's easy enough to convince yourself of the asm pattern's correctness in isolation.

ethteck commented 1 month ago

My thought was that patterns that are too general may introduce false positives, but idk if that's actually happened before

simonlindholm commented 1 month ago

It can definitely happen, I just don't expect it here.