mumbel / ghidra

Ghidra is a software reverse engineering (SRE) framework
https://www.nsa.gov/ghidra
Apache License 2.0

Base + offset addressing mode #8

Open esaulenka opened 5 years ago

esaulenka commented 5 years ago

Ghidra produces strange code when it encounters indirect addressing.

For example, in pcmflash..._2726.bin, register a0 is written only once, with the value 0xD000BC00. When I set this value as an assumption for the whole program, I get:

                             **************************************************************
                             *                          FUNCTION                          *
                             **************************************************************
                             void __stdcall FUN_8006f8f4(void)
                               assume a0 = 0xd000bc00
             void              <VOID>         <RETURN>
                             FUN_8006f8f4                                    XREF[1]:     800700a2(c)  
        8006f8f4 82 00           mov        d0,#0x0
        8006f8f6 d9 03 60 b9     lea        a3,[a0]-0x6920
        8006f8fa 82 01           mov        d1,#0x0
        8006f8fc d9 02 60 c9     lea        a2,[a0]-0x68e0
        8006f900 3b 00 01 20     mov        d2,#0x10
                             LAB_8006f904                                    XREF[1]:     8006f916(j)  
        8006f904 8f 20 20 f0     sha        d15,d0,#0x2
        8006f908 c2 10           add        d0,#0x1
        8006f90a 10 3f           addsc.a    a15,a3,d15,#0x0
        8006f90c 37 00 68 00     extr.u     d0,d0,#0x0,#0x8
        8006f910 68 01           st.w       [a15]#0x0,d1
        8006f912 10 2f           addsc.a    a15,a2,d15,#0x0
        8006f914 68 01           st.w       [a15]#0x0,d1
        8006f916 3f 20 f7 ff     jlt.u      d0,d2,LAB_8006f904
        8006f91a 00 00           nop
        8006f91c 00 90           ret

void FUN_8006f8f4(void)
{
  int iVar1;
  uint uVar2;

  uVar2 = 0;
  do {
    iVar1 = uVar2 * 4;
    uVar2 = uVar2 + 1 & 0xff;
    *(undefined4 *)(iVar1 + -0x2fffad20) = 0;
    *(undefined4 *)(iVar1 + -0x2ffface0) = 0;
  } while (uVar2 < 0x10);
  a0 = &DAT_d000bc00;
  return;
}

The address calculations are correct (-0x2fffad20, read as an unsigned 32-bit value, is the same as 0xd000bc00 - 0x6920 = 0xD00052E0), but the decompiler prints them as negative offsets.

Perhaps there is some way to indicate that values in address registers should always be treated as unsigned?
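To make the arithmetic explicit, here is a small sketch (not Ghidra code) that checks both stores in the loop. The helper names `to_unsigned32` and `to_signed32` are made up for illustration; the constants come from the listing and the decompiler output above.

```python
MASK32 = 0xFFFFFFFF

def to_unsigned32(value):
    """Reinterpret an integer as an unsigned 32-bit value."""
    return value & MASK32

def to_signed32(value):
    """Reinterpret an integer as a signed (two's-complement) 32-bit value."""
    value &= MASK32
    return value - (1 << 32) if value & 0x80000000 else value

A0 = 0xD000BC00  # the assumed value of a0

# lea a3,[a0]-0x6920 and lea a2,[a0]-0x68e0
first = (A0 - 0x6920) & MASK32
second = (A0 - 0x68E0) & MASK32

print(hex(first))                       # 0xd00052e0
print(hex(second))                      # 0xd0005320
print(hex(to_signed32(first)))          # -0x2fffad20, as printed by the decompiler
print(hex(to_unsigned32(-0x2FFFACE0)))  # 0xd0005320
```

So the constants are the right addresses; only the signed presentation is confusing.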

Another example:

                             void __stdcall FUN_8006f9be(void)
                               assume a0 = 0xd000bc00
             void              <VOID>         <RETURN>
                             FUN_8006f9be
        8006f9be 00 00           nop
        8006f9c0 ed 87 16 1e     calla      FUN_800e3c2c
        8006f9c4 df 12 07 00     jeq        d2,#0x1,LAB_8006f9d2
        8006f9c8 d9 0f 0a dc     lea        a15,[a0]-0x3cb6
        8006f9cc 0c f0           ld.bu      d15,[a15]#0x0=>DAT_d0007f4a                      = ??
        8006f9ce c2 1f           add        d15,#0x1
        8006f9d0 28 0f           st.b       [a15]#0x0=>DAT_d0007f4a,d15                      = ??
                             LAB_8006f9d2                                    XREF[1]:     8006f9c4(j)  
        8006f9d2 00 90           ret

void FUN_8006f9be(void)
{
  int iVar1;

  a0 = &DAT_d000bc00;
  iVar1 = FUN_800e3c2c();
  if (iVar1 != 1) {
    (&DAT_ffffc34a)[(int)a0] = (&DAT_ffffc34a)[(int)a0] + 1;
  }
  return;
}

Here the disassembly is correct, but the decompiler doesn't understand this construction...
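For reference, the odd-looking `DAT_ffffc34a` symbol appears to be just the displacement -0x3cb6 written as an unsigned 32-bit number, not a real address. A rough sketch of the base + offset calculation, assuming the displacement is a sign-extended 16-bit value (the function names are hypothetical):

```python
MASK32 = 0xFFFFFFFF

def sign_extend16(value):
    """Interpret the low 16 bits of value as a signed two's-complement number."""
    value &= 0xFFFF
    return value - 0x10000 if value & 0x8000 else value

def effective_address(base, off16):
    """Base + sign-extended 16-bit offset, wrapped to 32 bits."""
    return (base + sign_extend16(off16)) & MASK32

A0 = 0xD000BC00  # the assumed value of a0

offset = sign_extend16(0xC34A)           # 16-bit form of the displacement
print(hex(offset))                       # -0x3cb6
print(hex(offset & MASK32))              # 0xffffc34a, the "DAT_" label in the decompiler
print(hex(effective_address(A0, 0xC34A)))  # 0xd0007f4a, the address the listing resolves
```

The listing already resolves the access to DAT_d0007f4a; the decompiler simply does not fold a0 into the constant.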

mumbel commented 5 years ago

I've been finding bugs and general issues in the SLEIGH spec (and adjusting it to produce better p-code) with help from a Ghidra dev. The sha, extr, and a few other instructions (plus their variants) were problematic to translate into SLEIGH in my initial attempts. With my current local changes, the output is:

void FUN_8006f8f4(void)
{
  uint uVar1;
  uint uVar2;

  uVar2 = 0;
  do {
    if (true) {
      uVar1 = uVar2 << 2;
    }
    else {
      uVar1 = uVar2;
      if (true) {
        uVar1 = 0;
      }
    }
    uVar2 = uVar2 + 1 & 0xff;
    *(undefined4 *)(&UNK_d00052e0 + uVar1) = 0;
    *(undefined4 *)(&UNK_d0005320 + uVar1) = 0;
  } while (uVar2 < 0x10);
  a0 = &DAT_d000bc00;
  return;
}

void FUN_8006f9be(void)
{
  int iVar1;

  a0 = &DAT_d000bc00;
  iVar1 = FUN_800e3c2c();
  if (iVar1 != 1) {
    *(char *)(a0 + -0x1e5b) = *(char *)(a0 + -0x1e5b) + '\x01';
  }
  return;
}
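The `if (true)` / `else` shape in the first function looks like a leftover of a test on the sign of the shift count. As a rough model only, assuming sha shifts left for a non-negative count and does an arithmetic right shift for a negative one (this is my reading of the instruction, not the SLEIGH from this repo; status flags are ignored and `sha_model` is a made-up name):

```python
MASK32 = 0xFFFFFFFF

def to_signed32(value):
    """Reinterpret an integer as a signed 32-bit value."""
    value &= MASK32
    return value - (1 << 32) if value & 0x80000000 else value

def sha_model(value, count):
    """Hypothetical model of a signed-count arithmetic shift on 32 bits."""
    if not -32 <= count <= 31:
        raise ValueError("shift count out of the assumed range")
    if count >= 0:
        # Non-negative count: plain left shift, truncated to 32 bits.
        return (value << count) & MASK32
    # Negative count: arithmetic right shift, sign bit replicated.
    return (to_signed32(value) >> -count) & MASK32

print(hex(sha_model(5, 2)))             # 0x14
print(hex(sha_model(0x80000000, -4)))   # 0xf8000000
```

With a constant count of 2, as in `sha d15,d0,#0x2`, only the left-shift branch can run, so the whole thing should collapse to `uVar2 << 2` (or `uVar2 * 4`).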
mumbel commented 5 years ago

https://github.com/mumbel/ghidra/commit/29f267b02b3e17e15647eebcff7f76de4063e2c2 seems like a noticeable improvement.

void FUN_8006f8f4(void)
{
  int iVar1;
  uint uVar2;

  uVar2 = 0;
  do {
    iVar1 = uVar2 * 4;
    uVar2 = uVar2 + 1 & 0xff;
    *(undefined4 *)(&UNK_d00052e0 + iVar1) = 0;
    *(undefined4 *)(&UNK_d0005320 + iVar1) = 0;
  } while (uVar2 < 0x10);
  a0 = &DAT_d000bc00;
  return;
}

esaulenka commented 5 years ago

Thanks, the decompiled output looks much better now!

Unfortunately, a problem remains: in FUN_8006f9be, FUN_8006fab2, and many other functions, the decompiler still treats a0 as a variable.