radareorg / ideas

How about asm to C translator? #264

Open j123123 opened 7 years ago

j123123 commented 7 years ago

For example, take the byte sequence 48 B8 01 48 31 C0 48 8D 04 18 EB F7. Suppose we need to produce C code from it (to port it to ARM, for example). Execution could start at any byte offset, so every offset gets its own label, and it is possible to generate something like this:

label0x00:
  {
    rax = 0x18048d48c0314801;
    goto label0x0a;
  }
label0x01:
  {
    eax = 0xc0314801;
    goto label0x06;
  }
label0x02:
  {
    *(uint32_t *)(rax + 0x31) += ecx;
    goto label0x05;
  }
label0x03:
  {
    rax = rax ^ rax;
    goto label0x06;
  }
label0x04:
  {
    eax = eax ^ eax;
    goto label0x06;
  }
//etc...

see also: https://github.com/frranck/asm2c
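
The label-per-offset listing above can be turned into something compilable. The following is only a minimal sketch under several assumptions: the registers are plain globals, the name run_chunks and its instruction budget are made up for illustration (the byte sequence ends in a jmp back to 0x03, so it would loop forever otherwise), and the memory-writing chunk at offset 0x02 is left out because it would need an emulated address space.

```c
#include <stdint.h>

/* Emulated x86-64 registers (hypothetical globals for this sketch). */
uint64_t rax, rbx;

/* Run the translated bytes 48 B8 01 48 31 C0 48 8D 04 18 EB F7 starting at
 * byte offset `entry`, executing at most `max_insns` instructions.
 * Returns the offset of the next instruction to run, or -1 if `entry`
 * is not one of the offsets translated here. */
int run_chunks(int entry, int max_insns)
{
    int left = max_insns;

    if (left <= 0)
        return entry;

    switch (entry) {
    case 0x00: goto label0x00;
    case 0x01: goto label0x01;
    case 0x03: goto label0x03;
    case 0x04: goto label0x04;
    case 0x06: goto label0x06;
    case 0x0a: goto label0x0a;
    default:   return -1;
    }

label0x00:  /* 48 B8 imm64 : movabs rax, 0x18048d48c0314801 */
    rax = 0x18048d48c0314801ULL;
    if (--left == 0) return 0x0a;
    goto label0x0a;

label0x01:  /* B8 imm32 : mov eax, 0xc0314801 (zero-extends into rax) */
    rax = 0xc0314801ULL;
    if (--left == 0) return 0x06;
    goto label0x06;

label0x03:  /* 48 31 C0 : xor rax, rax */
    rax ^= rax;
    if (--left == 0) return 0x06;
    goto label0x06;

label0x04:  /* 31 C0 : xor eax, eax (zero-extends into rax) */
    rax = (uint32_t)rax ^ (uint32_t)rax;
    if (--left == 0) return 0x06;
    goto label0x06;

label0x06:  /* 48 8D 04 18 : lea rax, [rax + rbx] */
    rax = rax + rbx;
    if (--left == 0) return 0x0a;
    goto label0x0a;

label0x0a:  /* EB F7 : jmp 0x03 */
    if (--left == 0) return 0x03;
    goto label0x03;
}
```

Calling run_chunks(0x00, 4) with rbx set to 5 executes movabs, jmp, xor and lea in that order and leaves rax equal to rbx. The budget exists only so the sketch terminates; a real translation would not need it.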

j123123 commented 7 years ago

GCC's labels-as-values extension can be used for this: https://gcc.gnu.org/onlinedocs/gcc/Labels-as-Values.html
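
As a rough illustration of how that extension might help, the sketch below keeps a table of label addresses indexed by byte offset, so an indirect jump (a target only known at run time) becomes a table lookup followed by a computed goto. The function name indirect_jump and the table chunk_at are hypothetical, and the chunks only return a marker value; the &&label and goto * syntax is a GCC extension that Clang also accepts, not standard C.

```c
#include <stddef.h>
#include <stdint.h>

/* Follow an emulated indirect jump to byte offset `target` using GCC's
 * labels-as-values extension. Returns a marker identifying the chunk
 * that ran, or -1 if no translated chunk starts at that offset. */
int indirect_jump(uint64_t target)
{
    /* One entry per byte offset; NULL marks offsets with no chunk. */
    static void *const chunk_at[] = {
        &&label0x00, &&label0x01, NULL, &&label0x03,
    };

    if (target >= sizeof chunk_at / sizeof chunk_at[0] ||
        chunk_at[target] == NULL)
        return -1;

    goto *chunk_at[target];

label0x00:
    return 0x00;
label0x01:
    return 0x01;
label0x03:
    return 0x03;
}
```

In standard C the same dispatch can be written as a switch over the offset, at the cost of one more level of indirection.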

radare commented 7 years ago

Sounds like a good idea, but I would implement this as a transformation of ESIL.

j123123 commented 7 years ago

See what such translated source looks like (huge file): https://raw.githubusercontent.com/Javanaise/mrboom-libretro/master/mrboom.c

j123123 commented 7 years ago

Hmm, maybe translate ESIL to GCC's GIMPLE or GENERIC internal representation (https://gcc.gnu.org/onlinedocs/gccint/GIMPLE.html, https://gcc.gnu.org/onlinedocs/gccint/GENERIC.html), i.e. add an ESIL frontend to GCC?

Or to LLVM?

XVilka commented 7 years ago

@j123123 Check out the radeco project: https://github.com/radare/radeco-lib

j123123 commented 7 years ago

@XVilka Decompilers are generally not designed to produce compilable source; they are designed to produce human-readable source. If you need to port a binary from one ISA to another, the translated source does not have to be human-readable.

Compilers can also use dirty tricks that cannot be decompiled easily. For example, when two functions are very similar, the compiler may factor out a common piece of assembly and jump to it (the IAR compiler can do this).

And I don't like the Rust memory model (just try to implement a doubly linked list in Rust: https://www.reddit.com/r/rust/comments/2u53le/this_is_a_doubly_linked_list_in_safe_rust/ ). To me, it is a bad approach to making things both safe and fast. It would be better to prove everything with tools like Frama-C and SMT solvers, and to refuse to compile unproved code (unless you use an "unsafe" keyword or add runtime bounds checking at every unproved place).

By the way, see also: https://www.reddit.com/r/linux/comments/200jd0/super_genius_notaz_ports_starcraft_to_armwine/ and https://github.com/notaz/ia32rtools

XVilka commented 7 years ago

You're talking about binary reassembly, which is an even more challenging task than writing a decompiler. Moreover, it is a rare need, unlike a decompiler, so judging by the complexity/demand ratio, the decompiler has higher priority. In any case, complete binary reassembly requires full-featured data flow analysis, so it more or less includes the decompilation task.

radare commented 7 years ago

I don't think what he proposes is harder than decompilation. I have done this by hand in the past, in cases where copy-pasting blocks of disassembly into C doesn't work because of architecture or memory layout restrictions, etc.

I think that's pretty useful and should be easy to do as a transformation of ESIL, like it was done for REIL (which is generated from ESIL).

XVilka commented 5 years ago

Can this be closed in favor of https://github.com/radareorg/radeco, https://github.com/wargio/r2dec, and others?

j123123 commented 5 years ago

Well, if it can produce correct and compilable code, then yes. But what about the case where a jump into the middle of an instruction is performed (as in the example)? Does that work?

j123123 commented 5 years ago

What I propose is not like radeco or r2dec (https://github.com/wargio/r2dec-js#r2dec-pseudo-c-code). There I see a while() loop in the output, which is not what I mean. My idea: every instruction must be converted into a code chunk with gotos. Every code chunk modifies global variables that emulate the registers. For example, a CMP instruction changes some flag variables, and a JNE instruction reads those variables and either jumps to its target or, otherwise, jumps to the next instruction (code chunk).

For example: '4839d875fe'

$ rasm2 -d -b 64 '4839d875fe'
cmp rax, rbx
jne 3
$ rasm2 -d -b 64 '39d875fe'
cmp eax, ebx
jne 2
$ rasm2 -d -b 64 'd875fe'
fdiv dword [rbp - 2]
$ rasm2 -d -b 64 '75fe'
jne 0

translates to:

label0x00:
{  // cmp rax, rbx : "4839d8"
  //  set CF, OF, SF, ZF, AF, and PF flags according to the result. 
  goto label0x03;
}

label0x01:
{ // cmp eax, ebx : "39d8"
  //  set CF, OF, SF, ZF, AF, and PF flags according to the result. 
  goto label0x03;
}

label0x02:
{ // fdiv dword [rbp - 2] : "d875fe"
  // some C code that manipulates an array
  // (the array emulates the FPU stack), fetches data from rbp - 2,
  // and does the division
  float tmp;
  memcpy(&tmp, (const void *)((uintptr_t)rbp - 2), sizeof(float));
  // then check for division by zero and jump to the exception handler
  // actually, the FPU control register needs checking too: https://wiki.osdev.org/FPU#FPU_control
  // etc, etc
  ....
}

label0x03:
{ // jne 3 : "75fe"
  if (eflags.zf == 0)
  {
    goto label0x03;
  }
  goto label0x05;
}
...
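
The CMP and JNE chunks above leave the flag computation as a comment. One possible way to fill it in is sketched below, assuming the flags live in a global struct named eflags (as in the label0x03 chunk); the helper names emu_cmp64 and emu_jne are invented for this illustration, and emu_jne returns the next offset instead of performing the goto so that it can be called from ordinary code.

```c
#include <stdint.h>

/* Emulated status flags, kept in one global as proposed above. */
struct flags { unsigned cf, pf, af, zf, sf, of; };
struct flags eflags;

/* cmp a, b (64-bit): compute a - b, discard the result, keep the flags. */
void emu_cmp64(uint64_t a, uint64_t b)
{
    uint64_t r = a - b;
    unsigned low = (unsigned)(r & 0xff);

    eflags.cf = a < b;                                 /* unsigned borrow */
    eflags.zf = r == 0;
    eflags.sf = (unsigned)(r >> 63);                   /* sign of result */
    eflags.of = (unsigned)(((a ^ b) & (a ^ r)) >> 63); /* signed overflow */
    eflags.af = (unsigned)(((a ^ b ^ r) >> 4) & 1);    /* borrow from bit 4 */

    /* PF is set when the low byte of the result has an even number of 1 bits. */
    low ^= low >> 4;
    low ^= low >> 2;
    low ^= low >> 1;
    eflags.pf = !(low & 1);
}

/* jne: pick the offset of the next chunk to run. */
int emu_jne(int taken, int fallthrough)
{
    return eflags.zf == 0 ? taken : fallthrough;
}
```

With these helpers, the label0x00 chunk would call emu_cmp64(rax, rbx) before its goto, and the label0x01 chunk would do the same on the low 32 bits with a 32-bit variant.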

radare commented 5 years ago

Yes, r2's code analysis handles this jump-into-the-middle-of-an-instruction case.

j123123 commented 5 years ago

@radare That's nice, but my idea is very different from radeco or r2dec, and even from IDA's Hex-Rays. Those tools were created to help understand what is actually happening; they generate pseudo-C output that is not meant to be 100% correct and compilable. Hex-Rays, for example, tries to detect local variables on the stack, and it has logic to detect the calling convention: how arguments are passed, who cleans the stack (caller or callee), and so on. If you simply convert every call and ret instruction into a chunk of code that pushes the return address onto the stack and jumps to a pointer (call), or pops a pointer from the stack and jumps to it (ret), you don't have to care about calling conventions, function prologue/epilogue detection, and the like.
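
The call/ret handling described above might look like the following sketch. Everything in it is an assumption made for illustration: the five-instruction program and its offsets are hypothetical, the stack is a small global array, and a switch-based dispatcher stands in for the goto labels because ret jumps to an address that is only known at run time.

```c
#include <stdint.h>

#define STACK_WORDS 64

uint64_t emu_stack[STACK_WORDS];   /* emulated stack memory */
unsigned rsp_index = STACK_WORDS;  /* emulated rsp as an index; grows down */
uint64_t rax;                      /* emulated register */

/* Run translated chunks starting at byte offset `pc` until the hlt chunk.
 * Returns 0 on a clean halt, -1 on an unknown offset or a stack fault. */
int run(uint64_t pc)
{
    for (;;) {
        switch (pc) {
        case 0x00: /* call 0x10 : push the return address, then jump */
            if (rsp_index == 0)
                return -1;
            emu_stack[--rsp_index] = 0x05;
            pc = 0x10;
            break;
        case 0x05: /* add rax, 1 */
            rax += 1;
            pc = 0x09;
            break;
        case 0x09: /* hlt : end of the sketch program */
            return 0;
        case 0x10: /* mov rax, 41 */
            rax = 41;
            pc = 0x17;
            break;
        case 0x17: /* ret : pop an address and jump to it */
            if (rsp_index == STACK_WORDS)
                return -1;
            pc = emu_stack[rsp_index++];
            break;
        default:   /* no chunk was translated for this offset */
            return -1;
        }
    }
}
```

No calling convention is detected anywhere: the callee at 0x10 is just another chunk, and the return address is ordinary data on the emulated stack.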

radare commented 5 years ago

This can be done with ESIL. Replace all the operations with your own callbacks, add callbacks for when registers are accessed, etc., then translate the expression into C-like code.

This is what the REIL conversion command aetr does.
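
To give a feel for what translating an ESIL expression into C-like code could involve, here is a toy sketch. It is not radare2's API and does not use its callbacks: the function esil_to_c is hypothetical and understands only comma-separated "src,dst,OP" triples with OP being one of =, +=, -= or ^=, whereas real ESIL has many more operators, including the flag updates.

```c
#include <stdio.h>
#include <string.h>

/* Translate a tiny ESIL-like subset into C statements, one per line.
 * ESIL is postfix: "rbx,rax,+=" pushes rbx, pushes rax, then applies
 * += with the top of the stack as destination. Returns 0 on success,
 * -1 on malformed input or when `out` is too small. */
int esil_to_c(const char *esil, char *out, size_t out_size)
{
    char buf[128];
    const char *stack[2];
    int depth = 0;
    size_t used = 0;

    if (out_size == 0 || strlen(esil) >= sizeof buf)
        return -1;
    strcpy(buf, esil);
    out[0] = '\0';

    for (char *tok = strtok(buf, ","); tok != NULL; tok = strtok(NULL, ",")) {
        int is_op = !strcmp(tok, "=") || !strcmp(tok, "+=") ||
                    !strcmp(tok, "-=") || !strcmp(tok, "^=");

        if (!is_op) {               /* operand: push it */
            if (depth == 2)
                return -1;
            stack[depth++] = tok;
            continue;
        }
        if (depth != 2)             /* operator needs src and dst */
            return -1;

        /* stack[1] is the destination (top of stack), stack[0] the source. */
        int n = snprintf(out + used, out_size - used, "%s %s %s;\n",
                         stack[1], tok, stack[0]);
        if (n < 0 || (size_t)n >= out_size - used)
            return -1;
        used += (size_t)n;
        depth = 0;
    }
    return depth == 0 ? 0 : -1;
}
```

For the input "rax,rax,^=,1,rcx,+=" this emits the two statements rax ^= rax; and rcx += 1;, each of which could become the body of one labeled chunk.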

j123123 commented 5 years ago

Great, but I don't see an asm-to-ESIL implementation for the x86 FPU: https://github.com/radare/radare2/blob/dd84bfe3dee230feb542908870b3a731481eae63/libr/anal/p/anal_x86_cs.c#L432-L438 When will it be available? Are there any plans to implement it?

j123123 commented 5 years ago

And what about REIL (OpenREIL)? Maybe it's better to do OpenREIL -> C instead of ESIL -> C. I need to think about how best to implement this.

XVilka commented 5 years ago

No practical, actively maintained tool uses REIL in 2018; the language is old. If you want something more high-level, you can use RadecoIL instead.

radare commented 5 years ago

REIL is pretty limited, badly designed, and abandoned.

radare commented 5 years ago

This is not a bad idea, because it would make ESIL expressions more "readable" (a common complaint from some people) and would also give us the ability to have a JIT in ESIL, like Ruby does.

radare commented 5 years ago

cc @condret