lifting-bits / mcsema

Framework for lifting x86, amd64, aarch64, sparc32, and sparc64 program binaries to LLVM bitcode
https://www.trailofbits.com/expertise/mcsema
GNU Affero General Public License v3.0

Decompiling Windows binaries (32bit and 64bit) does not work at all #764

Open J1Man opened 3 years ago

J1Man commented 3 years ago

Hi Guys,

As the title says, I could not decompile any of the Windows executables I tried into bitcode using mcsema-lift-9.0. To check whether I was doing something wrong, I built your Maze example for Windows, but decompiling those binaries did not work either. The details are below.

I compiled the Maze example source code included in your repository into 32-bit and 64-bit Windows binaries using the WinLibs Clang compiler on Windows 7. The compiler is available at winlibs.com (release name: GCC 11.2.0 + LLVM/Clang/LLD/LLDB 12.0.1 + MinGW-w64 9.0.0 - release 1). I attached the executables to this message as a zip file and verified that they run correctly under Windows.
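Because the lifter has to be told which architecture it is dealing with (mcsema-lift takes an `--arch` option), it can help to confirm that each EXE really is the 32-bit or 64-bit image you think it is before lifting. The sketch below is a minimal, hypothetical helper (not part of McSema) that reads the `Machine` field of the PE/COFF file header using only the Python standard library. It assumes a well-formed PE file and only names three common machine values.

```python
import struct
import sys

# A few IMAGE_FILE_MACHINE_* values from the PE/COFF specification.
MACHINE_NAMES = {
    0x014C: "i386 (32-bit x86)",
    0x8664: "AMD64 (64-bit x86-64)",
    0xAA64: "ARM64",
}


def pe_machine(data: bytes) -> int:
    """Return the COFF Machine field of a PE image given its raw bytes."""
    if data[:2] != b"MZ":
        raise ValueError("not a PE file: missing MZ signature")
    # e_lfanew at offset 0x3C holds the file offset of the "PE\0\0" signature.
    (pe_offset,) = struct.unpack_from("<I", data, 0x3C)
    if data[pe_offset:pe_offset + 4] != b"PE\x00\x00":
        raise ValueError("not a PE file: missing PE signature")
    # The COFF file header follows the signature; Machine is its first field.
    (machine,) = struct.unpack_from("<H", data, pe_offset + 4)
    return machine


def describe(path: str) -> str:
    """Read a file from disk and return a human-readable machine name."""
    with open(path, "rb") as f:
        machine = pe_machine(f.read())
    return MACHINE_NAMES.get(machine, f"unknown machine 0x{machine:04X}")


if __name__ == "__main__":
    for path in sys.argv[1:]:
        print(f"{path}: {describe(path)}")
```

For example, saving this as a hypothetical `pe_machine.py` and running `python3 pe_machine.py maze32.exe maze64.exe` (placeholder file names) should report i386 for the 32-bit build and AMD64 for the 64-bit one. A mismatch between that result and the architecture passed to the lifter would be worth ruling out first.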

I could not decompile them into bitcode (.bc) files using mcsema-lift-9.0 on Ubuntu Linux; I got many opcode errors along with other kinds of errors.

I am using a clean install of Ubuntu 20.04.3 in a virtual machine. I built McSema by following the instructions in your README. I have IDA Pro 7.6 on my Windows machine, and I am able to generate CFG files from the executables using McSema's Python scripts with IDA Pro 7.6.

Can you please take a look at the attached EXEs and check whether you can convert them to bitcode on your end? Since none of the Windows binaries I tried work, I want to know whether I am doing something wrong.

On a separate note, I was able to decompile your example Linux AArch64 Maze binary and recompile it. I generated the CFG on Windows, lifted it to bitcode on Linux, and recompiled it as an amd64 Linux executable, following the instructions in the blog post linked below. The recompiled amd64 Linux Maze binary sometimes (but not always) segfaulted while I was trying to solve the maze. https://blog.trailofbits.com/2018/01/23/heavy-lifting-with-mcsema-2-0/

Things just don't seem to work at all for 32-bit/64-bit Windows executables.

MazeWindowsBinaries_32bitAnd64bit.zip

anisyusof-sc commented 2 years ago

I am also facing the same issue with Windows binaries. I am using LLVM 11 and running everything on Ubuntu with IDA Pro 7.6. I can generate the CFG, but I get the following error during lifting:

F1208 11:49:13.641788 42893 Segment.cpp:484] Check failed: 'seg_type' Must be non NULL 
*** Check failure stack trace: ***
    @          0x12f63ac  google::LogMessageFatal::~LogMessageFatal()
    @           0x64ee3d  google::CheckNotNull<>()
    @           0x64b0b7  mcsema::DefineDataSegments()
    @           0x63d5f1  mcsema::LiftCodeIntoModule()
    @           0x65298c  main
    @     0x7f9237cf00b3  __libc_start_main
    @           0x6011ee  _start
    @              (nil)  (unknown)

Issue #740 appears to show the same behavior. I also tried your Maze binaries and hit the same issue.

MazeWindowsBinaries_32bitAnd64bit.zip

However, I can successfully lift amd64 Linux binaries, so the problem seems to be specific to Windows binaries.

pgoodman commented 2 years ago

We previously did most of our testing on Linux binaries, and only really supported recompilation for x86-64 Linux programs. We don't have the spare cycles to test and support Windows binaries at the moment.

SaifRushdHadad commented 2 years ago

The README should probably be updated to reflect the current level of Windows support that remill / anvill / mcsema provide, so that this issue doesn't keep popping up.