lifting-bits / mcsema

Framework for lifting x86, amd64, aarch64, sparc32, and sparc64 program binaries to LLVM bitcode
https://www.trailofbits.com/expertise/mcsema
GNU Affero General Public License v3.0

Generated an invalid (zero-sized) CFG ERROR / Could not find entrypoint #648

Open ghost opened 4 years ago

ghost commented 4 years ago

Hi there,

My goal is to get the LLVM IR of a Java application using McSema.

I am trying to lift a binary that was generated from HelloWorld.java by GraalVM's native-image and then compressed with UPX.

command:

 mcsema-disass --disassembler ~/idaedu-7.4/idat64 --os linux --arch amd64 --output hello.cfg --binary hello --entrypoint main --log_file hello.log

output:

Generated an invalid (zero-sized) CFG. Please use the --log_file option to see an error log.

hello.log:

Debugging is enabled.
Loading Standard Definitions file: /usr/local/lib/python2.7/dist-packages/mcsema_disass-2.0-py2.7.egg/mcsema_disass/defs/linux.txt
Using Batch mode.
Starting analysis
Recovering module hello
Looking for thunks
Looking for external symbols
Looking for entrypoints
ERROR: Could not find entrypoint main
COULD NOT RECOVER ANY FUNCTIONS
Saving to: /home/vagrant/hello.cfg
Traceback (most recent call last):
  File "/usr/local/lib/python2.7/dist-packages/mcsema_disass-2.0-py2.7.egg/mcsema_disass/ida7/get_cfg.py", line 1650, in <module>
    args.output.write(M.SerializeToString())
AttributeError: 'NoneType' object has no attribute 'SerializeToString'

Done analysis!

How can I find the entry point of applications built by this specific method (combining Graal and UPX)? Should I ask the developers of Graal/UPX?

I attached the native binary generated by Graal, both before and after compression: hello.zip

Thanks

pgoodman commented 4 years ago

That is a pretty unusual error. It looks like you're using IDA 7.4, can you let us know what version of Python it is using?

pgoodman commented 4 years ago

It seems possible that an exception is thrown, e.g. in instruction_personality, and that through some bad logic we still move forward even though M is None.
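
For illustration only, the failure mode described here can be sketched as follows. The names FakeModule, recover_module, and serialize_or_fail are hypothetical and are not taken from get_cfg.py; the sketch only shows how a None result can reach the serialization step, and how an explicit check would report the likely cause instead of an AttributeError.

```python
# Hypothetical stand-ins; none of these names come from mcsema_disass.
class FakeModule(object):
    """Minimal object exposing the one method named in the traceback."""

    def SerializeToString(self):
        return b"cfg-bytes"


def recover_module(entrypoints):
    """Return a module, or None when no entrypoint could be resolved."""
    if not entrypoints:
        return None
    return FakeModule()


def serialize_or_fail(module):
    """Serialize the module, raising a descriptive error if recovery failed."""
    if module is None:
        raise RuntimeError(
            "CFG recovery produced no module; check the entrypoint name")
    return module.SerializeToString()
```

With a guard of this kind, an unresolved entrypoint would stop the run with a message about the entrypoint rather than about NoneType.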

Can you try replacing the instruction_personality function in ida7/util.py with this?

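def instruction_personality(arg):
  # Python 2 code: `long` does not exist in Python 3.
  global PERSONALITIES, PERSONALITY_NORMAL
  # Accept either an address or an already-decoded instruction.
  if isinstance(arg, (int, long)):
    arg, _ = decode_instruction(arg)
  if arg:
    p = PERSONALITIES.get(arg.itype, PERSONALITY_NORMAL)
    return fixup_personality(arg, p)
  else:
    # Decoding failed; fall back to the default personality.
    return PERSONALITY_NORMAL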

pgoodman commented 4 years ago

Also, it looks like hello.original is a PIE executable. I recommend adding --pie-mode for that, as it changes some of the heuristics.
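
A quick way to confirm whether a binary is position-independent is to read the e_type field of its ELF header. The sketch below is a minimal check under the assumption that the file is a regular ELF; the helper names elf_type and looks_like_pie are illustrative only. Note that ET_DYN is also the type of ordinary shared libraries, so this is a hint rather than proof.

```python
import struct

ET_EXEC = 2  # fixed-address executable
ET_DYN = 3   # shared object, which is also how PIE executables are marked


def elf_type(path):
    """Return the e_type field of an ELF file, or None if it is not ELF."""
    with open(path, "rb") as f:
        header = f.read(18)
    if len(header) < 18 or header[:4] != b"\x7fELF":
        return None
    # e_ident[EI_DATA] (byte 5): 1 = little-endian, 2 = big-endian
    byte_order = "<" if header[5:6] == b"\x01" else ">"
    # e_type is a 16-bit field immediately after the 16-byte e_ident
    return struct.unpack(byte_order + "H", header[16:18])[0]


def looks_like_pie(path):
    """Report whether the ELF header marks the file as ET_DYN."""
    return elf_type(path) == ET_DYN
```

If looks_like_pie returns True for the unpacked binary, passing --pie-mode as suggested above is worth trying.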

ghost commented 4 years ago

> That is a pretty unusual error. It looks like you're using IDA 7.4, can you let us know what version of Python it is using?

I believe I chose Python 2.7 when I installed IDA Pro, and hello.log also suggests that it is using Python 2.7. Is there a better way to check? When I run cd remill-build && sudo make install, I get:

...
Looking for the Python interpreter
 i Python 2.7 found: python2.7

Installing mcsema-disass
 i site-packages: /usr/local/lib/python2.7/site-packages
 i Successfully installed
-- Up-to-date: /usr/local/lib/libmcsema_rt32-4.0.a
-- Up-to-date: /usr/local/lib/libmcsema_rt64-4.0.a
-- Up-to-date: /usr/local/bin/remill-lift-4.0
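
The install output above reports the interpreter used to install mcsema-disass, which is not necessarily the one embedded in IDA. One way to check the latter is to run a short script from inside IDA itself, for example through its script-file option or its Python console. The helper name interpreter_summary is illustrative; the script relies only on the standard sys module.

```python
import sys


def interpreter_summary():
    """Describe the interpreter that is executing this script."""
    return "Python %d.%d.%d at %s" % (
        sys.version_info[0],
        sys.version_info[1],
        sys.version_info[2],
        sys.executable,
    )


print(interpreter_summary())
```

Inside an embedded interpreter, sys.executable may name the host program rather than a python binary, so the version numbers are the more reliable part of the output.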

I tried changing the remill/tools/mcsema/tools/mcsema_disass/ida7/util.py file, but I get the same error:

Debugging is enabled.
Loading Standard Definitions file: /usr/local/lib/python2.7/dist-packages/mcsema_disass-2.0-py2.7.egg/mcsema_disass/defs/linux.txt
Using Batch mode.
Starting analysis
Recovering module hello
Looking for thunks
Looking for external symbols
Looking for entrypoints
ERROR: Could not find entrypoint main
COULD NOT RECOVER ANY FUNCTIONS
Saving to: /home/vagrant/hello.cfg
Traceback (most recent call last):
  File "/usr/local/lib/python2.7/dist-packages/mcsema_disass-2.0-py2.7.egg/mcsema_disass/ida7/get_cfg.py", line 1650, in <module>
    args.output.write(M.SerializeToString())
AttributeError: 'NoneType' object has no attribute 'SerializeToString'

Done analysis!

After that, I also changed remill/tools/mcsema/tools/mcsema_disass/ida/util.py (!), with no success.

I did this:

def instruction_personality(arg):
  global PERSONALITIES, PERSONALITY_NORMAL
  print("modified instruction_personality in ida7/util.py")
  if isinstance(arg, (int, long)):
    arg, _ = decode_instruction(arg)
  if arg:
    p = PERSONALITIES.get(arg.itype, PERSONALITY_NORMAL)
    return fixup_personality(arg, p)
  else:
    return PERSONALITY_NORMAL

def _instruction_personality(arg):
  global PERSONALITIES
  if isinstance(arg, (int, long)):
    arg, _ = decode_instruction(arg)
  try:
    p = PERSONALITIES[arg.itype]
  except AttributeError:
    p = PERSONALITY_NORMAL

  return fixup_personality(arg, p)

Also, sorry if this is a dumb question, but should I rebuild McSema after modifying that Python file? At first I didn't; then I did rebuild it, and it made no difference.

> Also, it looks like hello.original is a PIE executable. I recommend adding --pie-mode for that, as it changes some of the heuristics.

I can't lift that file because I'm using an educational version of IDA Pro, which limits input files to 1 MB, and the original file is larger than that (that's why I'm using UPX).

artemdinaburg commented 4 years ago

Hi,

I am not sure whether McSema will accurately handle a UPX-packed file. I have never tried it, but I assume there will be quite a few failures, at least when it comes to running the file. McSema is not designed to work on packed or obfuscated code.
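
As a rough check for whether a given file is still packed, one can scan it for the "UPX!" marker that UPX normally leaves in its output. This is only a heuristic sketch: the marker can be absent or altered, and the function name looks_upx_packed is illustrative. A packed file generally exposes only the unpacking stub to a disassembler, which may be why a symbol such as main cannot be found.

```python
def looks_upx_packed(path, chunk_size=1 << 16):
    """Heuristically report whether a file contains the b"UPX!" marker."""
    marker = b"UPX!"
    tail = b""
    with open(path, "rb") as f:
        while True:
            chunk = f.read(chunk_size)
            if not chunk:
                return False
            # Keep the last few bytes so a marker split across chunks is found
            window = tail + chunk
            if marker in window:
                return True
            tail = window[-(len(marker) - 1):]
```

When the marker is present, unpacking with upx -d before lifting is the usual remedy, subject to any input-size limit of the disassembler in use.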

As for the actual error you are seeing, I am not quite sure. How well does IDA handle the file by itself? Also, you probably do need to rebuild after modifying the Python file, if only to ensure that it is installed to whatever location the CFG recovery uses in its path.
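
The traceback in the log points at a copy of get_cfg.py under /usr/local/lib/python2.7/dist-packages/mcsema_disass-2.0-py2.7.egg/, while the edited util.py lives in the source tree under remill/tools/mcsema/. To see which copy a given interpreter actually loads, a small helper like the one below can be run in that interpreter. The name module_location is illustrative, and "json" is used only as a stand-in so that the snippet runs anywhere.

```python
import importlib


def module_location(name):
    """Import a module by name and return the file it was loaded from."""
    module = importlib.import_module(name)
    return getattr(module, "__file__", None)


# "json" is only a stand-in; inside IDA one would pass "mcsema_disass".
print(module_location("json"))
```

If the reported path is the installed copy rather than the source tree, edits to the source tree take effect only after reinstalling.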

Have you considered the dyninst cfg recovery frontend?

ghost commented 4 years ago

Hello,

Thank you for your response.

> Have you considered the dyninst cfg recovery frontend?

No, I didn't. I will try that (without using UPX).