panda-re / panda

Platform for Architecture-Neutral Dynamic Analysis
https://panda.re

Examples on Taint Analysis do not work or are out of date #1392

Open jstarink opened 8 months ago

jstarink commented 8 months ago

Description

It seems some of the Python examples are not working (anymore). In particular, I am looking into the implementation of taint.py. Running this script as-is does not work / segfaults.

Findings

Based on the errors I received in the stdout, I identified the following problems:

  1. The string comparison on line 25 should be a bytes comparison or it will never taint anything as fd_to_fname returns bytes.
  2. The call to panda.taint_label_ram requires a label argument.
  3. taint2 should be enabled/loaded before panda.run is called.
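
To illustrate point 1: in Python 3 a `str` never compares equal to a `bytes` object, so the original check fails silently instead of raising an error. The following is a minimal sketch, not the actual script. `fake_fd_to_fname` is a hypothetical stand-in for the script's `fd_to_fname`, assumed to return bytes the way cffi's `ffi.string` does for a `char *`.

```python
# Hypothetical stand-in for fd_to_fname(); cffi's ffi.string() returns
# bytes for a char * value, so the real helper is assumed to yield bytes too.
def fake_fd_to_fname():
    return b"/etc/passwd"

fname = fake_fd_to_fname()

# bytes compared with str: always False, so nothing would ever be tainted.
str_match = fname == "/etc/passwd"

# bytes compared with bytes: matches as intended.
bytes_match = fname == b"/etc/passwd"

# Alternative: decode once, then compare against an ordinary string.
decoded_match = fname.decode() == "/etc/passwd"

print(str_match, bytes_match, decoded_match)
```

Either comparing against a bytes literal or decoding first should work. The modified script further down uses the bytes literal.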

Points 1 and 2 are easy enough to address, but I have some trouble with point 3. When I try to add the following lines before the call to panda.run, PANDA crashes with a segfault upon calling panda.taint_enable.

panda.load_plugin("taint2")
panda.taint_enable()

Example output:

root@5a4e2fa4f486:/local# python taint.py
using generic x86_64
os_name=[linux-64-ubuntu:4.15.0-72-generic-noaslr-nokaslr]
PANDA[core]:os_familyno=2 bits=64 os_details=ubuntu:4.15.0-72-generic-noaslr-nokaslr
[PYPANDA] Panda args: [/usr/local/lib/python3.8/dist-packages/pandare/data/x86_64-softmmu/libpanda-x86_64.so -L /usr/local/share/panda /root/.panda/bionic-server-cloudimg-amd64-noaslr-nokaslr.qcow2 -display none -m 1024 -serial unix:/tmp/pypanda_s2hy2gyig,server,nowait -monitor unix:/tmp/pypanda_m90o4s3uq,server,nowait]
PANDA[osi_linux]:W> kernelinfo bytes [20-23] not read
PANDA[syscalls2]:using profile for linux x64 64-bit
PPP automatically loaded plugin syscalls2
PPP automatically loaded plugin taint2
PANDA[taint2]:propagation via pointer dereference ENABLED
PANDA[taint2]:taint operations inlining DISABLED
PANDA[taint2]:llvm optimizations DISABLED
PANDA[taint2]:taint debugging DISABLED
PANDA[taint2]:detaint if control bits 0 DISABLED
PANDA[taint2]:maximum taint compute number (0=unlimited) 0
PANDA[taint2]:maximum taintset cardinality (0=unlimited) 0
callstack_instr:  setting up threaded stack_type
PANDA[taint2]:taint2_enable_taint
taint2: Allocating small fast_shad (0 bytes) using malloc @ 0x29ac260.
taint2: Allocating small fast_shad (19200000 bytes) using malloc @ 0x7f9eb847a010.
taint2: Allocating small fast_shad (384 bytes) using malloc @ 0x2b9d590.
taint2: Allocating small fast_shad (3072 bytes) using malloc @ 0x2cf7dd0.
taint2: Allocating small fast_shad (1030272 bytes) using malloc @ 0x7f9ec8347010.
PANDA[taint2]:LLVM optimizations DISABLED
taint2: Initializing taint ops
taint2: Done initializing taint transformation.
Segmentation fault (core dumped)
root@5a4e2fa4f486:/local#

The stack trace according to GDB (see bottom of issue) seems to indicate that the crash happens in the PandaTaintVisitor class, reached from the taint2_enable_taint function.

Am I missing something?

Details

Full (modified) script:

```python
from pandare import Panda

panda = Panda(generic='x86_64')

@panda.queue_blocking
def driver():
    panda.revert_sync('root')
    print(panda.run_serial_cmd("grep root /etc/passwd"))
    panda.end_analysis()

panda.require("osi")
panda.require("osi_linux")

def fd_to_fname(cpu, fd):
    proc = panda.plugins['osi'].get_current_process(cpu)
    procname = panda.ffi.string(proc.name) if proc != panda.ffi.NULL else "error"
    fname_ptr = panda.plugins['osi_linux'].osi_linux_fd_to_filename(cpu, proc, fd)
    fname = panda.ffi.string(fname_ptr) if fname_ptr != panda.ffi.NULL else "error"
    return fname

@panda.ppp("syscalls2", "on_sys_read_return")
def read(cpu, tb, fd, buf, cnt):
    fname = fd_to_fname(cpu, fd)
    print(f"read {fname}")
    if fname == b"/etc/passwd": # <-- changed to bytes string
        for idx in range(cnt):
            panda.taint_label_ram(buf+idx, 1) # <-- added taint label 1 (not sure about the expected type?)

@panda.ppp("taint2", "on_branch2")
def something(addr, size, from_helper, tainted):
    print("Tainted branch")

# Added plugin loading/enabling
panda.load_plugin("taint2")
panda.taint_enable()

panda.run()
```
GDB stack trace

```
0x00007fc244615bb8 in llvm::PandaTaintVisitor::insertStateOp(llvm::Instruction&) () from /usr/local/lib/panda/x86_64/panda_taint2.so
(gdb) bt
#0  0x00007fc244615bb8 in llvm::PandaTaintVisitor::insertStateOp(llvm::Instruction&) () from /usr/local/lib/panda/x86_64/panda_taint2.so
#1  0x00007fc244619d35 in llvm::PandaTaintFunctionPass::runOnFunction(llvm::Function&) () from /usr/local/lib/panda/x86_64/panda_taint2.so
#2  0x00007fc2446094c1 in taint2_enable_taint () from /usr/local/lib/panda/x86_64/panda_taint2.so
#3  0x00007fc259622ff5 in ?? () from /lib/x86_64-linux-gnu/libffi.so.7
#4  0x00007fc25962240a in ?? () from /lib/x86_64-linux-gnu/libffi.so.7
#5  0x00007fc2588810a7 in cdata_call (cd=, args=, kwds=) at src/c/_cffi_backend.c:3201
#6  0x00000000005f7506 in _PyObject_MakeTpCall ()
#7  0x0000000000570b8e in _PyEval_EvalFrameDefault ()
#8  0x00000000005f6ce6 in _PyFunction_Vectorcall ()
#9  0x000000000056b619 in _PyEval_EvalFrameDefault ()
#10 0x00000000005697da in _PyEval_EvalCodeWithName ()
#11 0x000000000068e547 in PyEval_EvalCode ()
#12 0x000000000067dbf1 in ?? ()
#13 0x000000000067dc6f in ?? ()
#14 0x000000000067dd11 in ?? ()
#15 0x000000000067fe37 in PyRun_SimpleFileExFlags ()
#16 0x00000000006b7c82 in Py_RunMain ()
#17 0x00000000006b800d in Py_BytesMain ()
#18 0x00007fc259d69083 in __libc_start_main (main=0x4ef140, argc=2, argv=0x7fffa36cf438, init=, fini=, rtld_fini=, stack_end=0x7fffa36cf428) at ../csu/libc-start.c:308
#19 0x00000000005fb85e in _start ()
```

jstarink commented 8 months ago

Based on taint_x86_64.py I have come to realize I should enable the taint analysis after the machine is set up, e.g., inside of a @panda.cb_after_machine_init callback. Is this the typical approach to take or is there another (better) way?