wmkhoo / taintgrind

A taint-tracking plugin for the Valgrind memory checking tool
GNU General Public License v2.0

--taint-all does not taint stdin #39

Closed: vanhauser-thc closed this issue 4 years ago

vanhauser-thc commented 4 years ago

This example source reads from stdin:

#include <stdio.h>
#include <string.h>
#include <stdarg.h>
#include <stdlib.h>
#include <stdint.h>
#include <unistd.h>

int main(int argc, char *argv[]) {
  char buf[1024];
  unsigned short int *usi;
  unsigned int *ui;
  ssize_t i;

  if ((i = read(0, buf, sizeof(buf) - 1)) < 10)
    return 0;
  buf[i] = 0;

  if (buf[0] != 'A')
    return 0;
  fprintf(stderr, "Solved 1\n");
  usi = (unsigned short int*)(buf + 1);
  if (*usi != 0x4342)
    return 0;
  fprintf(stderr, "Solved 2-3\n");
  if (memcmp(buf + 3, "DEF", 3))
    return 0;
  fprintf(stderr, "Solved 4-6\n");
  ui = (unsigned int*)(buf + 6);
  if (*ui != 0x4a494847)
    return 0;
  fprintf(stderr, "Solved 7-10\n");
  return 1;
}

Compiling and running with the docker container:

# docker run -ti -v /tmp:/pwd taintgrind --taint-all=yes --input-fd=0 /pwd/cmp
/code/valgrind/build/bin/valgrind --tool=taintgrind --taint-all=yes --input-fd=0 /pwd/cmp
==8== Taintgrind, the taint analysis tool
==8== Copyright (C) 2010-2018, and GNU GPL'd, by Wei Ming Khoo.
==8== Using Valgrind-3.16.0 and LibVEX; rerun with -h for copyright info
==8== Command: /pwd/tests/src/cmp
==8== 
ABCDEFGHIJKxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx    # <- this is my input from stdin
Solved 1
Solved 2-3
Solved 4-6
Solved 7-10
==8== 
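
This is why that input reaches "Solved 7-10", assuming a little-endian host such as x86-64. `*usi != 0x4342` compares bytes 1-2 with 0x42 'B' and 0x43 'C'. `*ui != 0x4a494847` compares bytes 6-9 with 'G' 'H' 'I' 'J'. So bytes 0-9 are exactly the input bytes a taint tool should report as compared. The sketch below re-implements the four checks with explicit byte decoding to make this visible. `le16`, `le32` and `checks_passed` are illustrative names and are not part of taintgrind.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Assemble a 16-bit little-endian value from two bytes, avoiding the
   unaligned pointer cast used in the test program. */
static uint16_t le16(const unsigned char *p) {
  return (uint16_t)(p[0] | (p[1] << 8));
}

/* Same for a 32-bit little-endian value. */
static uint32_t le32(const unsigned char *p) {
  return (uint32_t)p[0] | ((uint32_t)p[1] << 8) |
         ((uint32_t)p[2] << 16) | ((uint32_t)p[3] << 24);
}

/* Number of "Solved" messages the test program would print for buf
   (0-4), assuming a little-endian target. */
static int checks_passed(const unsigned char *buf, size_t len) {
  assert(buf != NULL);
  if (len < 10 || buf[0] != 'A') return 0;
  if (le16(buf + 1) != 0x4342) return 1;      /* bytes 1-2: 'B' 'C' */
  if (memcmp(buf + 3, "DEF", 3) != 0) return 2;
  if (le32(buf + 6) != 0x4a494847) return 3;  /* bytes 6-9: 'G' 'H' 'I' 'J' */
  return 4;
}
```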

I also tried without the --taint-all=yes option, but there is still no tainted-instruction output. Am I using it wrong? I don't think so, though.

wmkhoo commented 4 years ago

--input-fd is actually a Valgrind option, not a taintgrind one. From the Valgrind documentation:

--input-fd=<number> [default: 0, stdin]

When using --gen-suppressions=yes, Valgrind will stop so as to read keyboard input from you when each error occurs. 
By default it reads from the standard input (stdin), which is problematic for programs which close stdin. 
This option allows you to specify an alternative file descriptor from which to read input.

But let me fix --taint-all so that it also sets --taint-stdin and --taint-network.

vanhauser-thc commented 4 years ago

Sorry, I totally forgot that --taint-stdin exists, because it is not in the help output. It and --taint-network (plus any others that may be experimental) should be listed in the help output :) With just --taint-stdin=yes it works fine. Thanks!

vanhauser-thc commented 4 years ago

I will hijack my issue with another question :) I have never worked with Valgrind except for some limited patching of taintgrind. Is it possible to re-run taintgrind with different inputs and keep the VEX (that is the name of the IR, correct?) cached? I have a problem where I need to run a taint analysis every few seconds on the same program, but with different inputs (e.g. from stdin or from a file), and taintgrind itself takes a long time to start up because of the IR lifting. If that is possible, where would I need to put the loop, in taintgrind or in Valgrind? Thanks!

wmkhoo commented 4 years ago

Good question. As far as I know, there's no option to keep the VEX IR between runs.

I suspect that the long start-up time is due to the parsing of symbol information used to pretty-print the information flow. If that is not needed, I believe some CPU cycles can be saved. Are you using tg/vg (taintgrind/Valgrind) for analysis of fuzzing test cases?

One possibility is to write a harness that, in an infinite loop, loads the new file, then calls your program. Will that work?
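
Here is a minimal sketch of such a harness. It assumes the program's logic can be called as a function that takes the input bytes. `target_fn`, `demo_target`, `load_file` and `run_harness` are hypothetical names. The harness is started once, so the tool's start-up cost is paid once rather than once per input.

```c
#include <assert.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical entry point: the program's logic refactored into a
   function that receives the input bytes instead of reading them. */
typedef int (*target_fn)(const unsigned char *data, size_t len);

/* Example target standing in for the real program (hypothetical). */
static int demo_target(const unsigned char *data, size_t len) {
  return len > 0 && data[0] == 'A';
}

/* Read a whole file into a malloc'd buffer; returns NULL on failure. */
static unsigned char *load_file(const char *path, size_t *len) {
  FILE *f = fopen(path, "rb");
  if (!f)
    return NULL;
  unsigned char *buf = NULL;
  size_t cap = 0, n = 0;
  for (;;) {
    if (n == cap) {                       /* grow the buffer as needed */
      size_t ncap = cap ? cap * 2 : 4096;
      unsigned char *tmp = realloc(buf, ncap);
      if (!tmp) {
        free(buf);
        fclose(f);
        return NULL;
      }
      buf = tmp;
      cap = ncap;
    }
    size_t got = fread(buf + n, 1, cap - n, f);
    if (got == 0)
      break;                              /* EOF (or read error) */
    n += got;
  }
  fclose(f);
  *len = n;
  return buf;
}

/* Read one input path per line from `list`, load that file and call
   `target` on it. Fed from stdin, it keeps waiting for the next path
   until stdin is closed. Returns the number of completed runs. */
static int run_harness(FILE *list, target_fn target) {
  assert(list != NULL && target != NULL);
  char path[4096];
  int runs = 0;
  while (fgets(path, sizeof path, list) != NULL) {
    path[strcspn(path, "\r\n")] = '\0';   /* strip the newline */
    if (path[0] == '\0')
      continue;
    size_t len = 0;
    unsigned char *data = load_file(path, &len);
    if (data == NULL) {
      fprintf(stderr, "cannot read %s\n", path);
      continue;
    }
    target(data, len);
    free(data);
    runs++;
  }
  return runs;
}
```

With a `main` that calls `run_harness(stdin, demo_target)`, each new test case is handed over by writing its path to the harness's stdin.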

vanhauser-thc commented 4 years ago

I want taint tracking to be a standard feature that needs no user intervention. Whether the fuzz case comes via stdin or from a specific file, I get that information from the user who sets up the fuzzer and can pass it on to the taint engine. But I want it to be zero overhead for the user, so a harness would not work. So basically my choices boil down to VEX or QEMU: with QEMU I would need to implement the taint tracking myself; with taintgrind I would need to implement a loop... More research needed on my part. Whelp. Thanks!

wmkhoo commented 4 years ago

Keep me posted on your further thoughts on this problem. I'm interested to see how tg can better support fuzzing.

vanhauser-thc commented 4 years ago

I found out that the hook for re-forking runs would need to go in coregrind/m_main.c, just before the call to VG_(init_Threads)().

Simple test code placed there:

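   while (True) {
     HChar buf[4];
     VG_(printf)("Press ENTER to run again ...\n");
     Int len = VG_(read)(0, buf, sizeof(buf));
     if (len <= 0)
       VG_(exit)(0);                   /* stdin closed: stop looping */
     Int pid = VG_(fork)();
     VG_(debugLog)(1, "main", "fork\n");
     if (pid == 0)
       break;                          /* child: continue normal start-up */
     else if (pid > 0) {
       Int status;                     /* properly typed status word */
       VG_(waitpid)(pid, &status, 0);  /* parent: wait for this run */
     }
   }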

I then tried it against a standard fuzz target (libtiff). The runtime is 3-4 seconds, which is way too long :(

Then I tried a trick: making the test target jump back to the beginning of main at the end, to see whether it would get faster through caching, but I could not notice a real difference.

So I will implement the taint engine in QEMU (the runtime with taint will be < 100 ms by comparison). As I am only interested in first-level taint (which bytes from read()s etc. are accessed, not what happens afterwards), the effort I have to invest is limited.
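
The fork-per-run pattern from the m_main.c experiment can be shown outside Valgrind as a plain POSIX sketch. `run_in_child` is a hypothetical helper, not a Valgrind function. One property of fork() may matter for the caching question. Anything the child builds after the fork is discarded when the child exits; in Valgrind's case, that includes translations made while the client runs. Only state created before the fork carries over to the next run.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <stdio.h>
#include <stdlib.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* Run fn(input) in a forked child and return the child's exit status
   (0-255), or -1 if fork/waitpid failed or the child did not exit
   normally. The parent keeps everything it initialised before the
   fork, so that set-up cost is paid only once across all runs. */
static int run_in_child(int (*fn)(const char *), const char *input) {
  assert(fn != NULL);
  fflush(NULL);                 /* don't duplicate buffered stdio output */
  pid_t pid = fork();
  if (pid < 0)
    return -1;
  if (pid == 0)
    _exit(fn(input) & 0xff);    /* child: one run, then exit immediately */
  int status;
  if (waitpid(pid, &status, 0) < 0)
    return -1;
  return WIFEXITED(status) ? WEXITSTATUS(status) : -1;
}
```

For example, `run_in_child(atoi, "7")` returns 7. The child parses the string and exits with the result, while the parent only waits.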

wmkhoo commented 4 years ago

Without a lot of the bells and whistles, sub-second runs should be possible. Just removing the call to VG_(needs_var_info) already gives me a significant reduction in run time.

I implemented so-called first-level taint as the --head=yes option, and yes, it does help with performance.
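
Here is a toy model of what "first-level" taint records, in the sense used in this thread: which input bytes are accessed, not where their values flow afterwards. This is not how --head=yes or the QEMU version is implemented. `struct first_level`, `fl_on_load` and the 1024-byte limit are hypothetical.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define MAX_INPUT 1024          /* hypothetical cap on tracked input size */

/* The bytes returned by one read() live at [base, base + len). */
struct first_level {
  uintptr_t base;               /* address the input was read into */
  size_t len;                   /* number of bytes read */
  unsigned char hit[MAX_INPUT]; /* hit[i] != 0 if input byte i was loaded */
};

static void fl_init(struct first_level *fl, uintptr_t base, size_t len) {
  assert(fl != NULL && len <= MAX_INPUT);
  fl->base = base;
  fl->len = len;
  for (size_t i = 0; i < len; i++)
    fl->hit[i] = 0;
}

/* Called for every load of `size` bytes at `addr`: mark the input bytes
   it touches. Values are not followed into temporaries or registers. */
static void fl_on_load(struct first_level *fl, uintptr_t addr, size_t size) {
  for (size_t k = 0; k < size; k++) {
    uintptr_t a = addr + k;
    if (a >= fl->base && a - fl->base < fl->len)
      fl->hit[a - fl->base] = 1;
  }
}

/* Number of distinct input bytes accessed so far. */
static size_t fl_count(const struct first_level *fl) {
  size_t n = 0;
  for (size_t i = 0; i < fl->len; i++)
    n += fl->hit[i];
  return n;
}
```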

FWIW, I've forked tg to experiment. https://github.com/wmkhoo/taintfuzz

vanhauser-thc commented 4 years ago

Oh yes, this is 8x faster, ~500 ms.

To compile it, I had to add -fpermissive -Wl,--allow-multiple-definition.

I think it still taints more than needed:

0xFFFFFFFF | Read:832 | 0x0 | 1ffeffe188_unknownobj
0x4007126 | Load:8 | 0x10102464c457f | t2_4802 <- 1ffeffe188_unknownobj       <--- yup this is what is needed
0x4007126 | r19_12373 <- t2_4802                                                          <-- this is not
0x4007131 | Load:1 | 0x0 | t8_2770 <- 1ffeffe190_unknownobj                       <-- yup needed
0x4007131 | t17_4747 <- t8_2770                                                          <-- these and below not
0x4007131 | t7_4906 <- t17_4747
0x4007131 | t18_6063 <- t7_4906
0x4007131 | t6_8576 <- t18_6063
0x4007131 | r4_2406 <- t6_8576
0x4007131 | t3_7059 <- r4_2406
0x4007131 | t19_2366 <- t3_7059
0x4007131 | t9_4876 <- t19_2366
0x4007131 | r19_12374 <- t9_4876
0x4007139 | t23_4626 <- t9_4876
0x4007139 | t22_2949 <- t23_4626

btw this is with: ./taintfuzz --file-filter=/tests/tiff-4.0.4/test/palette-1c-8b.tiff --head=yes -- /tests/tiff-4.0.4/tools/thumbnail /tests/tiff-4.0.4/test/palette-1c-8b.tiff /dev/null
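
Until the tracking itself is narrowed, the lines marked as needed can be separated out after the fact. This is a minimal sketch. It assumes the `address | kind | value | flow` layout seen in this trace, where direct input accesses have a second field starting with `Read:` or `Load:`. `is_direct_access` is a hypothetical helper, not part of taintfuzz. Filtering the output afterwards does not reduce the run time.

```c
#include <assert.h>
#include <string.h>

/* Returns 1 if a trace line records a direct input access, i.e. its
   second '|'-separated field begins with "Read:" or "Load:". */
static int is_direct_access(const char *line) {
  assert(line != NULL);
  const char *p = strchr(line, '|');
  if (p == NULL)
    return 0;                   /* not a trace line */
  p++;
  while (*p == ' ')
    p++;                        /* skip padding after the separator */
  return strncmp(p, "Read:", 5) == 0 || strncmp(p, "Load:", 5) == 0;
}
```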

vanhauser-thc commented 4 years ago

I now have a QEMU implementation ready that is hopefully on par with taintgrind/taintfuzz (well, just for first-level taint, but that is all I need)... It is 10x faster (30 ms execution), so I will stay with that solution.

wmkhoo commented 4 years ago

Ok. Post a link to it somewhere.

(Sent as an email reply to van Hauser's comment of Sat, 8 Aug 2020 at 08:13.)

vanhauser-thc commented 4 years ago

The taint part is here: https://github.com/vanhauser-thc/qemu_taint

time ./afl-qemu-taint /prg/tests/qemu/tiff-4.0.4/tools/tiffinfo /prg/tests/qemu/tiff-4.0.4/test/images/palette-1c-8b.tiff
...
real    0m0,018s

With full debugging enabled it is 35 ms.

Not sure everything is found correctly yet ;) WIP...