wmkhoo / taintgrind

A taint-tracking plugin for the Valgrind memory checking tool
GNU General Public License v2.0

Print the taintedness (tainted/untainted) of each variable #37

Closed PriyankaPanigrahi closed 4 years ago

PriyankaPanigrahi commented 5 years ago

Suppose a variable 'x' in a program is tainted, and there is an assignment 'y=x' where y was previously untainted. How can I check the taint flow in the output or in the data flow graph?

Any suggestions?

Thank you. Have a great day.

wmkhoo commented 5 years ago

Say you have this program, x.c:

    #include "taintgrind.h"

    int main(int argc, char **argv)
    {
        int x=10, y;

        TNT_TAINT(&x, sizeof(x));
        y=x;
        return y;
    }

Compile it with:

    taintgrind$ gcc -g -O0 -o tests/x -I. -I../include tests/x.c

Run taintgrind with:

    taintgrind$ ../build/bin/valgrind --tool=taintgrind tests/x 10
    ==4507== Taintgrind, the taint analysis tool
    ...
    ==4507== Command: tests/x 10
    0x10895A: main (x.c:8) | mov eax, dword ptr [rbp - 0x50] | Load | 0xa | t20_8260 <- x:1ffefffd10
    ...
    0x10895D: main (x.c:8) | mov dword ptr [rbp - 0x4c], eax | Store | 0xa | y:1ffefffd14 <- t23_9256

You should get these two lines in the output, and the following taint graph (image: taint_x_to_y).

PriyankaPanigrahi commented 5 years ago

Thank you so much for your reply.

I have already done this much. But if the source code has a large number of variables, say 100, of which 20 are tainted, it's difficult to check the taint flow manually in the taint graph.

Is it possible to print the taintedness (tainted/untainted) of each variable at the end of the program?

Any advice will be helpful. Thank you for your time. Have a great day.

wmkhoo commented 5 years ago

I've just added a new client request: TNT_IS_TAINTED. This allows you to read the taint bits of a variable that you specify at run-time. Have a look at the test case tests/checktaint.c to see if this is useful for you.

If you run this test case, you should get:

    a is_tainted: ffffffff
    b is_tainted: 00000000
    c[0] is_tainted: 00000000
    c[1] is_tainted: 00000000
    c[2] is_tainted: 00000000
    c[3] is_tainted: 00000000
    c[4] is_tainted: 00000000
    c[5] is_tainted: ffffffff
    c[6] is_tainted: 00000000
    c[7] is_tainted: ffffffff
    c[8] is_tainted: 00000000
    c[9] is_tainted: 00000000

PriyankaPanigrahi commented 5 years ago

How does it allow us to read the taint bits of a variable that we specify at run-time?

wmkhoo commented 5 years ago

If you look at tests/checktaint.c, the lines that print the taint bits are:

    TNT_IS_TAINTED(t, &a, sizeof(a));
    printf("a is_tainted: %08x\n", t);

TNT_IS_TAINTED() takes 3 arguments: the output variable (unsigned int), an address, and the number of bytes to read. The taint bits will be written to t, which you can then print out.
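For a program with many variables, one way to avoid writing a TNT_IS_TAINTED/printf pair per variable is to keep a small table of names and addresses and loop over it. The sketch below is only an illustration: `tracked_var`, `taint_query_fn`, `report_taint` and `demo_query` are hypothetical names, not part of Taintgrind. The query is passed in as a callback so the code compiles without taintgrind.h; under Taintgrind the callback could be a small wrapper around TNT_IS_TAINTED(t, addr, size).

```c
#include <assert.h>
#include <stddef.h>
#include <stdio.h>

/* Hypothetical callback type: returns the taint bits for `size` bytes at
 * `addr`. Under Taintgrind, an implementation could wrap TNT_IS_TAINTED. */
typedef unsigned int (*taint_query_fn)(const void *addr, size_t size);

/* One entry per variable that should appear in the report. */
struct tracked_var {
    const char *name;
    const void *addr;
    size_t size;
};

/* Print one "<name> is_tainted: <bits>" line per entry and return how many
 * entries had at least one taint bit set. */
size_t report_taint(const struct tracked_var *vars, size_t count,
                    taint_query_fn query, FILE *out)
{
    size_t tainted = 0;

    assert(vars != NULL && query != NULL && out != NULL);
    for (size_t i = 0; i < count; i++) {
        unsigned int bits = query(vars[i].addr, vars[i].size);
        fprintf(out, "%s is_tainted: %08x\n", vars[i].name, bits);
        if (bits != 0)
            tainted++;
    }
    return tainted;
}

/* Stand-in query for running outside Taintgrind: pretends that exactly one
 * address, demo_tainted_addr, is fully tainted. Purely illustrative. */
const void *demo_tainted_addr = NULL;

unsigned int demo_query(const void *addr, size_t size)
{
    (void)size;
    return addr == demo_tainted_addr ? 0xffffffffu : 0u;
}
```

Calling report_taint() just before main returns would give an end-of-program summary in the same format as the checktaint output, with the return value telling you how many of the listed variables carry taint.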

wmkhoo commented 5 years ago

From what you're saying, the taintgrind output, which is written to stderr, is getting in the way. One way is to redirect stderr to /dev/null (2>/dev/null); another option is to save stdout to a file, e.g. > output.txt.

Hope that helps.

On Thu, Sep 19, 2019 at 10:14 PM PriyankaPanigrahi notifications@github.com wrote:

I made all the changes as you suggested.

I am getting output for various functions, such as _itoa_word, vfprintf, _IO_file_xsputn@@GLIBC_2.2.5, __memcpy_avx_unaligned_erms and many more.

But not getting the desired output. I am not able to understand where I am wrong.

Please suggest.

PriyankaPanigrahi commented 5 years ago

Thank you so much for your help.

I am getting the same output as you mentioned. Does ffffffff mean tainted and 00000000 mean untainted? What is the significance of these output values?

wmkhoo commented 5 years ago

Not just that. ffffffff means all 32 bits are tainted.
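Since the value is a bit mask rather than a yes/no flag, a small helper can turn it into a label. This is a minimal sketch under two assumptions that the thread does not confirm: unsigned int is 32 bits wide, and for a read of n bytes the taint bits occupy the low 8n bits of the result. The name `taint_label` is hypothetical.

```c
#include <assert.h>
#include <stddef.h>

/* Classify taint bits read for a value of `nbytes` bytes (1, 2 or 4 here,
 * because the result is held in an unsigned int as in the thread's example).
 * Assumption: the taint bits occupy the low 8*nbytes bits of `bits`. */
const char *taint_label(unsigned int bits, size_t nbytes)
{
    assert(nbytes == 1 || nbytes == 2 || nbytes == 4);

    /* Mask with one bit set for every bit of the value being described. */
    unsigned int full = (nbytes == 4) ? 0xffffffffu
                                      : ((1u << (8 * nbytes)) - 1u);
    bits &= full;

    if (bits == 0)
        return "untainted";
    if (bits == full)
        return "fully tainted";
    return "partially tainted";
}
```

With this, ffffffff for a 4-byte int maps to "fully tainted", 00000000 to "untainted", and anything in between (for example, only the low byte tainted) to "partially tainted".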

PriyankaPanigrahi commented 5 years ago

Thank you very much for your reply.

wmkhoo commented 5 years ago

Just to add that the sizes (in bytes) that TNT_IS_TAINTED accepts are 1, 2, 4 and 8.
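For an object whose size is not one of those four values (a struct or a char array, say), the query has to be split into several calls. The sketch below only plans the sequence of sizes; `next_query_size` and `plan_queries` are hypothetical helpers, and the thread does not say how the result of an 8-byte query is returned, so the actual calls are left out.

```c
#include <assert.h>
#include <stddef.h>

/* Largest size accepted by TNT_IS_TAINTED (per the thread: 1, 2, 4 or 8)
 * that does not exceed `remaining`. */
size_t next_query_size(size_t remaining)
{
    assert(remaining > 0);
    if (remaining >= 8)
        return 8;
    if (remaining >= 4)
        return 4;
    if (remaining >= 2)
        return 2;
    return 1;
}

/* Fill `sizes` with the sequence of query sizes that covers `len` bytes.
 * Returns the number of queries needed, or 0 if `cap` entries are not
 * enough (a `len` of 0 also yields 0, since no query is needed). */
size_t plan_queries(size_t len, size_t *sizes, size_t cap)
{
    size_t n = 0;

    while (len > 0) {
        size_t step = next_query_size(len);
        if (n == cap)
            return 0;
        sizes[n++] = step;
        len -= step;
    }
    return n;
}
```

A 13-byte object, for instance, would be covered by queries of 8, 4 and 1 bytes at offsets 0, 8 and 12.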

PriyankaPanigrahi commented 5 years ago

So the last argument of TNT_IS_TAINTED() can be 1, 2, 4 or 8.

PriyankaPanigrahi commented 5 years ago

For this source code:

    #include <stdio.h>
    #include "taintgrind.h"

    int main(int argc, char **argv)
    {
        int a = 1000, b;
        unsigned int t;

        // Defines int a as tainted
        TNT_TAINT(&a, sizeof(a));

        if (a == 1000)
            b = 1;
        else
            b = 0;

        TNT_IS_TAINTED(t, &a, sizeof(a));
        printf("a is_tainted: %x\n", t);

        TNT_IS_TAINTED(t, &b, sizeof(b));
        printf("b is_tainted: %x\n", t);

        return 0;
    }

should variable b be tainted or not, given that the value of the untainted variable "b" depends on the value of the tainted variable "a"?

wmkhoo commented 5 years ago

Let's take a similar example (http://bitblaze.cs.berkeley.edu/papers/dta%2B%2B-ndss11.pdf Fig. 3):

    char output[256];
    long input = user_input();
    long len = 0;
    if (input > 100) {
        strcpy(output, "large");
        len = 5;
    } else {
        strcpy(output, "small");
        len = 5;
    }
    print_output(output, len);

In this case, is len dependent on input?

PriyankaPanigrahi commented 5 years ago

Whatever the value of input, len will always be 5, so it is not dependent.

wmkhoo commented 5 years ago

There are at least three types of taint dependency. Let x be tainted.

  1. Direct/Data-flow dependence, e.g. y = x
  2. Indirect/Control-flow dependence, e.g. if(x){ y = 2; }
  3. Address/Pointer dependence, e.g. y = a[x] and y = *x

Taintgrind, which follows Valgrind memcheck, only implements 1, not 2 or 3. This means it will under-taint, i.e. it will miss some dependencies. On the other hand, it is tricky to handle 2 and 3, as doing so may lead to over-tainting, i.e. reporting dependencies where there are none. For more info on taint analysis, check out https://users.ece.cmu.edu/~aavgerin/papers/Oakland10.pdf.
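To make the trade-off concrete, here is a toy model of the three rules. It is not how Taintgrind is implemented: each value simply carries a one-bit taint flag, and `tval`, `assign`, `select_on` and `load_at` are hypothetical names. The `track_control` and `track_address` switches turn rules 2 and 3 on or off.

```c
#include <assert.h>
#include <stdbool.h>

/* A value paired with a one-bit taint flag (a toy stand-in for shadow state). */
struct tval {
    long value;
    bool tainted;
};

/* 1. Data-flow dependence: y = x copies the taint of x. */
struct tval assign(struct tval x)
{
    return x;
}

/* 2. Control-flow dependence: y = cond ? a : b.
 * With track_control false (data flow only), the taint of cond is ignored. */
struct tval select_on(struct tval cond, struct tval a, struct tval b,
                      bool track_control)
{
    struct tval y = cond.value ? a : b;

    if (track_control && cond.tainted)
        y.tainted = true;
    return y;
}

/* 3. Address dependence: y = table[index].
 * With track_address false, the taint of index is ignored. */
struct tval load_at(const struct tval *table, long len, struct tval index,
                    bool track_address)
{
    assert(index.value >= 0 && index.value < len);

    struct tval y = table[index.value];
    if (track_address && index.tainted)
        y.tainted = true;
    return y;
}
```

With rule 2 off, a value chosen by a tainted condition stays untainted, which is the under-tainting in the a/b example above. With rule 2 on, the same call also taints len in the earlier example, even though len is 5 on both branches, which is the over-tainting case.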

PriyankaPanigrahi commented 5 years ago

Thank you for your reply.

Since taintgrind under-taints, is there any other tool that addresses all of 1, 2 and 3, or is there no such tool because of over-tainting?

wmkhoo commented 5 years ago

Let me add that although under-tainting will miss some dependencies, it can still be useful.

Some other dynamic taint analysis tools I'm aware of, but have not tried (and the info may not be up-to-date):

  1. libdft: According to their paper, "in this work, we do not consider cases of implicit data flow that are in accordance with previous work on the subject". (http://nsl.cs.columbia.edu/papers/2012/libdft.vee12.pdf, Pg. 2)
  2. triton: "Dynamic Taint Analysis. (DTA) aims to detect which data and instructions along an execution depend on user input. We consider direct tainting." (https://triton.quarkslab.com/files/DIMVA2018-deobfuscation-salwan-bardin-potet.pdf, Pg. 6)

If you want to experiment with and implement different taint rules, I hear that bap will let you do that (but again, I have not tried it).

PriyankaPanigrahi commented 5 years ago

Thank you very much for your valuable reply. It's a really great help.