moralismercatus / crete

Open source concolic testing tool for binaries
1 stars 1 forks source link

Support non-command-line data testing #32

Closed moralismercatus closed 10 years ago

moralismercatus commented 10 years ago

Up til now, the preloader made argv to the program concolic/symbolic. Thus, only command-line input could be tested.

This method is clearly limited, so we've decided to have the user create a wrapper that either invokes a library, or if the source code is available, inserts approperiate functions to designate variables as concolic - such that test cases will be generated for these variables.

The function's working name and declaration is: void bct_make_concolic(void* addr, size_t size);

The previous command-line method will be kept, but unused, for the possibility of providing a convience for testing command-line utilities such as linux utilities in the future.

moralismercatus commented 10 years ago

My initial attempt to write the concolic data to file immediately upon receiving the custom instruction was met with segault. The prevailing notion is that that data hasn't been entirely loaded into memory at that point, and thus attempting to translate the guest virtual address to host address caused a page fault.

The workaround for this is to store the address and size of the data to be dumped and wait until the first translation block of interest is encountered to dump, then translate the guest address to host address one byte at a time. This is how rodata was done.

moralismercatus commented 10 years ago

Problem: klee_error("Trying to allocate an overlapping object")

Klee tries to allocate a memory object given the host address of the variable called with make_concolic; however, that memory region is already being used by another memory object.

In my case:

esp[8] 959500276 4, 3215836844:140286322632364 c_first 0 1, 3215836846:140286322632366

esp[8] allocates an mo from 140286322632364-140286322632368 c_first allocates an mo from 140286322632366-140286322632367

So, c_first overlaps with esp[8].

The decision to dump esp at each TB dump, I think, was done speculatively to provide stack information to Klee. A full understanding of that issue is warranted, but for the time being, skipping the esp whose address conflicts with make_concolic variables is a plausible workaround.

moralismercatus commented 10 years ago

The issue was resolved by filtering all esps whose addresses overlap with the make_concolic variables out of dump_mo_concrete.txt.

moralismercatus commented 10 years ago

The next issue is how to know the value's type in the test case. This was not a problem when only command-line input was supported because all command-line input is text. Now, when we have

char* str = "..."; uint32_t var1 = 5434...; int64_t var2 = -13323...; make_concolic(str, str_size); make_concolic(var1, sizeof(var1)); make_concolic(var2, sizeof(var2));

The test case value needs to be interpreted as a char* (text), uint64, or int64_t.

My response is that it shouldn't matter the type as long as the raw data and size is available.

The problem is that the raw data isn't available. All test case information is transmitted using ASCII text files. It seems this needs to change to binary in order to be generic.

moralismercatus commented 10 years ago

Problem: Klee only generates one test case 'C' 'C' then quits.

I've isolated the problem. To explain, let's start with the original method using argv as symbolic. For the test binary that has main() and main_impl(), if only main_impl is listed in dispatch.ini, the same problem occurrs. Klee generates a single test case.

The simple solution is to list both main and main_impl in dispatch.ini. Using the new method, the problem is that make_concolic() is also in main(), so the variables assigned to be made concolic don't get assigned until after state data has already been dumped (at the beginning of TB dumping).

A workaround for this is to dump concolic data as it is encountered (or at least getting the address then, the actual dump can happen during the ending "dump" phase." In fact, this may be the only tenable solution.

The bigger question is why do I need to list all functions between start and destination? This is likely due to the way the dump is done. Before any TB is dumped, the CPU and runtime state are dumped. From there, the sequence of instructions modify the state from the beginning of the program to the end, so information is missing if a calling function is not listed to be dumped (e.g., main() when target function is main_impl()).

The workaround did not fix the problem, though it was necessary.

I've isolated it to the inclusion of assignment statements. When using argv, if i do the following:

c_first = argv[1][0]; if(...some check on c_first to create branch)...

It works.

If I do this: c_first = argv[1][0]; c_first = '0'; ...

It doesn't work. It generates a test case 'C' for a single path and stops.

The problem, in fact, was the assignment (along with a target_ulong downcast of the host address).

Changing the code from:

char c_first = '0'; // initialize char c_second = '0'; // initialize

make_concolic(c_first, sizeof(c_first); make_concolic(c_second, sizeof(c_second); ...

To:

char c_first; char c_second;

make_concolic(c_first, sizeof(c_first); make_concolic(c_second, sizeof(c_second);

Allowed Klee to generate the required test cases. So, Klee must have seen the assignment and concluded that no branches could be taken based on this.

moralismercatus commented 10 years ago

It appears that even keeping

make_concolic(c_first, sizeof(c_first); make_concolic(c_second, sizeof(c_second);

Inside the dumped main() function does not inhibit Klee from doing its job.

Initially, I thought I'd have to move them outside to an undumped function, but I am cautiously pleased that it works. Maybe the LLVM Bitcode translator ignores these custom instruction directives.

moralismercatus commented 10 years ago

Attempts to get make_concolic() to work with arrays have been unsuccessful. Oddly, the behavior is similar to c_first being initialized above. A single path with a single test case is generated.

It's possible that the method being used to make memory objects symbolis not the best way. The objects are made symbolic, and created, before the start of execution via file read.

The way that Klee normally makes memory objects symbolic is to have a call inside the bitcode "klee_make_symbolic" that invokes a special handler inside Klee. This seems like the most consistent and least complicated way, as the user, then, shouldn't need to worry about keeping variables unitialized, for example.

The issue then, is how to translate make_concolic() calls from the guest OS to klee_make_symbolic() in the Bitcode.

moralismercatus commented 10 years ago

Arrays and integers are now supported.

The solution required two steps.

The first was to inject a call to "crete_make_symbolic" in the Bitcode at the appropriate location. This was to mimic Klee's native behavior. For an unknown reason, creating symbolic memory objects before Bitcode execution yielded incorrect results. Now, once a call in the bitcode is made, a special handler invokes Klee's native handler (for klee_make_symbolic). From here, integers worked.

The second step was to turn off stack protection when compiling the program (-fno-stack-protector). This was causing problems with arrays. Of course, this problem with stack protection will need to be addressed in the future.