Investigate lag when divide-by-zero occurs

moralismercatus commented 9 years ago

Take the following code:

int main() { unsigned buf[2];

crete_make_concolic(buf, buf, sizeof(buf)); __crete_make_symbolic();

printf(...entire buffer...);

if(5 / buf[0]) return 1;

return 0; }

When test case: 0 0 is run, there is a 30-60 second delay between when the new test case is printed in the printf and when the Floating Point Exception error is displayed. What that delay is, I'm not sure, but it halts everything else.

When divide-by-zero is not present, e.g,. if('h' == buf[0]) return 1;

All test cases are run speedily.

Note: the OS does a core dump when a divide-by-zero occurs, so this is going to add some extra time during concrete execution.

moralismercatus commented 9 years ago

Emphesis on the former point: without divide-by-zero present, speed is as expected (maybe 1 second per iteration).

I tested qsort with the cmp function, in one case, including a divide-by-zero and in another, just comparing by less-than.

Time taken using less-than: 1 second per iteration, 3x as many test cases as vanilla Klee. Time taken including divide-by-zero: ~20 minutes in total, 5x as many test cases as vanilla Klee.

I wonder if this is related to #46. If the special QEMU code is invoked, maybe the number of paths explodes; however, reviewing the code for helper_divl_EAX, I just don't see the path explosion there.

moralismercatus commented 9 years ago

I surrounded calls to helper_divl_EAX in the bitcode with helper functions to time how long it took to execute.

Result? 8 invocations of helper_divl_EAX taking anywhere from 0.0 to 0.04 seconds. This does not account for the ~45 seconds it took to complete the SE.

Additional info: Without call to helper_divl_EAX: KLEE: done: total instructions = 7881 KLEE: done: completed paths = 1 KLEE: done: generated tests = 1

With call to helper_divl_EAX: KLEE: done: total instructions = 18565 KLEE: done: completed paths = 11 KLEE: done: generated tests = 9

Not very useful, as the result provided by helper_divl_EAX is likely used, and absent it, the path exploration is terminated.

My next step is to print the time it takes to execute each and every function in the bitcode.

moralismercatus commented 9 years ago

Another anecdote is that vanilla Klee outputs: KLEE: done: total instructions = 908 KLEE: done: completed paths = 11 KLEE: done: generated tests = 7

real 0m4.470s user 0m4.232s sys 0m0.175s

Compared with CRETE: KLEE: done: total instructions = 18565 KLEE: done: completed paths = 11 KLEE: done: generated tests = 9

time approximately: 45.00s

Signficant number of instructions increase with CRETE (~20x). This alone may explain a 10x increase execution time. I suppose where this is magnified is that every generated test requires SE again taking another 45 seconds * 35 iterations.

This is a major problem. In summary, vanilla Klee completed coverage in ~4.5 seconds, and CRETE did the same in ~20 minutes.

Note that for both vanilla Klee and CRETE, there is a marked speed increase when a divide-by-zero is not present (or maybe divide operation in general). For example, vanilla Klee: With divide-by-zero: KLEE: ERROR: /home/intel/test/klee/sign/qsort.c:53: divide by zero KLEE: NOTE: now ignoring this error at this location

KLEE: done: total instructions = 908 KLEE: done: completed paths = 11 KLEE: done: generated tests = 7

real 0m4.453s user 0m4.233s sys 0m0.218s

Without divide-by-zero: KLEE: done: total instructions = 863 KLEE: done: completed paths = 6 KLEE: done: generated tests = 6

real 0m0.072s user 0m0.024s sys 0m0.030s

Meaning that execution isn't necessarily inherently slow, but when a division comes into play, it's a major problem.

moralismercatus commented 9 years ago

More info:

Using unsigned char instead of unsigned int, CRETE was able to complete all 18 generated tests in 1:50 minutes. This is about a 10x decrease in time over unsigned int.

Vanilla Klee maintained the same execution time: 4.5 seconds. 7 tests.

The time increase from 1 to 4 bytes should be insignificant, as the number of paths relies on the size of the buffer and the pivot (in qsort) - not the size of the data. Because more permutations can take place shouldn't increase the number of paths traversed.

I'm converging on an overall understanding (though not for the increase between division present and not present, and byte size increase for qsort), and some solutions. Two aspects need to be addressed:

Reduce the number of test cases generated
Reduce the time each test case takes to symbolically execute.

moralismercatus commented 9 years ago

This issue can be put to rest. To answer the main question: when a floating point exception is raised, a core dump occurs, which takes some time.

To the other question, I have a much better understanding of the workings and reasons for inefficencies. We have already resolved some of the inefficiencies via various methods discussed elsewhere.

moralismercatus / crete

Investigate lag when divide-by-zero occurs #44