Performance improvement for StopGap

This change improves performance: with the following mechanisms, Fluidanimate now runs with around a 2X slowdown.

(1) Instead of using a hash table to look up the recording data structure (struct eventlist) for each synchronization variable (pthread_mutex_t or pthread_barrier_t), we create a new pthread_mutex_t together with its eventlist in a newly allocated array, like this: { pthread_mutex_t realMutex; struct eventlist eventlist; } We then store a pointer to this new data structure in the first word of the original mutex, so no hash table lookup is needed to find the corresponding event list. We still maintain a hash table of all these event lists, but we only consult it at the end of an epoch, so that we can preset the iteration point for each synchronization variable.
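The indirection described in (1) can be sketched roughly as follows. This is only an illustration under stated assumptions, not the code in this pull request: the names shadow_mutex, shadow_pool, shadow_mutex_init, shadow_of, shadow_mutex_lock and shadow_mutex_unlock are hypothetical, the single count field stands in for the real struct eventlist, and the sketch assumes a pthread_mutex_t is at least as large as a pointer.

```c
#include <pthread.h>
#include <stddef.h>
#include <string.h>

/* Stand-in for the project's real event list (hypothetical field). */
struct eventlist {
    int count;
};

/* The real mutex and its event list live side by side, as in the description. */
struct shadow_mutex {
    pthread_mutex_t realMutex;
    struct eventlist eventlist;
};

/* Assumption: the user's mutex has room for one pointer in its first word. */
_Static_assert(sizeof(pthread_mutex_t) >= sizeof(void *),
               "pthread_mutex_t too small to hold a pointer");

#define MAX_SHADOW 1024
static struct shadow_mutex shadow_pool[MAX_SHADOW]; /* pre-allocated array */
static size_t shadow_used = 0; /* not thread-safe; kept simple for the sketch */

/* Interposed init: take an array slot and store its address in the
 * first word of the application's mutex. */
int shadow_mutex_init(pthread_mutex_t *user)
{
    if (shadow_used >= MAX_SHADOW)
        return -1;
    struct shadow_mutex *s = &shadow_pool[shadow_used++];
    pthread_mutex_init(&s->realMutex, NULL);
    s->eventlist.count = 0;
    memcpy(user, &s, sizeof s); /* first word now holds the pointer */
    return 0;
}

/* Recover the shadow object with one load; no hash table lookup. */
static struct shadow_mutex *shadow_of(pthread_mutex_t *user)
{
    struct shadow_mutex *s;
    memcpy(&s, user, sizeof s);
    return s;
}

int shadow_mutex_lock(pthread_mutex_t *user)
{
    struct shadow_mutex *s = shadow_of(user);
    int rc = pthread_mutex_lock(&s->realMutex);
    if (rc == 0)
        s->eventlist.count++; /* record the acquisition while holding the lock */
    return rc;
}

int shadow_mutex_unlock(pthread_mutex_t *user)
{
    return pthread_mutex_unlock(&shadow_of(user)->realMutex);
}
```

The point of the design is that every lock operation reaches its event list through a single pointer read, while a slower global index is only needed when all lists must be visited together.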

(2) To further improve performance, we no longer malloc a "struct syncEvent" dynamically. Instead, we use a pre-allocated pool of "struct syncEvent" objects, so obtaining a new syncEvent only requires incrementing a counter. The corresponding code can be seen in synceventpool.h.
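A counter-based pool of this kind might look like the sketch below. It is not the contents of synceventpool.h: the fields of struct syncEvent, the capacity, and the names sync_event_pool, sync_event_next and alloc_sync_event are all hypothetical, and the use of a C11 atomic counter is an assumption made so the sketch is safe with multiple threads.

```c
#include <stdatomic.h>
#include <stddef.h>

/* Stand-in for the project's real event record (hypothetical fields). */
struct syncEvent {
    int thread;
    int ret;
};

#define POOL_CAPACITY 4096 /* hypothetical size */

/* All events are allocated up front, in one static array. */
static struct syncEvent sync_event_pool[POOL_CAPACITY];
static atomic_size_t sync_event_next = 0;

/* Hand out the next unused slot by bumping the counter.
 * Returns NULL once the pool is exhausted. */
struct syncEvent *alloc_sync_event(void)
{
    size_t idx = atomic_fetch_add(&sync_event_next, 1);
    if (idx >= POOL_CAPACITY)
        return NULL;
    return &sync_event_pool[idx];
}
```

Compared with malloc, each allocation is a single fetch-and-add with no allocator lock or free-list traversal on the recording path.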

In the future, we should consider using this mechanism for system recording as well.
