mgijsberti / synoptic

Automatically exported from code.google.com/p/synoptic

Show mined invariants' support information #369

Closed GoogleCodeExporter closed 9 years ago

GoogleCodeExporter commented 9 years ago
The existing mining algorithms output and use only invariants that are satisfied by 
all traces in the input log. However, it is often helpful to know the "almost 
invariants", which are invariants that are falsified by a small subset of the 
input traces. It is also helpful to know how well supported a mined 
invariant is. For example, if a AFby b is mined from a log, it will be 
interpreted differently depending on (1) how many input traces/event types 
there are, (2) how many 'a' event instances there are, and other features of 
the input log.

The two simplest statistics that we can compute for the simple binary 
invariants we consider in Synoptic are support count and support percentage. 
Here is an example. Assume that you have a log with two traces: a,b,c,a and 
c,a,b,b,a,b.

Now consider the invariant c AFby a. This invariant is true of both traces. The 
support count for this invariant is the number of times that it could have been 
falsified. That is, it is the number of "c" instances (since once we see a "c" 
it is not clear if we will see an "a" instance later, or not). The support 
count for this invariant is 2.

The support percentage for this invariant is the number of non-falsified 
instances divided by the support count. So, that's 2/2, or 100%.

Now, consider a different invariant -- a AFby b. This invariant is not true of 
the input log because the first trace does not satisfy it (the final 'a' is not 
followed by a 'b'). The support count for this invariant is 4 (two 'a' 
instances in each trace). Three of those instances are followed by a 'b', so 
the support percentage is 3/4 = 75%.
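
To make the counting concrete, here is a minimal sketch of the AFby statistics on the two example traces. It is written in Python for brevity and is not Synoptic's code; the function name `afby_support` and the brute-force scan are assumptions for illustration only (ChainWalkingTOInvMiner would reuse its own counts instead).

```python
def afby_support(traces, a, b):
    """Return (support_count, non_falsified) for the invariant 'a AFby b'.

    Every instance of `a` is one chance to falsify the invariant; an
    instance is non-falsified if some `b` occurs later in the same trace.
    """
    support = 0
    non_falsified = 0
    for trace in traces:
        for i, event in enumerate(trace):
            if event == a:
                support += 1
                if b in trace[i + 1:]:
                    non_falsified += 1
    return support, non_falsified


# The two traces from the example: a,b,c,a and c,a,b,b,a,b
traces = [list("abca"), list("cabbab")]

for a, b in [("c", "a"), ("a", "b")]:
    support, ok = afby_support(traces, a, b)
    print(f"{a} AFby {b}: support count {support}, "
          f"support percentage {100 * ok / support:.0f}%")
```

On the example log this reproduces the numbers above: count 2 at 100% for c AFby a, and count 4 at 75% for a AFby b.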

The goal of this task is twofold. First, add an option to Synoptic that 
outputs the mined invariants along with their support counts. Second, add an 
option that takes a support percentage threshold (between 0 and 100) and 
outputs all invariants above the specified threshold. Implement these two 
options to work only with the ChainWalkingTOInvMiner invariant miner. This 
miner maintains counts of all of the event instances, and these statistics can 
be reused to compute support counts and percentages. 
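
A rough sketch of the percentage threshold, in Python rather than Synoptic's Java. The names `support_percentage` and `filter_by_percentage`, and the dictionary of (support count, non-falsified count) pairs, are hypothetical. Two points are assumptions rather than part of the request: the boundary is treated as inclusive (so a threshold of 100 keeps exactly the true invariants), and an invariant with zero support is given 0%.

```python
def support_percentage(support, non_falsified):
    """Percentage of potential falsifications that did not occur."""
    if support == 0:
        return 0.0  # assumption: no evidence means no support
    return 100.0 * non_falsified / support


def filter_by_percentage(stats, threshold):
    """Keep invariants whose support percentage is at least `threshold`.

    `stats` maps an invariant label to (support_count, non_falsified).
    """
    return {
        inv: counts
        for inv, counts in stats.items()
        if support_percentage(*counts) >= threshold
    }


# Counts taken from the example log in this issue.
stats = {"c AFby a": (2, 2), "a AFby b": (4, 3)}

print(filter_by_percentage(stats, 80))   # only the 100% invariant remains
print(filter_by_percentage(stats, 75))   # the 75% "almost invariant" is kept too
```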

Original issue reported on code.google.com by bestchai on 28 Mar 2014 at 1:10

GoogleCodeExporter commented 9 years ago
To expand on this further: for a AP b, the number of potential falsifications 
is the number of 'b' instances, while for a NFby b, the number of potential 
falsifications is the number of 'a' instances.
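
The same kind of count for AP and NFby could be sketched as follows (Python, illustrative only, not Synoptic's code). The support counts follow the rule stated in this comment; how a single instance counts as non-falsified is an assumption by analogy with AFby: for a AP b, a 'b' is fine if some 'a' occurs earlier in the trace, and for a NFby b, an 'a' is fine if no 'b' occurs later. The function names are hypothetical.

```python
def ap_support(traces, a, b):
    """(support_count, non_falsified) for 'a AP b': one chance per b instance."""
    support = non_falsified = 0
    for trace in traces:
        for i, event in enumerate(trace):
            if event == b:
                support += 1
                if a in trace[:i]:  # some a precedes this b
                    non_falsified += 1
    return support, non_falsified


def nfby_support(traces, a, b):
    """(support_count, non_falsified) for 'a NFby b': one chance per a instance."""
    support = non_falsified = 0
    for trace in traces:
        for i, event in enumerate(trace):
            if event == a:
                support += 1
                if b not in trace[i + 1:]:  # no b follows this a
                    non_falsified += 1
    return support, non_falsified


# The two traces from the original example: a,b,c,a and c,a,b,b,a,b
traces = [list("abca"), list("cabbab")]

print("a AP b:", ap_support(traces, "a", "b"))
print("c AP a:", ap_support(traces, "c", "a"))
print("b NFby c:", nfby_support(traces, "b", "c"))
```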

Original comment by bestchai on 28 Mar 2014 at 7:15

GoogleCodeExporter commented 9 years ago
One more thing to add to this is a new option that thresholds the support 
count. This option filters all invariants that have support values (not support 
percentage values!) greater than a certain threshold. This option should only 
change code in AbstractMain and the related options processing code.
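
As a sketch of what this could look like (Python, not the AbstractMain code): the function name `filter_by_support_count` is hypothetical, and the reading that the option keeps invariants whose support count is strictly greater than the threshold is an assumption about the intended behaviour.

```python
def filter_by_support_count(stats, threshold):
    """Keep invariants whose support count exceeds `threshold`.

    `stats` maps an invariant label to (support_count, non_falsified).
    The percentage is deliberately ignored here.
    """
    return {
        inv: (support, non_falsified)
        for inv, (support, non_falsified) in stats.items()
        if support > threshold
    }


# Counts taken from the example log in this issue.
stats = {"c AFby a": (2, 2), "a AFby b": (4, 3)}

print(filter_by_support_count(stats, 2))  # only the invariant with count 4
```

Note that this can drop a fully satisfied invariant such as c AFby a simply because it was exercised only twice, which is the point of thresholding on the count rather than the percentage.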

Original comment by bestchai on 31 Mar 2014 at 10:46

GoogleCodeExporter commented 9 years ago
Took a bit long to merge into mainline, but it's done! Merged into default with 
revision 8450defb17a5

Original comment by bestchai on 14 Jul 2014 at 6:23