lfd / PaStA

The Patch Stack Analysis
GNU General Public License v2.0

Add support for differential analyses #77

Open rsarky opened 4 years ago

rsarky commented 4 years ago

There is scope for optimising the analysis process for differential analyses, i.e. the case where PaStA already has results from an existing analysis and some new patches arrive to be analysed. Currently, PaStA assigns each of these new patches to a single-element cluster and then runs the complete analysis again. This results in a lot of redundant comparisons. Example:

Consider the following existing cluster state of PaStA. I have indexed each cluster for illustration purposes:

1. 1 2 3
2. 4 5 7
3. 6 8

To arrive at these clusters, PaStA performed around 8x8 comparisons (ignoring, for now, the other thresholds that PaStA has). For further comparisons PaStA will use the representative of each cluster. Let's take the first element of each cluster above to be its representative, i.e. repr(1 2 3) = 1.

Now consider that patches 9 and 10 arrive. They will be assigned to their own single element clusters, ie:

1. 1 2 3
2. 4 5 7
3. 6 8
4. 9
5. 10

In the current situation PaStA performs 5x5 comparisons (comparing the representative of each cluster against all the others). We can reduce this by comparing only the representatives of the existing clusters with the newly arrived patches, since the other comparisons have already been done in the previous step, i.e. we reduce the comparisons to 3x2. Additionally, we need to compare the new patches against each other, which takes a further 2x2 comparisons. Combined, this is a total of 5x2 comparisons, which is still far fewer than the naive approach needs.
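The counting above can be illustrated with a minimal Python sketch. It is not PaStA code: `differential_pairs` is a hypothetical helper, clusters are modelled as plain lists, and the first element of each list is assumed to be the representative, as in the example. Self-pairs such as (9, 9) are kept so that the count matches the 2x2 figure used above.

```python
from itertools import product


def differential_pairs(clusters, new_patches):
    """Hypothetical helper: list the comparisons needed when new patches arrive.

    Assumes the first element of each existing cluster is its representative.
    """
    representatives = [cluster[0] for cluster in clusters]
    # Each new patch is compared against every existing representative
    # and against every new patch (self-pairs included, as in the 2x2 count).
    candidates = representatives + list(new_patches)
    return list(product(new_patches, candidates))


existing = [[1, 2, 3], [4, 5, 7], [6, 8]]
new = [9, 10]

pairs = differential_pairs(existing, new)
naive = (len(existing) + len(new)) ** 2

print(len(pairs), "differential comparisons instead of", naive)
```

Pairs between existing representatives, such as (1, 4), never appear in the result, which is where the saving comes from.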

This can be written in a crude mathematical way as follows:

evaluation_result = new_patches X existing_patches + new_patches X new_patches

[Note that existing_patches X existing_patches has already been done and its result exists in the patch groups file.]

Thus, evaluation_result = new_patches X (existing_patches + new_patches)
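As a rough check of the saving, the same relation can be written in terms of counts. The symbols are introduced here for illustration only: $E$ is the number of existing cluster representatives and $N$ is the number of new patches. Thresholds and any symmetry of the comparison are ignored, as in the example above.

```latex
\begin{align*}
  C_{\text{naive}} &= (E + N)^2 \\
  C_{\text{diff}}  &= N \cdot E + N \cdot N = N\,(E + N) \\
  \frac{C_{\text{diff}}}{C_{\text{naive}}} &= \frac{N}{E + N}
\end{align*}
```

For the example, $E = 3$ and $N = 2$ give $C_{\text{diff}} = 10$ against $C_{\text{naive}} = 25$, so the saving grows as the number of new patches becomes small relative to the number of existing clusters.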

Things to consider