Relate methods that changed between versions

khatchad commented 6 years ago

Once we have the methods that changed in the commits, we need to increment the DOI of those methods in the current version.
However, those methods may have been renamed via refactorings, so there may be no method in the current project to relate them to.
@khatchad mentioned that we can use the approach from a paper at ICSE'18.

khatchad commented 6 years ago

@saledouble Could you confirm that we are identifying methods that changed in the new version (i.e., revision B)?

khatchad commented 6 years ago

In other words, we're not looking at the method in revision A.

khatchad commented 6 years ago

For example:

Revision A:

```java
class C {
  void m() {
    int a = 5;
  }
}
```

Revision B:

```java
class C {
  void n() {
    int a = 6;
  }
}
```

Please confirm that the changed method that is detected is C.n() and not C.m().

khatchad commented 6 years ago

What's interesting here is that you are analyzing the history but manipulating the current DOI. In other words, if you detect a method change in some version, that method may no longer be present.

khatchad commented 6 years ago

The DOI is only for elements in the current project.

khatchad commented 6 years ago

So, maybe we actually need two passes: one to construct a method rename refactoring graph, and another to bump the DOI values.

For example, if you detect that method n() has changed and thus needs a DOI bump, but n() no longer exists in the current workspace, we can look up the current name of n() using the graph constructed in the first pass.

In this graph, the nodes will be methods, i.e., every method in the history of the project, and the arcs will be rename refactorings. For example, if m() was renamed to n(), then there is an arc from the node representing m() to the node representing n().

After the first pass, we will have this (rename) graph. Then, before the second pass, we should construct a hash table that tells us the current name of every method.

We can create the table in a single traversal of the graph, in O(n + a) time, where n is the number of nodes and a is the number of arcs. Suppose m() winds up in the end as method z():

| Method | Name in current project |
| --- | --- |
| `m()` | `z()` |
| `n()` | `z()` |

Even though m() was changed to n() in the intermediate history, we see that it actually maps to z(), which is what it became finally.
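
To make the two passes concrete, here is a minimal sketch. It assumes that methods are identified by plain strings, that each method is renamed at most once (one outgoing arc per node), and that a `Map<String, Double>` stands in for the real DOI model. The class and method names (`RenameGraphSketch`, `addRename`, `currentNames`, `bumpDoi`) are hypothetical and not taken from the plugin.

```java
import java.util.ArrayDeque;
import java.util.Deque;
import java.util.HashMap;
import java.util.HashSet;
import java.util.Map;
import java.util.Set;

/** Hypothetical sketch of the rename graph and the current-name table. */
class RenameGraphSketch {
    // One arc per rename found in the first pass: old method -> new method.
    private final Map<String, String> renamedTo = new HashMap<>();

    void addRename(String oldMethod, String newMethod) {
        renamedTo.put(oldMethod, newMethod);
    }

    /** Builds the table mapping each renamed method to its final name. */
    Map<String, String> currentNames() {
        Map<String, String> table = new HashMap<>();
        for (String start : renamedTo.keySet()) {
            Deque<String> path = new ArrayDeque<>();
            Set<String> onPath = new HashSet<>(); // guards against rename cycles
            String cur = start;
            // Follow arcs until we hit an end node or an already resolved node.
            while (renamedTo.containsKey(cur) && !table.containsKey(cur) && onPath.add(cur)) {
                path.push(cur);
                cur = renamedTo.get(cur);
            }
            String finalName = table.getOrDefault(cur, cur);
            // Every node on the path resolves to the same final name, so each
            // node is walked only once overall: O(n + a) in total.
            for (String method : path) {
                table.put(method, finalName);
            }
        }
        return table;
    }

    /** Second pass: bump the DOI of the current counterpart of a changed method. */
    static void bumpDoi(Map<String, Double> doi, Map<String, String> table, String changedMethod) {
        String current = table.getOrDefault(changedMethod, changedMethod);
        // The DOI only covers elements of the current project; skip deleted methods.
        if (doi.containsKey(current)) {
            doi.merge(current, 1.0, Double::sum);
        }
    }

    public static void main(String[] args) {
        RenameGraphSketch graph = new RenameGraphSketch();
        graph.addRename("m()", "n()");
        graph.addRename("n()", "z()");
        Map<String, String> table = graph.currentNames();

        Map<String, Double> doi = new HashMap<>();
        doi.put("z()", 0.0);
        bumpDoi(doi, table, "n()"); // n() no longer exists, so z() gets the bump
        System.out.println(table + " " + doi);
    }
}
```

Methods that were never renamed do not appear in the table, so lookups fall back to the method's own name.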

yiming-tang-cs commented 6 years ago

Currently, the graph has already been built. Each vertex is composed of the method signature and the file path, which can uniquely locate the method. Each edge could represent method renaming or file renaming.

Here is example output from evaluating 20 commits (limited to 20 to save time):

v (m: createConvertToParallelStreamRefactoringProcessor(IJavaProject, IMethod[], Optional<IProgressMonitor>))-> v (m: createMigrateSkeletalImplementationToInterfaceRefactoringProcessor(IJavaProject, IMethod[], Optional<IProgressMonitor>))
v (m: Stream())-> v (m: toString())
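
For reference, the vertex and edge description above could be modeled roughly as follows. This is only a sketch: the type names (`GraphTypesSketch`, `Vertex`, `Edge`, `EdgeKind`) and the printed format are assumptions, not the classes used in the plugin. The point is that a vertex compares by both method signature and file path, so it can serve as a hash-map key.

```java
import java.util.Objects;

/** Hypothetical model of the rename graph's vertices and edges. */
class GraphTypesSketch {
    /** An edge records either a method rename or a file rename. */
    enum EdgeKind { METHOD_RENAME, FILE_RENAME }

    /** A method is located by its signature together with its file path. */
    static final class Vertex {
        final String methodSignature;
        final String filePath;

        Vertex(String methodSignature, String filePath) {
            this.methodSignature = methodSignature;
            this.filePath = filePath;
        }

        @Override
        public boolean equals(Object other) {
            if (this == other) return true;
            if (!(other instanceof Vertex)) return false;
            Vertex that = (Vertex) other;
            return methodSignature.equals(that.methodSignature)
                    && filePath.equals(that.filePath);
        }

        @Override
        public int hashCode() {
            return Objects.hash(methodSignature, filePath);
        }

        @Override
        public String toString() {
            return "v (m: " + methodSignature + ", f: " + filePath + ")";
        }
    }

    /** A directed edge from the old vertex to the new one. */
    static final class Edge {
        final Vertex from;
        final Vertex to;
        final EdgeKind kind;

        Edge(Vertex from, Vertex to, EdgeKind kind) {
            this.from = from;
            this.to = to;
            this.kind = kind;
        }
    }
}
```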

khatchad commented 5 years ago

Hi @orenwf. This is the issue I discussed yesterday. There used to be a method called computeSimilarity() that, as we discussed, needs to be changed to use existing approaches instead. But I don't see that method anymore. @saledouble please help @orenwf locate the method that needs to be updated.

yiming-tang-cs commented 5 years ago

Currently, we only compare the body lengths of the two methods. https://github.com/ponder-lab/Logging-Level-Evolution-Plugin/blob/dc48e0bc94b24b5a13c50fe944a5ce6c718788ce/edu.cuny.hunter.mylyngit.core/src/edu/cuny/hunter/mylyngit/core/analysis/GitHistoryAnalyzer.java#L725-L736

This is the caller. It always picks the most similar method from the candidate methods. For example, given a method in revision A, I collect the methods in revision B that have the same number and types of parameters, and then I pick the most similar one among those candidates. https://github.com/ponder-lab/Logging-Level-Evolution-Plugin/blob/dc48e0bc94b24b5a13c50fe944a5ce6c718788ce/edu.cuny.hunter.mylyngit.core/src/edu/cuny/hunter/mylyngit/core/analysis/GitHistoryAnalyzer.java#L712
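
Roughly, the matching described above works like the following sketch. It is not the linked code: `MethodInfo`, `lengthSimilarity`, and `mostSimilar` are hypothetical names, and scoring by the ratio of the shorter body length to the longer one is an assumption about how "compare the body lengths" could be turned into a number.

```java
import java.util.List;
import java.util.Optional;

/** Hypothetical sketch of candidate filtering and length-based matching. */
class MethodMatcherSketch {
    /** Stand-in for a method in one revision. */
    static final class MethodInfo {
        final String name;
        final List<String> parameterTypes;
        final int bodyLength;

        MethodInfo(String name, List<String> parameterTypes, int bodyLength) {
            this.name = name;
            this.parameterTypes = parameterTypes;
            this.bodyLength = bodyLength;
        }
    }

    /** Assumed score in [0, 1]: shorter body length divided by the longer one. */
    static double lengthSimilarity(MethodInfo a, MethodInfo b) {
        int longer = Math.max(a.bodyLength, b.bodyLength);
        if (longer == 0) {
            return 1.0; // two empty bodies are treated as identical
        }
        return (double) Math.min(a.bodyLength, b.bodyLength) / longer;
    }

    /** Picks the most similar revision-B method with the same parameter list. */
    static Optional<MethodInfo> mostSimilar(MethodInfo source, List<MethodInfo> revisionB) {
        MethodInfo best = null;
        double bestScore = -1.0;
        for (MethodInfo candidate : revisionB) {
            // Candidates must have the same number and types of parameters.
            if (!candidate.parameterTypes.equals(source.parameterTypes)) {
                continue;
            }
            double score = lengthSimilarity(source, candidate);
            if (score > bestScore) {
                best = candidate;
                bestScore = score;
            }
        }
        return Optional.ofNullable(best);
    }
}
```

Because the score ignores names and body contents, any two methods with matching parameter lists and similar lengths can be paired, which is one reason to replace it with an existing approach.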

yiming-tang-cs commented 5 years ago

The rename graph: https://github.com/ponder-lab/Logging-Level-Evolution-Plugin/blob/dc48e0bc94b24b5a13c50fe944a5ce6c718788ce/edu.cuny.hunter.mylyngit.core/src/edu/cuny/hunter/mylyngit/core/analysis/GitHistoryAnalyzer.java#L80

The hash table: https://github.com/ponder-lab/Logging-Level-Evolution-Plugin/blob/dc48e0bc94b24b5a13c50fe944a5ce6c718788ce/edu.cuny.hunter.mylyngit.core/src/edu/cuny/hunter/mylyngit/core/analysis/GitHistoryAnalyzer.java#L145

yiming-tang-cs commented 5 years ago

computeSimilarity should be improved.
We should also add a check for composite changes, i.e., renaming a method and changing its parameters together.

khatchad commented 5 years ago

Moved this into the future milestone. The idea is to make some progress without this and add it at some point afterwards.

ponder-lab / Rejuvenate-Logging-Levels

Relate methods that changed between versions #98