tsantalis / RefactoringMiner

MIT License
370 stars 138 forks source link

Classes before refactoring with the name of the class after the refactoring #90

Closed mauricioaniche closed 4 years ago

mauricioaniche commented 4 years ago

Sometimes the name of the class (as well as the file name) before the refactoring is actually the name of the class after the refactoring. From what I could identify, this happens when a Move happens together with some other specific refactorings.

As an example, see this commit, class DataManagerCustomer.java. This class has been moved to a new package (to a .api package) and had also been through an extract interface.

When RMiner identifies the Move Package, the before and after are correct . However, when RMiner identifies the Extract Interface, it gives me, as classes before the refactoring, the ones with the .api package.

I thought this was a common behaviour: if a move happens, then all the other refactorings in the same file will have the new path as "before". But that doesn't seem to happen. I tried, Rename Variable and Extract Method together with Move Package, but in these cases, the class name and file points to the "old name and file" (which is the behaviour I was expecting!).

Is it possible for the before refactoring method to always return to the class name and file before the refactoring, i.e., how it was in the commit before? If not, is there a rule that I can try to follow in my implementation?

tsantalis commented 4 years ago

Hi @mauricioaniche Thanks for reporting all these issues. Here we are dealing with the problem of the same program element (e.g., class) being the subject of sequential refactorings in the same commit.

In this case, DataManagerCustomer participates in two refactorings:

Move Class org.apache.fineract.data.datamanager.DataManagerCustomer moved to org.apache.fineract.data.datamanager.api.DataManagerCustomer Extract Interface org.apache.fineract.data.datamanager.contracts.ManagerCustomer from class org.apache.fineract.data.datamanager.api.DataManagerCustomer

The truth of the matter is that we don't know what is the actual order the developer applied these refactorings. The tool internally detects the refactoring types in a particular order (the logic behind the order is explained in the paper) For this particular case, Move Class is detected before Extract Interface. As a result, the DataManagerCustomer has a new qualified name (or ID if you like) after being moved, which affects the way the Extract Interface refactoring is reported.

We actually use this information to construct sequences of refactorings within a commit with the goal of tracking the sequences of refactorings on the same program element across multiple commits. The ID of the program element changes as refactorings are applied on it, but this ID is important to construct the sequences of refactorings.

Maybe we should add the code for detecting sequences in RefactoringMiner to help with the problem you are facing. I will discuss this with my student to see if the developed algorithm is in a mature state.

mauricioaniche commented 4 years ago

Thanks for the very complete answer! I definitely imagined you had a reason for it!

For me, I could do a workaround by tracking all the possible refactorings that lead to a possible change in the id (using your nomenclature here), so that I could point to the file name and class name in the previous commit. It seems to work (or, at least, my tests are passing).

Not sure if it makes sense for you to implement a function in your side that gives the "old names" back. Maybe I'm the only one who's trying to automatically capture the class before the refactoring... Let's see if someone else asks for this feature :)