tsantalis / RefactoringMiner

MIT License
358 stars 137 forks source link

Does not show all deleted files on a PR #715

Closed koppor closed 5 months ago

koppor commented 5 months ago

I wanted to work on https://github.com/JabRef/jabref/pull/8934 and to see if there is some easy fix for a bug introduced there. The GitHub diff (https://github.com/JabRef/jabref/pull/8934/files) doesn't help much.

However, RefactoringMiner does not show all files deleted. VM.java is shown deleted on GitHub, but not shown by RefactoringMiner at all (or do I overlook something)?

image

tsantalis commented 5 months ago

@koppor

There is an inner class in VM.java that has been extracted in its own class BstEntry.java The tool matched these two files. You can see the diff by clicking on the BstEntry.java diff.

To me is seems that many classes were extracted from VM.java to their own files, and our tool detects this information, but in the diff we just show one pair of files.

image

@pouryafard75 This seems to be related to issue https://github.com/pouryafard75/RM-ASTDiff/issues/99 We should find a better way to show when a single file is split to multiple files. Perhaps, include all pairs of diffs.

tsantalis commented 5 months ago

These are all the class refactorings reported by the tool

  1. Move Class org.jabref.logic.bst.VM.BstEntry moved to org.jabref.logic.bst.BstEntry
  2. Rename Class org.jabref.logic.bst.VMException renamed to org.jabref.logic.bst.BstVMException
  3. Move And Rename Class org.jabref.logic.bst.BibtexCaseChanger moved and renamed to org.jabref.logic.bst.util.BstCaseChanger
  4. Move And Rename Class org.jabref.logic.bst.BibtexWidth moved and renamed to org.jabref.logic.bst.util.BstWidthCalculator
  5. Move And Rename Class org.jabref.logic.bst.BibtexCaseChangersTest moved and renamed to org.jabref.logic.bst.util.BstCaseChangersTest
  6. Move And Rename Class org.jabref.logic.bst.BibtexNameFormatterTest moved and renamed to org.jabref.logic.bst.util.BstNameFormatterTest
  7. Move And Rename Class org.jabref.logic.bst.BibtexPurifyTest moved and renamed to org.jabref.logic.bst.util.BstPurifierTest
  8. Move And Rename Class org.jabref.logic.bst.TextPrefixFunctionTest moved and renamed to org.jabref.logic.bst.util.BstTextPrefixerTest
  9. Move And Rename Class org.jabref.logic.bst.BibtexWidthTest moved and renamed to org.jabref.logic.bst.util.BstWidthCalculatorTest
  10. Move And Rename Class org.jabref.logic.bst.BibtexCaseChanger.FORMAT_MODE moved and renamed to org.jabref.logic.bst.util.BstCaseChanger.FormatMode
  11. Move And Rename Class org.jabref.logic.bst.BibtexNameFormatter moved and renamed to org.jabref.logic.bst.util.BstNameFormatter
  12. Move And Rename Class org.jabref.logic.bst.BibtexPurify moved and renamed to org.jabref.logic.bst.util.BstPurifier
  13. Move And Rename Class org.jabref.logic.bst.BibtexTextPrefix moved and renamed to org.jabref.logic.bst.util.BstTextPrefixer

From this I can conclude that only the inner class BstEntry was extracted to its own file. And the rest of the code in VM class is deleted.

tsantalis commented 5 months ago

@koppor

With the last UI updates, it is more clear now where is VM.java

Screenshot from 2024-04-26 05-36-11

tsantalis commented 5 months ago

@koppor Based on the references inside other files BstPreviewLayout.java, it seems VM.java is actually now BstVM.java I will check why this rename is not reported by the tool.

tsantalis commented 5 months ago

@koppor Based on my analysis, VM.java is renamed to BstVM.java Some fields and two methods from VM.java are extracted to new class BstFuntions.java Inner class BstEntry in VM.java is extracted in its own file BstEntry.java

tsantalis commented 5 months ago

@koppor After the fix the tool finds the following related to VM.java

Extract Class org.jabref.logic.bst.BstFunctions from class org.jabref.logic.bst.VM Extract Class org.jabref.logic.bst.BstVMContext from class org.jabref.logic.bst.VM Move Class org.jabref.logic.bst.VM.BstEntry moved to org.jabref.logic.bst.BstEntry Rename Class org.jabref.logic.bst.VM renamed to org.jabref.logic.bst.BstVM

tsantalis commented 5 months ago

@koppor @calixtus

After the last update, there are 3 additional diffs you can inspect. The old ones: VM.javaBstVM.java TestVM.javaBstFunctionsTest.java

The new ones: VM.javaBstFunctions.java VM.javaBstEntry.java TestVM.javaBstVMTest.java

Clearly, VM.java has been split to 2 files (BstVM.java, BstFunctions.java) + extracted inner class BstEntry into its own file BstEntry.java. Similarly, TestVM.java has been split into two files (BstVMTest.java, BstFunctionsTest.java).

calixtus commented 5 months ago

Interesting. Yes, basically I splitted some of the classes, but... I also migrated from ANTLR3 to ANTLR4, which was in some parts a complete rewrite of the code.

tsantalis commented 4 months ago

@koppor @calixtus I noticed that in your tests, you replaced string literal concatenations with text blocks (Java 15 feature). So, I improved the tool to support better the matching of statements for this kind of migration.

Some of the interesting findings: testNumNames() and testNumNames2() from TestVM.java have been merged to a single test testNumNames() in BstFunctionsTest.java

Also many of the tests have been renamed, which makes it more challenging for our tool. Overall, this was a challenging PR from the diff point of view.

I think the improvements we made can help to understand better the changes. We also fixed a bug related to how moved attributes are visualized in the diff.

I just pushed a new Docker image, if you want to check the improvements.

koppor commented 4 months ago

Thank you for the continuous improvement here. We are trying to use RM now to check whether the ANTLR4 port of that PR can be fixed or if we need to switch back to ANTLR3 to keep our BibTeX BSTVM working.

koppor commented 4 months ago

I think, I have another issue where I don't know what to do. I thought that non-highlighted text could be found at the right side. However, I cannot find the text.

grafik

Maybe because the text is found at another file? Maybe, I could jump to that file using a double click?

tsantalis commented 4 months ago

@koppor @pouryafard75

Currently, we don't highlight inline comments, because they are not part of the AST.

But this comment is deleted. I am curious if you know where the lines L123-L622 in VM.java are currently located? Did the tool miss some kind of refactoring performed in the PR?

tsantalis commented 4 months ago

@koppor I did some further inspection, and the lines L123-L622 in VM.java have been extracted into separate methods in BstFunctions.java

The comments have become the javadoc of these methods.

Our tool can actually detect this kind of refactoring (Extract and Move Method), but it needs to find a call to the extracted method.

Where is the following method called in the PR? https://github.com/JabRef/jabref/pull/8934/files#diff-308b85aa1e1a5b691ca0ecf4ab3e79b957505b1e65f8cdbd63393b33e3302bd2R307-R331

tsantalis commented 4 months ago

@koppor The calls are done in method getBuiltInFuctions() of BstFunctions.java https://github.com/JabRef/jabref/pull/8934/files#diff-308b85aa1e1a5b691ca0ecf4ab3e79b957505b1e65f8cdbd63393b33e3302bd2R59-R101

and getBuiltInFuctions() is called in BstVM.java https://github.com/JabRef/jabref/pull/8934/files#diff-b43c59bea73ecbe7ede1b18eb5d999fd2a1f1eb93f4b87f07e9e461840843c06R80

This is a kind of indirect Extract and Move Method, which our tool does not currently support. I can try to figure out a way to support it. It is an interesting refactoring.

koppor commented 4 months ago

The solution was to fix two issues:

  1. instead of resolveIdentifier calling execute was called.
    public Integer visitBstFunction(BstParser.BstFunctionContext ctx) {
        String name = ctx.getChild(0).getText();
-        if (bstVMContext.functions().containsKey(name)) {
-            bstVMContext.functions().get(name).execute(this, ctx, selectedBstEntry);
-        } else {
-            visit(ctx.getChild(0));
-        }
-
+        LOGGER.trace("Resolving name {} at visitBstFunction", name);
+        resolveIdentifier(name, ctx);

https://github.com/JabRef/jabref/pull/11304/files#diff-b1a17d91238078657578bb1c265904eb00486eec25ed888a79bf42460ece897e

  1. something was wrong with substring$ implementation

I "wild guess" fixed it at.

-        if (startI < 0) {
-            startI += s.length() + 1;
-            startI = Math.max(1, (startI + 1) - lenI);
+        if (start < 0) {
+            int endOneBased = string.length() + start + 1;
+            start = Math.max(1, endOneBased - length + 1);
+            length = endOneBased - start + 1;
+        }

https://github.com/JabRef/jabref/pull/11304/files#diff-308b85aa1e1a5b691ca0ecf4ab3e79b957505b1e65f8cdbd63393b33e3302bd2

Full PR: https://github.com/JabRef/jabref/pull/11304


Spontaneous thought. If RefactoringMinor would filter out moves and semantically equal rewrites, I could investigate the real new code and think. Maybe, I would have found the issue at BstVMVisitor#visitBstFunction faster.

For a support of substring$, I have no clue. The function was taken 1:1 to the new branch with the instanceof pattern matching as only change. And I doubt that this caused troubles.

koppor commented 4 months ago

Currently, we don't highlight inline comments, because they are not part of the AST.

Oh, OK. I am somehow used to the OpenRewrite AST, which has explicit offers for comments (org.openrewrite.java.tree.J#getComments) (J.java)

But this comment is deleted.

It appears slightly modified (last line got an HTML tag) at org.jabref.logic.bst.BstFunctions.BstCallTypeFunction.

tsantalis commented 4 months ago

@koppor @calixtus @pouryafard75

We have done a huge progress on improving the diff quality for this PR. Our tool now can report code moves from deleted files to added files.

All following classes had one of their methods moved to BstFunctions.java ChangeCaseFunction.javaBstFunctions.java FormatNameFunction.javaBstFunctions.java PurifyFunction.javaBstFunctions.java TextPrefixFunction.javaBstFunctions.java WidthFunction.javaBstFunctions.java

We also use a different icon and arrow to differentiate between a file being moved/renamed itself, and code being moved between two files, as shown in the screenshot below.

Screenshot from 2024-05-29 20-54-33