tsantalis / RefactoringMiner

MIT License
Creation of a new feature to allow collecting the refactorings implemented at the time of the merge #753

Open aoliveira100 opened 1 week ago

aoliveira100 commented 1 week ago

Currently, when I pass a merge commit to get its refactorings, RefactoringMiner returns duplicates of the refactorings that were already implemented in the branch commits, rather than those actually implemented during the merge operation itself.

I propose the creation of a new feature to allow the collection of refactorings implemented at the time of the merge. Although it is uncommon to implement refactorings at this time, we know that this is possible.

It could be an API detectAtMergeCommit(mergeCommit) that collects refactorings exclusively implemented in the merge. (PS: I'm not referring to the accumulated refactorings of the branches in this merge, but the specific refactorings implemented only in the merge).
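One possible shape for such an API is sketched below. It is only an illustration under stated assumptions, not RefactoringMiner's API: the class name MergeOnlyRefactorings is hypothetical, and the inputs are assumed to be the refactoring descriptions already detected by comparing the merge commit against each of its two parents. The idea is that a refactoring inherited from one branch shows up when the merge is compared against the other parent only, whereas a change made solely in the merge differs from both parents, so the intersection of the two result sets is a rough approximation. It relies on the same refactoring producing an identical description in both comparisons, which may not always hold.

```java
import java.util.LinkedHashSet;
import java.util.Set;

// Hypothetical helper, not part of RefactoringMiner.
final class MergeOnlyRefactorings {

    /**
     * Keeps only the refactoring descriptions reported against BOTH parents.
     *
     * @param againstParent1 descriptions detected for (parent 1 -> merge commit)
     * @param againstParent2 descriptions detected for (parent 2 -> merge commit)
     * @return a new set; the inputs are left unmodified
     */
    static Set<String> detectAtMergeCommit(Set<String> againstParent1,
                                           Set<String> againstParent2) {
        // Copy first so the caller's set is not changed by retainAll.
        Set<String> mergeOnly = new LinkedHashSet<>(againstParent1);
        mergeOnly.retainAll(againstParent2);
        return mergeOnly;
    }
}
```

Working on plain description strings keeps the sketch independent of any particular detector; a real implementation would more likely compare structured refactoring objects.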

In my research work, I have discarded merge commits from my samples precisely because RefactoringMiner returns duplicate refactorings in this case. I have treated this limitation as a threat to external validity, and justified it by arguing that DevOps engineers/developers relatively rarely implement refactorings at the exact moment of executing the merge.

Thanks in advance.

tsantalis commented 1 week ago

@aoliveira100 Thank you for your interest in our project and for your feature request.

It is not clear to me how we can get the specific refactorings implemented only in the merge. RefactoringMiner can obtain refactorings given a pair of child and parent commits.

But I don't think any pair of a merge commit and one of its parent commits gives what you are looking for.

Do you have any idea how we could obtain those refactorings implemented only in the merge?

aoliveira100 commented 1 week ago

@tsantalis

Thank you for your support.

Consider the merge commit with sha1 573ba90. When we run the command git diff 573ba90 573ba90^1 573ba90^2 (where ^1 is parent 1 of the merge commit and ^2 is parent 2), the ++/-- markers in the output show which lines of code were added (++) and removed (--) in the merge commit itself, relative to both parents. Some of the lines marked ++ (added in the merge) could be code refactorings. We know this is the exception rather than the rule, but it can happen.

Example:

++ if ($field_name == $selected) {
++
++ echo "field_type: {$field['type']}\n";
++ echo "field_name: {$field_name}\n";
++
++ foreach ( $node->$field_name as $language => $value ) {

https://git-scm.com/docs/git-diff
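As a rough illustration of the idea above, the following Java sketch pulls the ++ lines out of the text of a two-parent combined diff. It assumes the diff output has already been captured as a string (for example from the git diff command shown earlier); the class and method names are hypothetical, and it only isolates candidate lines. Deciding whether those lines amount to a refactoring is a separate problem that this sketch does not address.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper for illustration; not part of RefactoringMiner or git.
final class CombinedDiffAdditions {

    /**
     * Returns the content of lines marked "++" inside hunks of a two-parent
     * combined diff, i.e. lines present in the merge result but in neither parent.
     */
    static List<String> addedInMerge(String combinedDiff) {
        List<String> added = new ArrayList<>();
        boolean inHunk = false;
        for (String line : combinedDiff.split("\n")) {
            if (line.startsWith("diff --cc ") || line.startsWith("diff --combined ")) {
                // New file section: header lines such as "+++ b/File" follow,
                // so stop treating "++" as an added-line marker until a hunk starts.
                inHunk = false;
            } else if (line.startsWith("@@@ ")) {
                // A two-parent combined hunk header uses three '@' characters.
                inHunk = true;
            } else if (inHunk && line.startsWith("++")) {
                // Drop the two marker columns (one per parent), keep the content.
                added.add(line.substring(2));
            }
        }
        return added;
    }
}
```

Lines marked with a single + in only one of the two columns differ from just one parent, so the sketch skips them as changes that came from a branch.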

tsantalis commented 1 week ago

@aoliveira100 Do you have a merge commit in Java to experiment with? Even if it is an artificial one that you created, that is totally fine.

aoliveira100 commented 1 week ago

@tsantalis Of course. I will prepare it next week. I'm traveling for work today and returning July 1st. Thank you.