GNU General Public License v3.0
Code Tracker

This project introduces CodeTracker, a refactoring-aware tool that generates the commit change history of method and variable declarations in a Java project with very high accuracy.

How to cite CodeTracker

If you are using CodeTracker in your research, please cite the following papers:

Mehran Jodavi and Nikolaos Tsantalis, "Accurate Method and Variable Tracking in Commit History," pp. 183-195, 30th ACM Joint European Software Engineering Conference and Symposium on the Foundations of Software Engineering (ESEC/FSE'2022), Singapore, Singapore, November 14–18, 2022.

@inproceedings{10.1145/3540250.3549079,
   author = {Jodavi, Mehran and Tsantalis, Nikolaos},
   title = {Accurate Method and Variable Tracking in Commit History},
   year = {2022},
   isbn = {9781450394130},
   publisher = {Association for Computing Machinery},
   address = {New York, NY, USA},
   url = {https://doi.org/10.1145/3540250.3549079},
   doi = {10.1145/3540250.3549079},
   booktitle = {Proceedings of the 30th ACM Joint European Software Engineering Conference and Symposium on the Foundations of Software Engineering},
   pages = {183–195},
   numpages = {13},
   keywords = {commit change history, refactoring-aware source code tracking},
   location = {Singapore, Singapore},
   series = {ESEC/FSE 2022}
}

Mohammed Tayeeb Hasan, Nikolaos Tsantalis, and Pouria Alikhanifard, "Refactoring-aware Block Tracking in Commit History," IEEE Transactions on Software Engineering, 2024.

@article{Hasan:TSE:2024:CodeTracker2.0,
   author = {Hasan, Mohammed Tayeeb and Tsantalis, Nikolaos and Alikhanifard, Pouria},
   journal = {IEEE Transactions on Software Engineering},
   title = {Refactoring-aware Block Tracking in Commit History},
   year = {2024},
   pages = {1-20},
   doi = {10.1109/TSE.2024.3484586}
}

Requirements

Java 11.0.15 or newer

Apache Maven 3.6.3 or newer

How to Build and Run

Command line

  1. Clone repository

git clone https://github.com/jodavimehran/code-tracker.git

  2. Change into the locally cloned repository folder

cd code-tracker

  3. Build code-tracker

mvn install

  4. Run the API usage examples shown in README

mvn compile exec:java -Dexec.mainClass="org.codetracker.Main"

Note: by default, the repository https://github.com/checkstyle/checkstyle.git will be cloned into the folder "code-tracker/tmp". To change the folder where the repository is cloned, edit the field FOLDER_TO_CLONE in class org.codetracker.Main and execute mvn install again.

  5. Run the method tracking experiment (takes around 20 minutes for 200 tracked methods)

mvn compile exec:java -Dexec.mainClass="org.codetracker.experiment.MethodExperimentStarter"

  6. Run the variable tracking experiment (takes around 2 hours for 1345 tracked variables)

mvn compile exec:java -Dexec.mainClass="org.codetracker.experiment.VariableExperimentStarter"

  7. Run the block tracking experiment (takes around 2 hours for 1280 tracked blocks)

mvn compile exec:java -Dexec.mainClass="org.codetracker.experiment.BlockExperimentStarter"

Note: by default, the analyzed repositories will be cloned into the folder "code-tracker/tmp". To change the folder where the repositories are cloned, edit the field FOLDER_TO_CLONE in class org.codetracker.experiment.AbstractExperimentStarter and execute mvn install again.

Eclipse IDE

  1. Clone repository

git clone https://github.com/jodavimehran/code-tracker.git

  2. Import project

Go to File -> Import... -> Maven -> Existing Maven Projects

Browse to the root directory of project code-tracker

Click Finish

The project will be built automatically.

  3. Run the API usage examples shown in README

From the Package Explorer navigate to org.codetracker.Main

Right-click on the file and select Run as -> Java Application

  4. Run the method tracking experiment (takes around 20 minutes for 200 tracked methods)

From the Package Explorer navigate to org.codetracker.experiment.MethodExperimentStarter

Right-click on the file and select Run as -> Java Application

  5. Run the variable tracking experiment (takes around 2 hours for 1345 tracked variables)

From the Package Explorer navigate to org.codetracker.experiment.VariableExperimentStarter

Right-click on the file and select Run as -> Java Application

  6. Run the block tracking experiment (takes around 2 hours for 1280 tracked blocks)

From the Package Explorer navigate to org.codetracker.experiment.BlockExperimentStarter

Right-click on the file and select Run as -> Java Application

IntelliJ IDEA

  1. Clone repository

git clone https://github.com/jodavimehran/code-tracker.git

  2. Import project

Go to File -> Open...

Browse to the root directory of project code-tracker

Click OK

The project will be built automatically.

  3. Run the API usage examples shown in README

From the Project tab navigate to org.codetracker.Main

Right-click on the file and select Run Main.main()

  4. Run the method tracking experiment (takes around 20 minutes for 200 tracked methods)

From the Project tab navigate to org.codetracker.experiment.MethodExperimentStarter

Right-click on the file and select Run MethodExperimentStarter.main()

  5. Run the variable tracking experiment (takes around 2 hours for 1345 tracked variables)

From the Project tab navigate to org.codetracker.experiment.VariableExperimentStarter

Right-click on the file and select Run VariableExperimentStarter.main()

  6. Run the block tracking experiment (takes around 2 hours for 1280 tracked blocks)

From the Project tab navigate to org.codetracker.experiment.BlockExperimentStarter

Right-click on the file and select Run BlockExperimentStarter.main()

How to add as a Maven dependency

Maven Central

Since version 1.0, CodeTracker is available in the Maven Central Repository. To use CodeTracker as a Maven dependency in your project, add the following snippet to your project's build configuration file:

pom.xml

<dependency>
  <groupId>io.github.jodavimehran</groupId>
  <artifactId>code-tracker</artifactId>
  <version>2.6</version>
</dependency>

build.gradle

implementation 'io.github.jodavimehran:code-tracker:2.6'

How to Track Blocks

CodeTracker can track the history of code blocks in git repositories.

In the code snippet below we demonstrate how to print all changes performed in the history of the enhanced for statement for (final AuditListener listener : listeners).

The value passed to .codeElementType() selects the kind of block to track; the snippet below uses CodeElementType.ENHANCED_FOR_STATEMENT.

    GitService gitService = new GitServiceImpl();
    try (Repository repository = gitService.cloneIfNotExists("tmp/checkstyle",
            "https://github.com/checkstyle/checkstyle.git")){

        BlockTracker blockTracker = CodeTracker.blockTracker()
                .repository(repository)
                .filePath("src/main/java/com/puppycrawl/tools/checkstyle/Checker.java")
                .startCommitId("119fd4fb33bef9f5c66fc950396669af842c21a3")
                .methodName("fireErrors")
                .methodDeclarationLineNumber(384)
                .codeElementType(CodeElementType.ENHANCED_FOR_STATEMENT)
                .blockStartLineNumber(391)
                .blockEndLineNumber(393)
                .build();

        History<Block> blockHistory = blockTracker.track();

        for (History.HistoryInfo<Block> historyInfo : blockHistory.getHistoryInfoList()) {
            System.out.println("======================================================");
            System.out.println("Commit ID: " + historyInfo.getCommitId());
            System.out.println("Date: " +
                    LocalDateTime.ofEpochSecond(historyInfo.getCommitTime(), 0, ZoneOffset.UTC));
            System.out.println("Before: " + historyInfo.getElementBefore().getName());
            System.out.println("After: " + historyInfo.getElementAfter().getName());

            for (Change change : historyInfo.getChangeList()) {
                System.out.println(change.getType().getTitle() + ": " + change);
            }
        }
        System.out.println("======================================================");
    }
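The Date line in the snippet above passes the commit time to LocalDateTime.ofEpochSecond, which implies that getCommitTime() returns seconds since the Unix epoch. The following is a minimal standalone sketch of that conversion using only java.time; the class and method names are illustrative and are not part of the CodeTracker API.

```java
import java.time.LocalDateTime;
import java.time.ZoneOffset;

// Illustrative helper, not part of the CodeTracker API.
final class CommitTimeDemo {

    // Converts a commit time given in Unix epoch seconds to a UTC date-time,
    // mirroring the call used in the tracking snippets.
    static LocalDateTime toUtc(long epochSeconds) {
        return LocalDateTime.ofEpochSecond(epochSeconds, 0, ZoneOffset.UTC);
    }

    public static void main(String[] args) {
        // 1,000,000,000 seconds after 1970-01-01T00:00:00Z
        System.out.println(toUtc(1_000_000_000L)); // prints 2001-09-09T01:46:40
    }
}
```

The result is a LocalDateTime in UTC, so it carries no zone information; convert it explicitly if commit dates should be shown in local time.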

How to Track Methods

CodeTracker can track the history of methods in git repositories.

In the code snippet below we demonstrate how to print all changes performed in the history of the method public void fireErrors(String fileName, SortedSet<LocalizedMessage> errors).

    GitService gitService = new GitServiceImpl();
    try (Repository repository = gitService.cloneIfNotExists("tmp/checkstyle",
            "https://github.com/checkstyle/checkstyle.git")){

        MethodTracker methodTracker = CodeTracker.methodTracker()
            .repository(repository)
            .filePath("src/main/java/com/puppycrawl/tools/checkstyle/Checker.java")
            .startCommitId("119fd4fb33bef9f5c66fc950396669af842c21a3")
            .methodName("fireErrors")
            .methodDeclarationLineNumber(384)
            .build();

        History<Method> methodHistory = methodTracker.track();

        for (History.HistoryInfo<Method> historyInfo : methodHistory.getHistoryInfoList()) {
            System.out.println("======================================================");
            System.out.println("Commit ID: " + historyInfo.getCommitId());
            System.out.println("Date: " + 
                LocalDateTime.ofEpochSecond(historyInfo.getCommitTime(), 0, ZoneOffset.UTC));
            System.out.println("Before: " + historyInfo.getElementBefore().getName());
            System.out.println("After: " + historyInfo.getElementAfter().getName());

            for (Change change : historyInfo.getChangeList()) {
                System.out.println(change.getType().getTitle() + ": " + change);
            }
        }
        System.out.println("======================================================");
    }

How to Track Variables

CodeTracker can track the history of variables in git repositories.

In the code snippet below we demonstrate how to print all changes performed in the history of the variable final String stripped.

    GitService gitService = new GitServiceImpl();
    try (Repository repository = gitService.cloneIfNotExists("tmp/checkstyle",
            "https://github.com/checkstyle/checkstyle.git")){

        VariableTracker variableTracker = CodeTracker.variableTracker()
            .repository(repository)
            .filePath("src/main/java/com/puppycrawl/tools/checkstyle/Checker.java")
            .startCommitId("119fd4fb33bef9f5c66fc950396669af842c21a3")
            .methodName("fireErrors")
            .methodDeclarationLineNumber(384)
            .variableName("stripped")
            .variableDeclarationLineNumber(385)
            .build();

        History<Variable> variableHistory = variableTracker.track();

        for (History.HistoryInfo<Variable> historyInfo : variableHistory.getHistoryInfoList()) {
            System.out.println("======================================================");
            System.out.println("Commit ID: " + historyInfo.getCommitId());
            System.out.println("Date: " + 
                LocalDateTime.ofEpochSecond(historyInfo.getCommitTime(), 0, ZoneOffset.UTC));
            System.out.println("Before: " + historyInfo.getElementBefore().getName());
            System.out.println("After: " + historyInfo.getElementAfter().getName());

            for (Change change : historyInfo.getChangeList()) {
                System.out.println(change.getType().getTitle() + ": " + change);
            }
        }
        System.out.println("======================================================");
    }

How to Track Attributes

CodeTracker can track the history of attributes in git repositories.

In the code snippet below we demonstrate how to print all changes performed in the history of the attribute private PropertyCacheFile cacheFile.

    GitService gitService = new GitServiceImpl();
    try (Repository repository = gitService.cloneIfNotExists("tmp/checkstyle",
            "https://github.com/checkstyle/checkstyle.git")) {

        AttributeTracker attributeTracker = CodeTracker.attributeTracker()
                .repository(repository)
                .filePath("src/main/java/com/puppycrawl/tools/checkstyle/Checker.java")
                .startCommitId("119fd4fb33bef9f5c66fc950396669af842c21a3")
                .attributeName("cacheFile")
                .attributeDeclarationLineNumber(132)
                .build();

        History<Attribute> attributeHistory = attributeTracker.track();

        for (History.HistoryInfo<Attribute> historyInfo : attributeHistory.getHistoryInfoList()) {
            System.out.println("======================================================");
            System.out.println("Commit ID: " + historyInfo.getCommitId());
            System.out.println("Date: " + 
                LocalDateTime.ofEpochSecond(historyInfo.getCommitTime(), 0, ZoneOffset.UTC));
            System.out.println("Before: " + historyInfo.getElementBefore().getName());
            System.out.println("After: " + historyInfo.getElementAfter().getName());

            for (Change change : historyInfo.getChangeList()) {
                System.out.println(change.getType().getTitle() + ": " + change);
            }
        }
        System.out.println("======================================================");
    }

How to Run the REST API

You can serve CodeTracker as a REST API.

In the command line, run

mvn compile exec:java -Dexec.mainClass="org.codetracker.rest.REST"

To provide GitHub credentials for tracking private repositories, set environment variables GITHUB_USERNAME and GITHUB_KEY before running the API.
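Before starting the API for private repositories, it can help to confirm that both variables are actually set in the launching environment. The sketch below is a hypothetical pre-flight check, not code from CodeTracker; it only assumes the two variable names given above and says nothing about how the REST server itself reads them.

```java
import java.util.Map;

// Hypothetical pre-flight check; not part of CodeTracker.
final class CredentialCheck {

    // Returns true when both variables named in this README are present and non-blank.
    static boolean hasGitHubCredentials(Map<String, String> env) {
        return isSet(env.get("GITHUB_USERNAME")) && isSet(env.get("GITHUB_KEY"));
    }

    private static boolean isSet(String value) {
        return value != null && !value.trim().isEmpty();
    }

    public static void main(String[] args) {
        if (hasGitHubCredentials(System.getenv())) {
            System.out.println("GitHub credentials found in the environment.");
        } else {
            System.out.println("GITHUB_USERNAME and/or GITHUB_KEY is not set.");
        }
    }
}
```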

REST API Endpoints

Endpoint

HTTP Method: GET

Endpoint URL: /api/track

Endpoint Description

Initiate one of the four supported Trackers on a given code element. Returns the change history of the selected element in the form of a JSON array. Works for all types of supported code elements (methods, attributes, variables, blocks).

Parameters

Request Parameters (query params)

Parameter    Type    Description
owner        String  The owner of the repository.
repoName     String  The name of the repository.
commitId     String  The commit ID to start tracking from.
filePath     String  The path of the file the code element is defined in.
selection    String  The code element to be tracked.
lineNumber   String  The line the code element is defined on.
gitHubToken  String  [Optional] The GitHub access token for private repositories.

Request Example

{
    "owner": "checkstyle",
    "repoName": "checkstyle",
    "filePath": "src/main/java/com/puppycrawl/tools/checkstyle/JavadocDetailNodeParser.java",
    "commitId": "119fd4fb33bef9f5c66fc950396669af842c21a3",
    "selection": "stack",
    "lineNumber": "486"
}
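Because the endpoint is a GET with query parameters, the values in the request example have to be URL-encoded into the query string. The sketch below shows one way to assemble such a URL with the Java standard library. The base URL http://localhost:8080 is an assumption (this README does not state the host or port the server listens on), and the class name is illustrative.

```java
import java.net.URI;
import java.net.URLEncoder;
import java.nio.charset.StandardCharsets;
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.stream.Collectors;

// Illustrative client-side helper; not part of CodeTracker.
final class TrackRequestUrl {

    // Joins the parameters into an URL-encoded query string and appends it to the endpoint.
    static URI build(String baseUrl, String endpoint, Map<String, String> params) {
        String query = params.entrySet().stream()
                .map(e -> encode(e.getKey()) + "=" + encode(e.getValue()))
                .collect(Collectors.joining("&"));
        return URI.create(baseUrl + endpoint + "?" + query);
    }

    private static String encode(String value) {
        return URLEncoder.encode(value, StandardCharsets.UTF_8);
    }

    public static void main(String[] args) {
        // Parameter values taken from the request example above.
        Map<String, String> params = new LinkedHashMap<>();
        params.put("owner", "checkstyle");
        params.put("repoName", "checkstyle");
        params.put("filePath",
                "src/main/java/com/puppycrawl/tools/checkstyle/JavadocDetailNodeParser.java");
        params.put("commitId", "119fd4fb33bef9f5c66fc950396669af842c21a3");
        params.put("selection", "stack");
        params.put("lineNumber", "486");

        // ASSUMPTION: the server address is a placeholder; adjust it to your deployment.
        System.out.println(build("http://localhost:8080", "/api/track", params));
    }
}
```

Since /api/codeElementType accepts the same parameters, the same helper can be reused by passing that endpoint path instead.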

Endpoint

HTTP Method: GET

Endpoint URL: /api/codeElementType

Endpoint Description

Detect the type of code element selected using the CodeElementLocator API. Returns the type of code element selected. Works for all types of supported code elements (methods, attributes, variables, blocks).

Parameters

Request Parameters (query params)

Parameter    Type    Description
owner        String  The owner of the repository.
repoName     String  The name of the repository.
commitId     String  The commit ID to start tracking from.
filePath     String  The path of the file the code element is defined in.
selection    String  The code element to be tracked.
lineNumber   String  The line the code element is defined on.
gitHubToken  String  [Optional] The GitHub access token for private repositories.

Request Example

{
    "owner": "checkstyle",
    "repoName": "checkstyle",
    "filePath": "src/main/java/com/puppycrawl/tools/checkstyle/JavadocDetailNodeParser.java",
    "commitId": "119fd4fb33bef9f5c66fc950396669af842c21a3",
    "selection": "stack",
    "lineNumber": "486"
}

Oracle

The oracle we used to evaluate CodeTracker is an extension of the CodeShovel oracle. It includes the evolution history of 200 methods, as well as the evolution history of 1345 variables and 1280 blocks declared in these methods, and is available in the following links:

JSON property descriptions

repositoryName: folder in which the repository is cloned
repositoryWebURL: Git repository URL
filePath: file path in the start commit
functionName: method declaration name in the start commit
functionKey: unique string key of the method declaration in the start commit
functionStartLine: method declaration start line in the start commit
variableName: variable declaration name in the start commit
variableKey: unique string key of the variable declaration in the start commit
variableStartLine: variable declaration start line in the start commit
startCommitId: start commit SHA-1
expectedChanges: list of changes on the tracked program element in the commit history of the project
parentCommitId: parent commit SHA-1
commitId: child commit SHA-1
commitTime: commit time in Unix epoch (or Unix time or POSIX time or Unix timestamp) format
changeType: type of change
elementFileBefore: file path in the parent commit
elementNameBefore: unique string key of the program element in the parent commit
elementFileAfter: file path in the child commit
elementNameAfter: unique string key of the program element in the child commit
comment: Refactoring or change description
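To make the shape of an oracle file easier to picture, the sketch below models a method entry as plain Java classes. It is an assumption-laden illustration: only a subset of the properties is represented, the nesting of the per-change properties under expectedChanges is inferred from the order of the list above, and the commit IDs and change types in main are placeholders rather than real oracle data.

```java
import java.util.List;
import java.util.stream.Collectors;

// Illustrative model of one method oracle file; not a class from CodeTracker.
final class OracleEntrySketch {

    // One element of expectedChanges (subset of its properties).
    static final class ExpectedChange {
        final String parentCommitId;
        final String commitId;
        final long commitTime;   // Unix epoch format
        final String changeType;

        ExpectedChange(String parentCommitId, String commitId, long commitTime, String changeType) {
            this.parentCommitId = parentCommitId;
            this.commitId = commitId;
            this.commitTime = commitTime;
            this.changeType = changeType;
        }
    }

    final String filePath;
    final String functionName;
    final int functionStartLine;
    final String startCommitId;
    final List<ExpectedChange> expectedChanges;

    OracleEntrySketch(String filePath, String functionName, int functionStartLine,
                      String startCommitId, List<ExpectedChange> expectedChanges) {
        this.filePath = filePath;
        this.functionName = functionName;
        this.functionStartLine = functionStartLine;
        this.startCommitId = startCommitId;
        this.expectedChanges = expectedChanges;
    }

    // Returns the child commit IDs whose expected change has the given type.
    List<String> commitsWithChangeType(String changeType) {
        return expectedChanges.stream()
                .filter(c -> c.changeType.equals(changeType))
                .map(c -> c.commitId)
                .collect(Collectors.toList());
    }

    public static void main(String[] args) {
        // Placeholder changes: the IDs and change types below are made up for illustration.
        OracleEntrySketch entry = new OracleEntrySketch(
                "src/main/java/com/puppycrawl/tools/checkstyle/Checker.java",
                "fireErrors", 384, "119fd4fb33bef9f5c66fc950396669af842c21a3",
                List.of(new ExpectedChange("commit-0", "commit-a", 0L, "type-x"),
                        new ExpectedChange("commit-a", "commit-b", 1L, "type-y")));
        System.out.println(entry.commitsWithChangeType("type-y")); // prints [commit-b]
    }
}
```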

Some Samples of CodeShovel's false cases

In the extended oracle we fixed all inaccuracies that we found in the original oracle. For example, the following methods in the original oracle are erroneously matched with another method which is extracted from their body. In fact, these methods are introduced as a result of an Extract Method refactoring.

CodeTracker's misreporting samples

To avoid unnecessary processing and speed up tracking, CodeTracker excludes some files from the source code model. This exclusion may cause the change type to be misreported in some special scenarios. Although CodeTracker supports three scenarios in which additional files need to be included in the source code model, it may still misreport a MoveMethod change as FileMove when the child commit model does not include the origin file of the method. In the test oracle, there are three such cases: case 1, case 2 and case 3.

Experiments

Execution Time

As part of our experiments, we measured the execution time of CodeTracker and CodeShovel to track each method's change history in the training and testing sets. All data we recorded for this experiment and the script for generating the execution time plots are available here.

Tracking Accuracy

All data we collected to compute the precision and recall of CodeTracker and CodeShovel at commit level and change level are available in the following links:

CSV column descriptions

detailed-tracker-training.csv and detailed-tracker-test.csv
file_name: corresponding JSON file name in the oracle
repository: Git repository URL
element_key: unique string key of the program element in the start commit
parent_commit_id: parent commit SHA-1
commit_id: child commit SHA-1
commit_time: commit time in Unix epoch (or Unix time or POSIX time or Unix timestamp) format
change_type: type of change
element_file_before: file path in the parent commit
element_file_after: file path in the child commit
element_name_before: unique string key of the program element in the parent commit
element_name_after: unique string key of the program element in the child commit
result: True Positive (TP), False Positive (FP) or False Negative (FN)
comment: Refactoring or change description

summary-tracker-training.csv and summary-tracker-test.csv
instance: unique string key of the program element in the start commit
processing_time: total execution time in milliseconds
analysed_commits: total number of processed commits
git_log_command_calls: number of times git log command was executed (step 1 of our approach)
step2: number of times step 2 of our approach was executed
step3: number of times step 3 of our approach was executed
step4: number of times step 4 of our approach was executed
step5: number of times step 5 of our approach was executed
tp_change_type: number of True Positives (TP) for this specific change_type
fp_change_type: number of False Positives (FP) for this specific change_type
fn_change_type: number of False Negatives (FN) for this specific change_type
tp_all: total number of True Positives (TP)
fp_all: total number of False Positives (FP)
fn_all: total number of False Negatives (FN)

final.csv
tool: tool name (tracker or shovel)
oracle: oracle name (training or test)
level: change report level (commit or change)
processing_time_avg: average processing time
processing_time_median: median processing time
tp: total number of True Positives (TP)
fp: total number of False Positives (FP)
fn: total number of False Negatives (FN)
precision: precision percentage
recall: recall percentage
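The precision and recall columns follow from the tp, fp and fn counts by the standard definitions: precision = tp / (tp + fp) and recall = tp / (tp + fn), expressed as percentages. The sketch below applies those definitions; the counts in main are made-up illustrative values, not results from the experiments, and returning 0 for an empty denominator is an assumption rather than documented behavior of the authors' scripts.

```java
import java.util.Locale;

// Illustrative computation of the precision and recall percentages.
final class AccuracyMetrics {

    // Share of reported changes that are correct, as a percentage.
    static double precision(long tp, long fp) {
        return tp + fp == 0 ? 0.0 : 100.0 * tp / (tp + fp);
    }

    // Share of expected changes that were reported, as a percentage.
    static double recall(long tp, long fn) {
        return tp + fn == 0 ? 0.0 : 100.0 * tp / (tp + fn);
    }

    public static void main(String[] args) {
        // Hypothetical counts chosen for easy arithmetic.
        long tp = 90, fp = 10, fn = 30;
        System.out.printf(Locale.ROOT, "precision=%.1f%% recall=%.1f%%%n",
                precision(tp, fp), recall(tp, fn)); // prints precision=90.0% recall=75.0%
    }
}
```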