spgroup / miningframework

framework for mining git projects
GNU General Public License v3.0

Mining Framework


This is a framework for mining and analyzing git projects.

We focus on analyzing merge commits, although the framework can easily be adapted to analyze any kind of commit.
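Since the analysis centers on merge commits, it may help to recall what distinguishes them: a merge commit is a commit with two or more parents. The sketch below is not part of Mining Framework. It is a hypothetical helper (`MergeCommitFilter` is an illustrative name) that assumes its input is the output of `git log --format="%H %P"`, where `%H` is the commit hash and `%P` the space-separated parent hashes, and keeps only the merge commits.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Hypothetical helper (not part of Mining Framework): keeps only merge commits
// from `git log --format="%H %P"` output, where each line is a commit hash
// followed by the space-separated hashes of its parents.
final class MergeCommitFilter {

    private MergeCommitFilter() {}

    /** A merge commit has two or more parents, i.e. at least three fields. */
    static boolean isMergeCommit(String logLine) {
        String[] fields = logLine.trim().split("\\s+");
        return fields.length >= 3;
    }

    /** Returns the hashes of the merge commits, preserving log order. */
    static List<String> mergeCommitHashes(List<String> logLines) {
        List<String> hashes = new ArrayList<>();
        for (String line : logLines) {
            if (!line.trim().isEmpty() && isMergeCommit(line)) {
                hashes.add(line.trim().split("\\s+")[0]);
            }
        }
        return hashes;
    }

    public static void main(String[] args) {
        List<String> log = Arrays.asList(
                "c3 a1 b2", // merge commit: two parents
                "b2 a1",    // ordinary commit: one parent
                "a1");      // root commit: no parents
        System.out.println(mergeCommitHashes(log)); // prints [c3]
    }
}
```

Counting parents is all git itself uses to tell merge commits apart, so the same check works regardless of how the log lines are obtained.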

We basically have variability points (hot spots) for

We also have a number of implementations for such variability points, so that one can reuse or adapt them as needed for instantiating the framework. The examples illustrated above correspond to some of the implementations we provide here.

Getting Started

Instantiating or extending the framework

You need to implement the following interfaces (see interfaces/) or choose their existing implementations (see services/):

They correspond to the four variability points described at the beginning of the page. The following interfaces can have multiple implementations injected:

For those, the framework runs the implementations in the order in which they are injected.

The framework uses Google Guice for dependency injection, that is, to inject the interface implementations. So, to select the implementations you want to use in your instantiation of the framework, you also need to write a class such as StaticAnalysisConflictsDetectionModule in the injectors package, which acts as the dependency injector. This particular module is the default injector, used when no other injector is specified when invoking the framework.
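The ordering rule and the injector's role can be pictured with a small, dependency-free sketch. It deliberately does not use Guice: `AnalysisStep` and `OrderedRegistry` are hypothetical names, not Mining Framework or Guice types, and the registry only mimics what a module does here, namely recording the chosen implementations for a variability point that accepts several of them and running them in the order they were registered. In an actual Guice module, single implementations are bound with `bind(...).to(...)`, and multiple implementations can be contributed with Guice's `Multibinder`, whose set iterates in binding order; the existing modules in the injectors package show how the framework itself does it.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical stand-in for a variability point that accepts several
// implementations; the name does not come from Mining Framework.
interface AnalysisStep {
    String run(String mergeCommit);
}

// Plain-Java stand-in for the role a Guice module plays here: it records
// which implementations were chosen and keeps them in registration order.
final class OrderedRegistry {
    private final List<AnalysisStep> steps = new ArrayList<>();

    OrderedRegistry register(AnalysisStep step) {
        steps.add(step);
        return this;
    }

    // Runs every registered implementation on one commit, in registration order.
    List<String> runAll(String mergeCommit) {
        List<String> results = new ArrayList<>();
        for (AnalysisStep step : steps) {
            results.add(step.run(mergeCommit));
        }
        return results;
    }

    public static void main(String[] args) {
        OrderedRegistry registry = new OrderedRegistry()
                .register(commit -> "collect:" + commit)
                .register(commit -> "report:" + commit);
        System.out.println(registry.runAll("abc123")); // prints [collect:abc123, report:abc123]
    }
}
```

Keeping the implementations in a `List` rather than a hash-based set is what guarantees the "run in injection order" behavior in this sketch.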

Running Mining Framework with Docker

If you have Docker available on your machine, you might find it easier to start playing with Mining Framework by using our pre-built Docker image.

The image is built on Amazon Corretto with Java 8 and provides a precompiled distribution of Mining Framework. To run it, execute the following command from a directory containing your projects.csv file; the results are written to the output/ directory there:

docker run -v $PWD/output:/usr/src/miningframework/output/ -v $PWD/projects.csv:/usr/src/miningframework/projects.csv --rm ghcr.io/spgroup/miningframework:master projects.csv 

Running a specific framework instantiation

You can run the framework by including the src directory in the classpath and executing src/main/app/Main.groovy. This project uses Gradle as its build system, so we use Gradle tasks to run all of the framework's operations.

This can be done by configuring an IDE or by executing the following command in a terminal:

./gradlew run --args="[options] [input] [output]"

[input] is the path to a CSV file listing the projects to be analyzed (such as projects.csv), one project per line. The list can contain external projects to be downloaded by the framework (the path field should then be a URL to a git project hosted online) or local projects (the path field should then refer to a local directory).

[output] is the path to the directory the framework creates to store the results (collected experimental data, statistics, etc.) of the mining process.

[options] is a combination of the command-line configuration options listed below. Pass --help as [options] to see the supported options and their descriptions.

The options are available to all variability point implementations, but some implementations may ignore some of them, so check the documentation of the implementations you need to confirm that they actually use the options you are interested in.
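The [input] format described above can be illustrated with a minimal reader. This is a sketch under stated assumptions, not the framework's parser: it assumes the CSV's first line is a header with a column named `path`, that fields never contain quoted commas, and that remote projects are given as http(s) URLs. `ProjectListReader` and `ProjectEntry` are hypothetical names.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical reader for an [input] file such as projects.csv.
// ASSUMPTIONS (not taken from the framework): the first line is a header with
// a column named "path", fields contain no quoted commas, and remote projects
// are given as http(s) URLs.
final class ProjectListReader {

    static final class ProjectEntry {
        final String path;
        final boolean remote; // true: must be cloned; false: local directory

        ProjectEntry(String path, boolean remote) {
            this.path = path;
            this.remote = remote;
        }
    }

    private ProjectListReader() {}

    static List<ProjectEntry> parse(List<String> lines) {
        List<ProjectEntry> entries = new ArrayList<>();
        if (lines.isEmpty()) {
            return entries;
        }
        // Locate the "path" column in the header line.
        String[] header = lines.get(0).split(",", -1);
        int pathColumn = -1;
        for (int i = 0; i < header.length; i++) {
            if (header[i].trim().equals("path")) {
                pathColumn = i;
                break;
            }
        }
        if (pathColumn < 0) {
            throw new IllegalArgumentException("CSV header has no 'path' column");
        }
        // One project per line; blank lines are skipped.
        for (String line : lines.subList(1, lines.size())) {
            if (line.trim().isEmpty()) {
                continue;
            }
            String[] fields = line.split(",", -1);
            if (fields.length <= pathColumn) {
                throw new IllegalArgumentException("Missing path field: " + line);
            }
            String path = fields[pathColumn].trim();
            boolean remote = path.startsWith("https://") || path.startsWith("http://");
            entries.add(new ProjectEntry(path, remote));
        }
        return entries;
    }
}
```

The lines could be obtained with `java.nio.file.Files.readAllLines(java.nio.file.Paths.get("projects.csv"))`; a real CSV library would be preferable if project names may contain commas.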

If you use the framework's multithreading option (--threads), remember to synchronize access to output files and to any shared state manipulated by your implementations of the variability points.
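A minimal sketch of the synchronization this warning refers to, assuming results are appended line by line to one shared file: every write goes through a single `synchronized` method, so lines produced by different worker threads cannot interleave. `SynchronizedOutput` and `runConcurrently` are hypothetical names, and the fixed thread pool only stands in for however the framework distributes work across the threads requested with --threads.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.io.Writer;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.TimeUnit;

// Hypothetical shared output sink (not a Mining Framework class). All writes go
// through one synchronized method, so lines from different threads never interleave.
final class SynchronizedOutput {
    private final Writer writer;

    SynchronizedOutput(Writer writer) {
        this.writer = writer;
    }

    synchronized void appendLine(String line) {
        try {
            writer.write(line);
            writer.write(System.lineSeparator());
            writer.flush(); // keep the file readable even if the run is interrupted
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    // Simulates `threads` workers (cf. --threads) each reporting one result line
    // per commit; the commit identifiers are made up for illustration.
    static void runConcurrently(SynchronizedOutput out, int threads, int commits)
            throws InterruptedException {
        ExecutorService pool = Executors.newFixedThreadPool(threads);
        for (int i = 0; i < commits; i++) {
            final int commit = i;
            pool.execute(() -> out.appendLine("commit-" + commit + ",result"));
        }
        pool.shutdown();
        if (!pool.awaitTermination(1, TimeUnit.MINUTES)) {
            throw new IllegalStateException("workers did not finish in time");
        }
    }
}
```

In practice the `Writer` would wrap the output file, for example one obtained from `java.nio.file.Files.newBufferedWriter(path)`. Flushing after every line trades some throughput for robustness; writing one file per thread and merging at the end is an alternative that avoids the lock altogether.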

For example, to run the study used at the beginning of the page to illustrate the variability points, we invoke the following command from the project's root folder (replace github-personal-access-token with your own GitHub personal access token):

  • Linux/Mac: ./gradlew run --args="--access-key github-personal-access-token --threads 2 ./projects.csv SOOTAnalysisOutput"
  • Windows: .\gradlew run --args="--access-key github-personal-access-token --threads 2 ./projects.csv SOOTAnalysisOutput"

The CLI has the following help page:

usage: miningframework [options] [input] [output]
the Mining Framework take an input csv file and a name for the output dir
(default: output)
Options:
-a,--access-key <access key>               Specify the access key of the git account
                                           for when the analysis needs user access to
                                           GitHub
-e, --extension <extension>                Specify the file extension that should be
                                           used in the analysis (e.g. .rb, .ts, .java,
                                           .cpp. Default: .java)
-h,--help                                  Show help for executing commands
-i,--injector <class>                      Specify the class of the dependency
                                           injector (Must provide full name, default
                                           injectors.StaticAnalysisConflictsDetection
                                           Module)
-k,--keep-projects                         Specify that cloned projects must be kept
                                           after the analysis (those are kept in
                                           clonedRepositories/ )
-l,--language-separators <'separators'>    Specify the language separators that should
                                           be used in the analysis. Required for (and
                                           only considered when) running studies with
                                           the CSDiff tool. Default: '{ } ( ) ; ,'
-log,--log-level <log level                Specify the minimum log level: (OFF, FATAL,
                                           ERROR, WARN, INFO, DEBUG, TRACE, ALL).
                                           Default: "INFO"
-p,--push <link>                           Specify a git repository to upload the
                                           output in the end of the analysis (format
                                           https://github.com/<owner>/<name>
-s,--since <date>                          Use commits more recent than a specific
                                           date (format DD/MM/YYY)
-t,--threads <threads>                     Number of cores used in analysis (default:
                                           1)
-u,--until <date>                          Use commits older than a specific
                                           date(format DD/MM/YYYY)
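The --since and --until options take dates in DD/MM/YYYY form. The following sketch shows one way such a date window could be parsed and applied to commit dates; it is not the framework's implementation. `CommitDateWindow` is a hypothetical name, and treating both bounds as exclusive is an assumption based on the wording "more recent than" and "older than".

```java
import java.time.LocalDate;
import java.time.format.DateTimeFormatter;
import java.time.format.ResolverStyle;

// Hypothetical helper, not Mining Framework's own code: parses --since/--until
// values in DD/MM/YYYY form and checks whether a commit date falls strictly
// inside the window. ASSUMPTION: both bounds are exclusive; either may be absent.
final class CommitDateWindow {
    // "uuuu" together with STRICT resolution rejects impossible dates such as 31/02/2020.
    private static final DateTimeFormatter FORMAT =
            DateTimeFormatter.ofPattern("dd/MM/uuuu").withResolverStyle(ResolverStyle.STRICT);

    private final LocalDate since; // null means "no lower bound"
    private final LocalDate until; // null means "no upper bound"

    CommitDateWindow(String since, String until) {
        this.since = since == null ? null : LocalDate.parse(since, FORMAT);
        this.until = until == null ? null : LocalDate.parse(until, FORMAT);
    }

    boolean accepts(LocalDate commitDate) {
        boolean afterSince = since == null || commitDate.isAfter(since);
        boolean beforeUntil = until == null || commitDate.isBefore(until);
        return afterSince && beforeUntil;
    }
}
```

Comparing at day granularity with `LocalDate` ignores commit times and time zones; a real filter might instead compare full timestamps.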

Testing

To run the framework tests, execute the check task:

./gradlew check

Building

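To build the project while skipping the tests (-x test excludes the test task), run:

  • Linux/Mac: ./gradlew build -x test
  • Windows: .\gradlew build -x test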