bugprediction compile data from a SCM server (e.g. github), static code analysis, and optional manual import (e.g. issues, releases). It can provide an offline HTML report about the assessment of the risk of delivering the next release. It can be usefull for datascientists who wants to benchmark bug prediction machine learning models.
The tool will connect to a SCM repository and fecth releases, code history, and issues. For each release the tool will:
The tool needs to target a repository (e.g. GitHub, GitLab) with releases and issues. If you use another tool, you'd need to import releases and issues into the database.
You need to create and data/.env
file (by copying the .env-example
) and to fill at least these variables (see the documentation of populate command for) :
OTTM_SCM_PATH
: Path to git executable, leave "git" if it's into system env. pathOTTM_SOURCE_PROJECT
: Name of the project (e.g. dbeaver)OTTM_SOURCE_REPO
: Repositiory name (e.g. dbeaver/dbeaver)OTTM_CURRENT_BRANCH
: The branch containing the next release (e.g. devel)OTTM_SOURCE_REPO_URL
: # The full path to repo (e.g. https://github.com/dbeaver/dbeaver)OTTM_SOURCE_BUGS
: Source where we get issues (e.g. git)OTTM_SOURCE_REPO_SCM
: Either "github" or "gitlab", other SCM are not yet supportedOTTM_SCM_BASE_URL
: SMC base URL - leave empty for public repoOTTM_SCM_TOKEN
: Token to access github or gitlabOTTM_TARGET_DATABASE
: The default value will generate a SQLite database into the current folderOTTM_ISSUE_TAGS
: On bug reporting tools, you can filter issues by tags. You can specify multiples tags, comma separated.OTTM_JIRA_BASE_URL
: The full path to jira project (e.g. https://jira.atlassian.com)OTTM_JIRA_PROJECT
: Jira project identifierOTTM_JIRA_EMAIL
: Jira user email address. To access Jira API, you need to provide your access tokend AND your email adressOTTM_JIRA_TOKEN
: Token to access jiraOTTM_JIRA_ISSUE_TYPE
: When Jira is used as the bug reporting tool, you can filter issues by their issue type. You can specify several filters, comma separeted. Usually, bugs are repported on "Bug" issue type.
The first step (it might take a while) is to populate the database with versions, issues and commits. The repository will be cloned into a temporary folder and we will checkout all versions in order to generate code metrics. You can run this command in many times as it will only amend the database with latest changes.
python main.py populate
The tool is shipped with two simple bug prediction models. You need to train each model before you can use it:
python main.py train --model-name bugvelocity
And then use it to predict the number of bugs into the comming release (based on the metrics extracted from OTTM_CURRENT_BRANCH
):
$ python main.py predict --model-name bugvelocity
Predicted value : 31
You can generate an offline HTML report:
$ python main.py report
One of the features of the report is to assess the risk of releasing the next version of your project:
See the list of commands for other options.
The tool currently doesn't support repositories with multiple releases in parallel (i.e. a latest version maintained in parallel of a LTS version). You have to import the branch of versions that you want to examine.
Linking issues and commits to a version is a tedious task. At this stage, the tool roughly estimate that issues and commits are linked to a version if the objects were created between the start and and dates of the version.
The tool is released under a MIT licence. Contributors are welcomed in many areas (imporve the tool, add your own model, add a more clever Git tree exploration algo, etc.).