Learning to rank relevant files for bug reports using domain knowledge

Categorization bug report

Author list for the paper Xin Ye : xy348709@ohio.edu Razvan Bunescu: bunescu@ohio.edu Chang Liu:liuc@ohio.edu

General overview of the data

This dataset contains bug reports, commit history, and API descriptions of six open source Java projects including Eclipse Platform UI, SWT, JDT, AspectJ, Birt, and Tomcat. This dataset was used to evaluate a learning to rank approach that recommends relevant files for bug reports.

Dataset structure File list:

AspectJ.[xls|xml] – The bug reports and commit history of AspectJ.
Birt.[xls|xml] – The bug reports and commit history of Birt.
Eclipse_Platform_UI.[xls|xml] – The bug reports and commit history of Eclipse Platform UI.
JDT.[xls|xml] – The bug reports and commit history of JDT.
SWT.[xls|xml] – The bug reports and commit history of SWT.
Tomcat.[xls|xml] – The bug reports and commit history of Tomcat.

XLS/XML Headings:

bug_id -- refers to the bug report id.
summary -- refers to the bug report summary.
description -- refers to the bug report description.
report_time -- refers to the bug report report time.
report_timestamp -- refers to the bug report report timestamp.
status -- refers to the status of the bug report.
commit -- refers to the SHA-1 hash id for the commit that fixed the bug report.
commit_timestamp -- refers to the commit timestamp.
files -- contains the full path of every Java file that was fixed in this commit. * result -- contains the position of every positive instance in our ranked list result.

Link to the paper associated with the dataset

http://dl.acm.org/citation.cfm?id=2635868.2635874&coll=DL&dl=GUIDE&CFID=464701544&CFTOKEN=55037057

Paper abstract When a new bug report is received, developers usually need to reproduce the bug and perform code reviews to find the cause, a process that can be tedious and time consuming. A tool for ranking all the source files of a project with respect to how likely they are to contain the cause of the bug would enable developers to narrow down their search and poten- tially could lead to a substantial increase in productivity. This paper introduces an adaptive ranking approach that leverages domain knowledge through functional decomposi- tions of source code files into methods, API descriptions of library components used in the code, the bug-fixing history, and the code change history. Given a bug report, the ranking score of each source file is computed as a weighted combi- nation of an array of features encoding domain knowledge, where the weights are trained automatically on previously solved bug reports using a learning-to-rank technique. We evaluated our system on six large scale open source Java projects, using the before-fix version of the project for every bug report. The experimental results show that the newly introduced learning-to-rank approach significantly outper- forms two recent state-of-the-art methods in recommending relevant files for bug reports. In particular, our method makes correct recommendations within the top 10 ranked source files for over 70% of the bug reports in the Eclipse Platform and Tomcat projects.

BibTeX reference for the paper

@inproceedings{Ye:2014, author = {Ye, Xin and Bunescu, Razvan and Liu, Chang}, title = {Learning to Rank Relevant Files for Bug Reports Using Domain Knowledge}, booktitle = {Proceedings of the 22Nd ACM SIGSOFT International Symposium on Foundations of Software Engineering}, series = {FSE 2014}, year = {2014}, location = {Hong Kong, China}, pages = {689--699}, numpages = {11}, }

Link to the datasets http://figshare.com/articles/The_dataset_of_six_open_source_Java_projects/951967

Is this dataset part of a larger series or collection? no

opensciences / opensciences.github.io

Learning to rank relevant files for bug reports using domain knowledge #200