opensciences / opensciences.github.io

Website for OpenScience -
http://openscience.us
MIT License

Learning to rank relevant files for bug reports using domain knowledge #200

Open CarterPape opened 9 years ago

CarterPape commented 9 years ago

Categorization: bug report

Author list for the paper: Xin Ye (xy348709@ohio.edu), Razvan Bunescu (bunescu@ohio.edu), Chang Liu (liuc@ohio.edu)

General overview of the data

This dataset contains bug reports, commit history, and API descriptions for six open source Java projects: Eclipse Platform UI, SWT, JDT, AspectJ, Birt, and Tomcat. It was used to evaluate a learning-to-rank approach that recommends relevant files for bug reports.

Dataset structure File list:

XLS/XML Headings:

Link to the paper associated with the dataset

http://dl.acm.org/citation.cfm?id=2635868.2635874&coll=DL&dl=GUIDE&CFID=464701544&CFTOKEN=55037057

Paper abstract

When a new bug report is received, developers usually need to reproduce the bug and perform code reviews to find the cause, a process that can be tedious and time consuming. A tool for ranking all the source files of a project with respect to how likely they are to contain the cause of the bug would enable developers to narrow down their search and potentially could lead to a substantial increase in productivity. This paper introduces an adaptive ranking approach that leverages domain knowledge through functional decompositions of source code files into methods, API descriptions of library components used in the code, the bug-fixing history, and the code change history. Given a bug report, the ranking score of each source file is computed as a weighted combination of an array of features encoding domain knowledge, where the weights are trained automatically on previously solved bug reports using a learning-to-rank technique. We evaluated our system on six large scale open source Java projects, using the before-fix version of the project for every bug report. The experimental results show that the newly introduced learning-to-rank approach significantly outperforms two recent state-of-the-art methods in recommending relevant files for bug reports. In particular, our method makes correct recommendations within the top 10 ranked source files for over 70% of the bug reports in the Eclipse Platform and Tomcat projects.
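To make the abstract's scoring idea concrete, here is a minimal sketch of ranking files by a weighted combination of features and checking whether a relevant file lands in the top k. It is not the authors' implementation: the file names, feature values, and weights are made up for illustration, the function names (`score`, `rank_files`, `top_k_hit`) are hypothetical, and the weights are fixed by hand here, whereas the paper trains them on previously solved bug reports.

```python
from typing import Dict, List, Sequence, Set


def score(weights: Sequence[float], features: Sequence[float]) -> float:
    """Weighted combination of feature values for one (bug report, file) pair."""
    if len(weights) != len(features):
        raise ValueError("weights and features must have the same length")
    return sum(w * x for w, x in zip(weights, features))


def rank_files(weights: Sequence[float],
               features_by_file: Dict[str, Sequence[float]]) -> List[str]:
    """Return file names ordered from highest to lowest score."""
    return sorted(
        features_by_file,
        key=lambda name: score(weights, features_by_file[name]),
        reverse=True,
    )


def top_k_hit(ranking: List[str], relevant: Set[str], k: int = 10) -> bool:
    """True if at least one relevant file appears among the first k ranked files."""
    return any(name in relevant for name in ranking[:k])


# Hypothetical data: three files, three features each (values are invented).
weights = [0.5, 0.3, 0.2]  # assumed hand-set weights, not learned ones
features_by_file = {
    "Parser.java": [0.8, 0.1, 0.0],
    "Lexer.java": [0.2, 0.9, 0.5],
    "Util.java": [0.1, 0.0, 0.1],
}

ranking = rank_files(weights, features_by_file)
print(ranking)
print(top_k_hit(ranking, {"Parser.java"}, k=2))
```

Averaging `top_k_hit` over many bug reports gives the kind of "correct recommendation within the top 10" figure the abstract reports.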

BibTeX reference for the paper

@inproceedings{Ye:2014,
  author = {Ye, Xin and Bunescu, Razvan and Liu, Chang},
  title = {Learning to Rank Relevant Files for Bug Reports Using Domain Knowledge},
  booktitle = {Proceedings of the 22nd ACM SIGSOFT International Symposium on Foundations of Software Engineering},
  series = {FSE 2014},
  year = {2014},
  location = {Hong Kong, China},
  pages = {689--699},
  numpages = {11},
}

Link to the datasets

http://figshare.com/articles/The_dataset_of_six_open_source_Java_projects/951967

Is this dataset part of a larger series or collection? no

CarterPape commented 9 years ago

Duplicate of #123, but kept as an example of the "Summarized" stage.