What To Do for the Main Mining Challenge

msrchallenge commented 14 years ago

I was thinking of following up on an idea Chris Bird and Abram Hindle had: making the mining challenge about comparing projects.

I very much liked the idea. Maybe we should use Web browsers such as Mozilla and Chrome for that, but we also need something on the Java side. I was looking for two Java-based browsers but found only Lobo (http://lobobrowser.org/), which sadly seems to be dead.

As an alternative we could take two Java Development Environments and a third system for which we can find one Java and one C/C++ implementation.

What do you guys think?

thanhnguyen commented 14 years ago

By comparing projects, do we mean comparing their bug and source repositories? I am not sure comparing code metrics is suitable for the challenge, because it requires quite some work.

emadshihab commented 14 years ago

I really like the idea of doing multiple projects and comparing them. However, from what I have seen (and my experience with the challenge), very few people actually did this. I think the main issue was that we would provide the data for different projects in different formats, and no one really wants to write two tools just to compare them.

One thing I would like to see is that we provide the data in a standard format, such that people can literally change the input file name and use the exact same tool. I think this would encourage people to run their analysis on more than 1 project.

That said, this means we need to do more work when we process the data :)
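To illustrate the "change the input file name" idea, here is a minimal sketch. It assumes, purely for illustration, that every project's bug data is exported as a CSV file with a header row and a shared `status` column; the column name and the file layout are hypothetical, not a format the challenge has agreed on.

```python
import csv
from collections import Counter


def count_by_field(path, field="status"):
    """Count rows per value of `field` in a CSV file with a header row.

    The `status` default is a hypothetical column name; any column shared
    by all project exports would work the same way.
    """
    with open(path, newline="", encoding="utf-8") as handle:
        return Counter(row[field] for row in csv.DictReader(handle))


if __name__ == "__main__":
    import sys

    # The same analysis runs unchanged on every file passed on the command line.
    for project_file in sys.argv[1:]:
        print(project_file, dict(count_by_field(project_file)))
```

With a shared format, comparing two projects is just a matter of passing two file names to the same script.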

schadr commented 14 years ago

@Thanh: Well, it is more work than simply creating metrics for one project, but the "magic" happens when we give those metrics meaning, and I think that becomes more interesting when comparing projects. And you are not limited to metrics; you can also compare what the developers talk about. Remember, the idea is to have comparable projects, for instance Firefox and Chrome. Both are browsers, so one would expect their developers to talk about similar things, but do they, and how do they differ and why?

@emad: True. I was thinking of providing the bug data in something like a standard XML format that contains the current bug report as is, and then attaches a history of changes containing the old values together with who changed them and when.
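As a rough sketch of what such a format could look like, the example below uses a made-up XML layout: a `current` section holding the report as is, and a `history` section where each change records the old value, who changed it, and when. All element and attribute names are assumptions for illustration only, not an agreed schema.

```python
import xml.etree.ElementTree as ET

# Hypothetical layout: element and attribute names are illustrative only.
SAMPLE = """\
<bug id="1">
  <current>
    <field name="status">RESOLVED</field>
    <field name="assignee">alice</field>
  </current>
  <history>
    <change field="status" old="NEW" who="bob" when="2010-01-05T10:00:00Z"/>
    <change field="status" old="ASSIGNED" who="alice" when="2010-01-09T16:30:00Z"/>
  </history>
</bug>
"""


def parse_bug(xml_text):
    """Parse one bug report into its current fields and its change history."""
    root = ET.fromstring(xml_text)
    current = {f.get("name"): f.text for f in root.findall("./current/field")}
    history = [dict(c.attrib) for c in root.findall("./history/change")]
    return {"id": root.get("id"), "current": current, "history": history}


def values_over_time(bug, field):
    """Rebuild the sequence of values a field took: old values, then the current one."""
    # Timestamps in one fixed ISO 8601 format sort correctly as plain strings.
    changes = sorted(
        (c for c in bug["history"] if c["field"] == field),
        key=lambda c: c["when"],
    )
    return [c["old"] for c in changes] + [bug["current"][field]]


if __name__ == "__main__":
    bug = parse_bug(SAMPLE)
    print(values_over_time(bug, "status"))
```

Storing only the old values is enough to replay a field's full history, because the newest value is always available from the current report.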

For the source code archives, two ways currently come to mind. We can port all source code archives to one repository type, for instance make an SVN repo out of every repo we have. Or we can look at Harald Gall's work: his group is working on ontologies that describe database structures for mining purposes, and we could bring our data into a format that is compatible with theirs.

ghost commented 14 years ago

@schadr: For sure, we (Harald Gall's group) can help extract the data about those projects' history. One solution could be to use our tool, Evolizer (currently being revamped), to export and save all the version control history into a DB. The other solution involves using one of our new software evolution analysis web services, which extracts that very same data and structures it using a system-independent version control ontology (in RDF).

schadr commented 14 years ago

@Giacomo: It would be great if we could use Evolizer or your web service to extract the repositories. What kinds of repositories do the tool and the web service support? Could you maybe give us a link to the RDF structure and the database layout so that we can have a look?

ghost commented 14 years ago

@schadr: Sorry for the late reply (I've been kinda busy). I'll give you some pointers about our RDF structure and Evolizer by the end of this week.

schadr commented 14 years ago

thanks
