davidsjohnson commented 9 years ago

We'll use the second part of the lab for designing project 2. In this post provide the following:

Research problem in the form of a question
Datasets to use
A couple hypotheses based on some intuition from research or previous lessons learned.
- should be more specific assertions that will help you answer your research question
Initial thoughts on how you plan to attack the problem

eburdon commented 9 years ago

I'm looking to join a team for Project 2! Shoot me a message if you're looking for another person to help with your research problem.

chrisjcook commented 9 years ago

Research problem: What is the relationship between growth and coupling?
Datasets to use: Any/all Python Git repositories.
Hypotheses:
1. A metric of correlation between growth and coupling can be measured for any Python project.
2. Increases in a project's coupling are caused by an increase in a project's rate of growth.
3. Decreasing a project's coupling causes a decrease in a project's rate of growth.
Plan of Attack:
1. Get our data output into the correct format (the way in which we eventually plot it)
2. Polish application by streamlining the flow (removing/automating manual steps)
3. Remove excess/overhead code in all areas of our application to improve performance.
4. Incorporate the matplotlib library in order to automatically plot our desired graphs.
5. Investigate the relationship between coupling and growth using both more granular and more mathematical/statistical methods.
6. Look into the relative difficulty of parallelizing our application using threading (and possibly distributed computing).
7. ???
8. Profit.
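
A minimal sketch of what the correlation metric in hypothesis 1 could look like, assuming growth and coupling have already been reduced to one number per release. The `pearson` helper and the sample numbers are made up for illustration and are not taken from the team's tool.

```python
from math import sqrt


def pearson(xs, ys):
    """Pearson correlation coefficient between two equal-length series."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two series of equal length (at least 2 points)")
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    # Co-variation of the two series around their means.
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined for a constant series")
    return cov / sqrt(var_x * var_y)


# Hypothetical per-release measurements (illustrative numbers only).
growth = [120, 340, 560, 610, 900]    # e.g. lines added per release
coupling = [3.0, 4.5, 6.0, 6.5, 9.0]  # e.g. mean imports per module
r = pearson(growth, coupling)
print(f"correlation between growth and coupling: {r:.3f}")
```

A coefficient near 1 or -1 only shows that the two series move together. It does not by itself support the causal direction claimed in hypotheses 2 and 3.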

BrodyHolden commented 9 years ago

Research question: How does refactoring (i.e. code moves) change complexity?
Datasets to use: GitHub repositories.
Hypotheses: We hypothesize that complexity will decrease after refactoring.
Plan of attack:
1. Choose one or more repositories.
2. Identify major code moves using Transit.
3. Observe how the refactoring affected code complexity.

Team: (Andrew H.) @Hoverbear (Brody H.) @BrodyHolden (Fraser D.) @fraserd

If anyone else would like to join, let us know.

knowlesc commented 9 years ago

Colin Knowles (@knowlesc) and Ryan McDonald (@ryanmcdonald)

Question: Does activity on Stack Overflow, or on social media sites such as Twitter, drive the number of contributors/size of a project, or do the number of contributors/size drive the social media mentions?

Datasets: We're thinking of looking at Angular, Bootstrap, and Rails, but we might change this list of codebases later on. We plan to use gitstats and the Stack Exchange API (at least) to gather data on these repositories.

Hypotheses: We think that the number of contributors/project size should drive the mentions on social media.

Plan of Attack: 1) Figure out how to use the APIs of some social media sites 2) Search for mentions of certain projects by date 3) Run gitstats on the codebases 4) Compare project statistics with social media information

paulmoon commented 9 years ago

Jian Guan @guand Jonathan Lam @lamj1234 Paul Moon @paulmoon

Research problem in the form of a question

How are different types of GitHub activities correlated with user satisfaction?

Datasets

We'll analyze some of the most popular GitHub repositories: AngularJS, Bootstrap, Node.js, Django, and MongoDB. The most popular repos were chosen because they generate a lot of user feedback and discussion within the community, which we can analyze to measure user satisfaction.

Hypotheses

As user satisfaction increases, the number of issues will increase because high user satisfaction will promote discussion and recommendations, which will increase the number of users and therefore the number of bugs / issues.
The number of anti-regressive changes will be directly correlated with user satisfaction. The anti-regressive changes will refactor the software and fix bugs, which will increase user satisfaction.

Initial thoughts on how you plan to attack the problem

Scrape posts and comments from relevant forums, such as Reddit subreddits, Hacker News, Quora etc.
Run sentiment analysis on the posts and comments for overall sentiment (happy? sad?).
Use the GitHub API to gather GitHub activities such as number of pull requests, number of issues, number of bug fixes etc.
Use D3.js to graph user satisfaction & various GitHub activities vs. time.

Brayden-Arthur commented 9 years ago

How does user feedback affect video game patches and updates?

Data

We are using Reddit comments and posts as a primary data source for user feedback, with community wikis as the source for patches and updates.

Hypothesis

We hope to find that more successful games have a stronger relation between user feedback and patch note data. This would show that community feedback is vital to a game's growth.

Initial thoughts

Expand upon current process to make it automated
Research other methods of obtaining patch data
Look for other possible projects to apply the tool that have better data sources
Try to integrate other tools to manipulate existing data

Jeremy Kroeker, Brayden Arthur

davidsjohnson commented 9 years ago

@BrodyHolden @Hoverbear and @fraserd Here's a study on refactoring you might find interesting. The paper touches on some metrics for complexity and coupling that may help. http://www.itworld.com/article/2891140/study-finds-that-refactoring-doesn-t-improve-code-quality.html

@Jsyro this may be of interest to your project group as well.

Hoverbear commented 9 years ago

@fortjohnson Thanks! However, 4,500 lines is a really small project!

mitchellri commented 9 years ago

Research problem in the form of a question

How do software development methods affect different properties of development?

Datasets to use

Old
- Feature development time
- Initial use for issue tracking

New
- Estimated feature completion time

A couple hypotheses based on some intuition from research or previous lessons learned. Should be more specific assertions that will help you answer your research question

We found no relation between the time it took to develop these features and the selected development methods. However, we would expect different methods to have different estimated feature completion times. We also assume that, because the previously selected methods were so similar, we will see differences for other, more varied methods.

Initial thoughts on how you plan to attack the problem

Taken from our repository's README: "Take the average feature completion time for the whole project (average of [end day-start day] for each feature) and divide it by the average estimated completion time for the whole project (average of [planned end day-start day] for each feature) to get the feature completion time ratio for each project. This would be visualized using a 'completion time ratio vs project' bar graph. An additional graph could be provided to show the average feature completion time ratios of all the projects using the same software development method, plotted with other development methods. This graph would be a 'completion time ratio vs software development method' bar graph."
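
The ratio described in the README quote can be sketched as follows. This is only an illustration under assumptions: the `completion_ratio` name, the `(start, planned_end, actual_end)` tuple layout, and the dates are hypothetical, not the team's actual data format.

```python
from datetime import date


def completion_ratio(features):
    """Average actual duration divided by average planned duration.

    Each feature is a (start, planned_end, actual_end) tuple of dates.
    A ratio above 1 means features took longer than estimated on average.
    """
    if not features:
        raise ValueError("need at least one feature")
    actual_days = [(actual_end - start).days
                   for start, _planned_end, actual_end in features]
    planned_days = [(planned_end - start).days
                    for start, planned_end, _actual_end in features]
    avg_actual = sum(actual_days) / len(actual_days)
    avg_planned = sum(planned_days) / len(planned_days)
    if avg_planned == 0:
        raise ValueError("average planned duration is zero")
    return avg_actual / avg_planned


# Hypothetical features for one project (illustrative dates only).
features = [
    (date(2015, 1, 1), date(2015, 1, 11), date(2015, 1, 16)),  # planned 10, took 15
    (date(2015, 2, 1), date(2015, 2, 6), date(2015, 2, 6)),    # planned 5, took 5
]
ratio = completion_ratio(features)
print(f"completion time ratio: {ratio:.2f}")
```

Computing one such ratio per project gives the values for the 'completion time ratio vs project' bar graph, and averaging the ratios per development method gives the second graph.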

Mitchell Rivett Tyler Potter

Bleech94 commented 9 years ago

Problem

Can we create a tree of communication to determine the organizational and social structure of a company? Will this tree reflect the structure of the code?

Datasets

Apache Ant GitHub repository
Apache Ant website
Apache Ant mailing list archive (66000+ emails over 14 years)

Hypotheses

The communication tree will reflect the structure of the organization and the code (e.g. a company using agile will consist of small clusters (teams) connected to other teams through team leaders and project managers)
If we can accurately estimate the structure of the organization by looking at the communication, then we can investigate the relationship further and learn more about how/why some communication processes work better than others

How We Will Attack the Problem

Further analyze our list of 66000+ emails
Write code to generate a list of pairs of people and how much they communicate
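
A rough sketch of the pair-counting step, assuming each email has already been parsed into a sender and a list of recipients. The `count_pairs` name, the message layout, and the sample names are hypothetical and only for illustration.

```python
from collections import Counter


def count_pairs(messages):
    """Count how often each unordered pair of people exchanged a message.

    `messages` is an iterable of (sender, recipients) where recipients
    is a list of addresses or names.
    """
    pairs = Counter()
    for sender, recipients in messages:
        for recipient in set(recipients):  # ignore duplicate recipients
            if recipient == sender:
                continue  # skip messages people send to themselves
            # Sort so that (a, b) and (b, a) count as the same pair.
            pairs[tuple(sorted((sender, recipient)))] += 1
    return pairs


# Hypothetical parsed messages (not real mailing-list data).
messages = [
    ("alice", ["bob"]),
    ("bob", ["alice", "carol"]),
    ("carol", ["carol"]),
]
counts = count_pairs(messages)
for (a, b), n in counts.most_common():
    print(f"{a} <-> {b}: {n}")
```

The resulting counts could serve as edge weights when building the communication tree or graph.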

-Brandon Leech and Jorin Weatherston

DigitalCoffee commented 9 years ago

Devin Corrigall and Andrew Hansen

For project two, we want to continue with the idea of examining the effect of progressive changes and anti-regressive changes on the number of reported bugs over time. Therefore, our question will still be "How does the number of progressive and anti-regressive changes affect the number of bugs reported over time?" Our hypotheses are that the more progressive changes there are, the more bugs will be reported in the following weeks, and that the more anti-regressive changes there are, the fewer bugs will be reported over the next few months. This is what we seemed to have found in project 1, but our proof and explanation were weak.

Instead of using our script to get all this information from GitHub, we want to find different ways to obtain the anti-regressive and progressive changes, and make sure the data is on a daily basis instead of monthly. We will probably still use our script for counting bugs, as it seems quite reliable for that. For anti-regressive changes, we would like to use the tool Transit that was created by another team. This will allow us to detect both the day of an anti-regressive change and its size. For features, we are thinking of something that could count the number of lines added on a daily basis, or seeing if we can adapt another group's tool to accomplish this.

For datasets, we want to use the same 3 (node, rails and bootstrap), so we can compare our results to what we got in project 1 and see what has improved. Once this is working the way we want it to, we can look at even more repositories on GitHub! :)
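
One possible way to count lines added per day is to parse the text produced by `git log --numstat --date=short --pretty=format:%ad`. The sketch below assumes that output has already been captured as a string. The `lines_added_per_day` name and the sample log are made up for illustration.

```python
from collections import defaultdict


def lines_added_per_day(numstat_log):
    """Sum added lines per commit date from `git log --numstat` text.

    Expects a date line (YYYY-MM-DD) for each commit, followed by
    tab-separated "added<TAB>deleted<TAB>path" lines for its files.
    """
    added = defaultdict(int)
    day = None
    for line in numstat_log.splitlines():
        parts = line.split("\t")
        if len(parts) == 3:
            # Binary files are reported as "-" and are skipped here.
            if day is not None and parts[0].isdigit():
                added[day] += int(parts[0])
        elif line.strip():
            day = line.strip()  # a new commit's date line
    return dict(added)


# Hypothetical log excerpt (not from a real repository).
sample_log = (
    "2015-03-02\n"
    "10\t2\tsrc/a.py\n"
    "-\t-\timg/logo.png\n"
    "\n"
    "2015-03-02\n"
    "5\t0\tsrc/b.py\n"
    "\n"
    "2015-03-03\n"
    "1\t1\tREADME.md\n"
)
per_day = lines_added_per_day(sample_log)
print(per_day)
```

Lines added is only a rough proxy for progressive changes, since it cannot tell a new feature from a large refactor.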

gregnr commented 9 years ago

Parker Atkins, Rabjot Aujla, Greg Richardson, Jordan Heemskerk

We would like to continue with our old question with some modifications.

Research problem in the form of a question: Does the volume of unit tests in a project relate to the frequency of bugs?

Datasets to use: We would like to look at data sets that have a longer GitHub history and that track their bugs in a meaningful way. We've found that projects that have a more consistent development cycle (aka more mature) tend to produce better data. Some repositories we may use are: jQuery, AngularJS, Bootstrap.

A couple hypotheses based on some intuition from research or previous lessons learned: We would like to change our hypothesis to: Frequency of bugs will decrease as unit tests change. The reason we are changing "increase" to "change" is because we care more about developers working on unit tests than strictly increasing the lines of code in unit tests. Developers working on unit tests, whether it's adding or deleting lines of code, should help decrease the number of bugs.

Initial thoughts on how you plan to attack the problem: One of the biggest things we plan to do is refine our method to determine bugs. Right now we just grab all issues from the GitHub repo, which can include bugs but also features and questions. We would like to change our tool to keep only the issues that are bugs. We need to decide how we will determine if an issue is a bug (i.e. labelled bug vs keywords), and whether we want to filter bugs on a case by case basis, or the same way for all repos. (A case by case basis may produce better results.)

We would also like to make our tool work for anybody. We will need to add features that make it work for more general cases. For example, we plan to allow the user to select their unit tests using a regular expression instead of just a single folder of files.
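
A small sketch of both filters mentioned above, under simplifying assumptions: issues are plain dicts with a `title` string and a list of label names (the real GitHub API returns label objects), and the keyword list, function names, and sample paths are hypothetical.

```python
import re


def is_bug(issue, keywords=("bug", "crash", "regression")):
    """Treat an issue as a bug if it is labelled 'bug' or its title
    contains one of the keywords as a whole word."""
    labels = {label.lower() for label in issue.get("labels", [])}
    if "bug" in labels:
        return True
    title = issue.get("title", "").lower()
    return any(re.search(r"\b" + re.escape(word) + r"\b", title)
               for word in keywords)


def select_test_files(paths, pattern):
    """Return the paths that match a user-supplied regular expression."""
    regex = re.compile(pattern)
    return [path for path in paths if regex.search(path)]


# Hypothetical issues and file paths for illustration.
issues = [
    {"title": "Crash on load", "labels": []},
    {"title": "Add dark mode", "labels": ["enhancement"]},
    {"title": "Tooltip misplaced", "labels": ["Bug"]},
]
bugs = [issue for issue in issues if is_bug(issue)]

paths = ["src/a.js", "test/a_spec.js", "test/unit/b.test.js"]
tests = select_test_files(paths, r"(^|/)test/")
print(len(bugs), tests)
```

Keeping the label check and the keyword check in one function makes it easy to compare the two approaches per repository, which is the case by case question raised above.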

Jsyro commented 9 years ago

Jason Syrotuck, Evan Hildebrandt, Keith Rollans

Research problem in the form of a question:

Does the number and size of refactors earlier in a project's life result in significantly and measurably better evolution as the project continues?

Datasets to use:

All projects of a significant size (number of commits), complexity (number of files), and lifespan (more than 4 years) will yield interesting data.

A couple hypotheses based on some intuition from research or previous lessons learned:

Projects that have small refactors continuously will require fewer and smaller refactors later as the project grows, and the inverse is also true.

Initial thoughts on how you plan to attack the problem:

Analyze more code bases, and compare size and complexity against refactoring over the project's lifespan to obtain a metric.

davidsjohnson commented 9 years ago

@Hoverbear Yes. I agree and have doubts about the results from such a small project. But I thought the internal measures used might be of interest.

ycoady / UVic-Software-Evolution

Lab 7: Project 2 Design Spec #14
