spgroup / groundhog

A framework for crawling GitHub projects and raw data and extracting metrics from them
http://spgroup.github.io/groundhog
GNU General Public License v2.0

How commonplace are project forks? #22

Closed fernandocastor closed 11 years ago

fernandocastor commented 11 years ago

We should be able to easily extend Groundhog to discover the percentage of projects that have forks, as well as the average and median number of forks per project. Also, what is the overall percentage of GitHub projects that ARE forks?

rodrigoalvesvieira commented 11 years ago

As was implicit in your description of this issue, this can - unfortunately - currently be done for the GitHub forge, the one that has the concept of forks.

One way of implementing this may be to add methods to the SearchGitHub class that take a list of Project objects fetched only from GitHub (of course) and then make the calculation.

Their names could be getMedianForksIndex, getAverageForksIndex* and I suggest that they be static methods. Their usage could be something like this:

// let's get the percentage of projects that have forks 
List<Project> projects = githubSearch.getProjects("groundhog", 1);
SearchGitHub.getAverageForksIndex(projects);

// now let's get the median number of forks per project:
Project forgeProject = new Project(projectName, description, sourceCodeUrl);
SearchGitHub.getMedianForksIndex(forgeProject);

So what do you think about this suggestion?

* These methods would return integers, of course.
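To make the three metrics concrete, here is a minimal sketch of how they could be computed. It is only an illustration under stated assumptions: `ForkStats` and its method names are hypothetical and not part of Groundhog, and it works on plain fork counts instead of the real Project class. Returning `double` instead of an integer is a choice of this sketch, so that the percentage and the average are not truncated.

```java
import java.util.Arrays;
import java.util.List;

// Hypothetical helper, not part of Groundhog. It takes plain fork counts
// (one entry per project) so it does not depend on the real Project class.
final class ForkStats {
    private ForkStats() {}

    // Percentage (0 to 100) of projects that have at least one fork.
    static double percentageWithForks(List<Integer> forkCounts) {
        if (forkCounts.isEmpty()) {
            return 0.0;
        }
        long withForks = forkCounts.stream().filter(c -> c > 0).count();
        return 100.0 * withForks / forkCounts.size();
    }

    // Arithmetic mean of the fork counts; 0.0 for an empty list.
    static double averageForks(List<Integer> forkCounts) {
        return forkCounts.stream().mapToInt(Integer::intValue).average().orElse(0.0);
    }

    // Median of the fork counts; for an even number of projects,
    // the mean of the two middle values. 0.0 for an empty list.
    static double medianForks(List<Integer> forkCounts) {
        if (forkCounts.isEmpty()) {
            return 0.0;
        }
        int[] sorted = forkCounts.stream().mapToInt(Integer::intValue).sorted().toArray();
        int mid = sorted.length / 2;
        return sorted.length % 2 == 1
                ? sorted[mid]
                : (sorted[mid - 1] + sorted[mid]) / 2.0;
    }

    public static void main(String[] args) {
        // Made-up fork counts, for illustration only.
        List<Integer> forkCounts = Arrays.asList(0, 10, 2, 0, 3);
        System.out.println(percentageWithForks(forkCounts));
        System.out.println(averageForks(forkCounts));
        System.out.println(medianForks(forkCounts));
    }
}
```

The fork counts in `main` are invented sample values, not data from any real search.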

fernandocastor commented 11 years ago

Static? Why? Isn't there a concept of a Project? A project should be capable of telling its clients about the forks derived from it.

I didn't understand this: "As was implicit in your description of this issue, this can - unfortunately - currently be done for the GitHub forge, the one that has the concept of forks." Why is it unfortunate that it can currently be done for GitHub?

rodrigoalvesvieira commented 11 years ago

Sorry @fernandocastor, I meant it was unfortunate that only GitHub supported it. I wasn't clear enough.

"Static? Why? Isn't there a concept of a Project? A project should be capable of telling its clients about the forks derived from it."

Yeah, of course. This is much better. Placing it in the search class and making it static is clearly not the best approach.
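A rough sketch of the instance-based alternative, where each project answers questions about its own forks. `ForkAwareProject` is a hypothetical stand-in, assumed only for illustration; Groundhog's real Project class may have different fields, constructors and method names.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;

// Hypothetical stand-in for Groundhog's Project class; the real class
// may look quite different.
final class ForkAwareProject {
    private final String name;
    private final ForkAwareProject parent; // null when this project is not a fork
    private final List<ForkAwareProject> forks = new ArrayList<>();

    ForkAwareProject(String name) {
        this(name, null);
    }

    private ForkAwareProject(String name, ForkAwareProject parent) {
        this.name = name;
        this.parent = parent;
    }

    // Registers and returns a new fork derived from this project.
    ForkAwareProject fork(String forkName) {
        ForkAwareProject child = new ForkAwareProject(forkName, this);
        forks.add(child);
        return child;
    }

    String getName() {
        return name;
    }

    // True when this project was derived from another project.
    boolean isFork() {
        return parent != null;
    }

    // Number of direct forks of this project.
    int getForksCount() {
        return forks.size();
    }

    // Read-only view of the direct forks.
    List<ForkAwareProject> getForks() {
        return Collections.unmodifiableList(forks);
    }
}
```

With something like this, clients ask a project directly (`project.getForksCount()`, `project.isFork()`), and any aggregate statistics can be computed over a list of projects without a static helper on the search class.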

dnr2 commented 11 years ago

Some (rhetorical?) questions:

  1. Why should getAverageForksIndex return an integer if the percentage should be represented as a float? Is it rounded?
  2. Aren't Bitbucket's projects capable of having forks?

rodrigoalvesvieira commented 11 years ago

@dnr2:

  1. Very good point. It should return a float indeed.
  2. Yes, they are capable.

rodrigoalvesvieira commented 11 years ago

Sample repo forks API endpoint: https://api.github.com/repos/spgroup/groundhog/forks
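For reference, a small sketch of how that endpoint URL could be built for any repository. `GitHubForksEndpoint` is a hypothetical helper, not existing Groundhog code, and it only builds the URI; it does not perform the HTTP request. The `per_page` and `page` query parameters are GitHub's standard pagination parameters.

```java
import java.net.URI;

// Hypothetical helper, not part of Groundhog: builds the URI of the
// "list forks" endpoint without sending any request.
final class GitHubForksEndpoint {
    private static final String API_ROOT = "https://api.github.com";

    private GitHubForksEndpoint() {}

    // URI listing the forks of owner/repo.
    static URI forksOf(String owner, String repo) {
        return URI.create(String.format("%s/repos/%s/%s/forks", API_ROOT, owner, repo));
    }

    // Same endpoint with explicit pagination parameters.
    static URI forksOf(String owner, String repo, int perPage, int page) {
        return URI.create(forksOf(owner, repo) + "?per_page=" + perPage + "&page=" + page);
    }

    public static void main(String[] args) {
        System.out.println(forksOf("spgroup", "groundhog"));
        System.out.println(forksOf("spgroup", "groundhog", 100, 2));
    }
}
```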

rodrigoalvesvieira commented 11 years ago

One question remaining: