spgroup / groundhog

A framework for crawling GitHub projects and raw data and extracting metrics from them
http://spgroup.github.io/groundhog
GNU General Public License v2.0

To provide an answer to the question "How many Java projects were active in 2012?" #48

Closed fernandocastor closed 11 years ago

fernandocastor commented 11 years ago

We need to implement and test the features required to use Groundhog to answer the question in the title of the issue. We then have to use it to actually answer the question.

dnr2 commented 11 years ago

Do we have a precise definition of what an active project on GitHub is? There are many ways to measure this, but have we already discussed the thresholds and which measures we are going to use?

fernandocastor commented 11 years ago

Nope. What would you suggest?

jesusjackson commented 11 years ago

@fernandocastor I think an active project is one that received a push within a certain amount of time. I can't really define that amount, but I would consider 1 year a good one.

gustavopinto commented 11 years ago

We can use the same definition as Richard Sands, which says: an active project is a project that had at least one commit and at least 2 committers in the last 12 months.
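The definition above can be written as a small predicate. This is a minimal sketch, not Groundhog's code: the `ActivityCheck` and `Commit` names are hypothetical, and it assumes the commit list for a project (committer name plus commit date) has already been fetched.

```java
import java.time.LocalDate;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

// Hypothetical helper for illustration; these are not Groundhog classes.
final class ActivityCheck {

    /** Minimal commit record: who committed and on which day. */
    static final class Commit {
        final String committer;
        final LocalDate date;

        Commit(String committer, LocalDate date) {
            this.committer = committer;
            this.date = date;
        }
    }

    /**
     * Returns true if the project had at least one commit and at least two
     * distinct committers in the 12 months ending on windowEnd (inclusive).
     */
    static boolean isActive(List<Commit> commits, LocalDate windowEnd) {
        LocalDate windowStart = windowEnd.minusMonths(12);
        int commitsInWindow = 0;
        Set<String> committers = new HashSet<>();
        for (Commit c : commits) {
            // Keep commits dated after windowStart and no later than windowEnd.
            boolean inWindow = c.date.isAfter(windowStart) && !c.date.isAfter(windowEnd);
            if (inWindow) {
                commitsInWindow++;
                committers.add(c.committer);
            }
        }
        return commitsInWindow >= 1 && committers.size() >= 2;
    }
}
```

For the question in the title, `windowEnd` would be `LocalDate.of(2012, 12, 31)`, so the window covers all of 2012. Two distinct committers already imply at least one commit; the explicit commit count only mirrors the wording of the definition.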

rodrigoalvesvieira commented 11 years ago

:+1: and 12 months already seems like a lot, very tolerant. Let's stick with that!

jesusjackson commented 11 years ago

I agree with that. I'll start coding. Thanks, everyone, for the help.

jesusjackson commented 11 years ago

I used to write my token in SearchGitHub.oauthToken, but now it is injected and I can't do this anymore. Where is the correct place to put my token?

jesusjackson commented 11 years ago

I found a token in HttpModule, on line 19, but is it OK to modify it?

gustavopinto commented 11 years ago

This token is mine. Go ahead and change it.

jesusjackson commented 11 years ago

Thanks

jesusjackson commented 11 years ago

@fernandocastor here is code that answers the question from this issue. After you check it, I'll close this issue.

rodrigoalvesvieira commented 11 years ago

hmmmm?

jesusjackson commented 11 years ago

I wrote code that answers the question and asked Castor to check it.

rodrigoalvesvieira commented 11 years ago

Oh, I see. Is this it? https://github.com/spgroup/groundhog/compare/5e37f7d00f...6bb7d4ff0f

jesusjackson commented 11 years ago

Yep. I'll put the '@Test' annotations back on the others after validation. I deleted them so that running 'SearchGitHubTest.java' didn't have to run all the other tests.

fernandocastor commented 11 years ago

How many?

jesusjackson commented 11 years ago

Of the first 20 found, none were active. Again I had that problem with the response limit. I couldn't run more than 20.

jesusjackson commented 11 years ago

@fernandocastor I think my quota was full that time. I ran it again and 3 projects in 20 were active.

jesusjackson commented 11 years ago

@fernandocastor Now with the token I found a very different number: 23% of the projects were active. Tonight I'll be able to run it with 2000 projects and see the results. @gustavopinto I removed the .gitignore and I think that resolved the problem.

gustavopinto commented 11 years ago

:metal:

jesusjackson commented 11 years ago

Even with the token I can't run it with even 500 projects. It ran for 559 seconds before one of the commit requests came back negative.

jesusjackson commented 11 years ago

I did the calculations and my quota of requests is 580. I can't get more. Any ideas?

fernandocastor commented 11 years ago

Make it work through the night, stopping whenever it gets a negative and trying again after 60 minutes. If you keep this pace, by tomorrow morning you'll have analyzed 5000 projects.
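The stop-and-retry approach suggested above could look roughly like the following. This is a sketch under assumptions, not the code that was actually written for this issue: `RateLimitRetry`, `QuotaExceededException`, and `Sleeper` are hypothetical names, and the request itself is passed in as a plain `Callable`.

```java
import java.time.Duration;
import java.util.concurrent.Callable;

// Hypothetical helper for illustration; these are not Groundhog classes.
final class RateLimitRetry {

    /** Hypothetical signal that the API request quota has been used up. */
    static final class QuotaExceededException extends Exception {
        QuotaExceededException(String message) {
            super(message);
        }
    }

    /** Abstraction over waiting, so the pause can be replaced in tests. */
    interface Sleeper {
        void sleep(Duration duration) throws InterruptedException;
    }

    /**
     * Runs the request. When the quota is exhausted, waits for the given
     * duration and tries again, up to maxAttempts attempts in total.
     */
    static <T> T callWithRetry(Callable<T> request, Duration wait,
                               int maxAttempts, Sleeper sleeper) throws Exception {
        for (int attempt = 1; ; attempt++) {
            try {
                return request.call();
            } catch (QuotaExceededException e) {
                if (attempt >= maxAttempts) {
                    throw e; // give up after the last allowed attempt
                }
                sleeper.sleep(wait); // e.g. 60 minutes, as suggested above
            }
        }
    }
}
```

In a real overnight run the sleeper could be `d -> Thread.sleep(d.toMillis())` with `Duration.ofMinutes(60)`. Passing the sleeper in keeps the loop testable without actually waiting an hour.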

gustavopinto commented 11 years ago

Is Groundhog raising an exception, @pork9? What do you mean by negative requests?

jesusjackson commented 11 years ago

@fernandocastor @gustavopinto I was getting the message for an empty repository. Once I got past that, I wrote some code to avoid the exceeded-request-limit message by waiting one hour. Now I'm only trying to get the answer.

jesusjackson commented 11 years ago

I have the answer: of 5000 Java projects, 31.380000000000003% were active in 2012.
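The trailing digits in the percentage above are a floating-point artifact of computing with `double`. A minimal sketch of reporting the share with two decimal places follows; `ActiveShare` and `formatPercent` are hypothetical names, and the count 1569 used in the usage note is only the value implied by 31.38% of 5000, not a number reported in this thread.

```java
import java.util.Locale;

// Hypothetical helper for illustration; this is not a Groundhog class.
final class ActiveShare {

    /** Formats active/total as a percentage with two decimal places. */
    static String formatPercent(int active, int total) {
        if (total <= 0) {
            throw new IllegalArgumentException("total must be positive");
        }
        double percent = 100.0 * active / total;
        // Locale.ROOT keeps the decimal separator a dot on every machine.
        return String.format(Locale.ROOT, "%.2f%%", percent);
    }
}
```

Under that assumption, `ActiveShare.formatPercent(1569, 5000)` gives `31.38%`, and the earlier 3-in-20 run would be reported as `15.00%`.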

fernandocastor commented 11 years ago

Very cool. How long did it take to get information about these 5000 projects?

jesusjackson commented 11 years ago

The project info took about 2.5 hours, but including the commits it took about 4 hours.