spgroup / groundhog

A framework for crawling GitHub projects and raw data and extracting metrics from them
http://spgroup.github.io/groundhog
GNU General Public License v2.0

Download projects by username #17

Closed gustavopinto closed 11 years ago

gustavopinto commented 11 years ago

What if we want to download projects from a given user? Something like `java -jar groundhog ... -user gustavopinto fernandocastor rodrigoalvesvieira`

It could be very useful if we want to study the Brazilian open-source community, for example. Which languages do they use? Are they an active community? And so on.

What do you guys think?

fernandocastor commented 11 years ago

Yes. We had considered that in a previous meeting. It can be a command-line parameter or it can be provided through the input file (that ol' JSON file I keep mentioning).

rodrigoalvesvieira commented 11 years ago

I will start on this issue later today. A quick look at the GitHub API turned up the endpoint [1] for this specific search.

fernandocastor commented 11 years ago

Remember that extensibility is key. We want Groundhog to be structured so that it is easy to add support for additional forges, such as SourceForge and Google Code. If what we're doing does not depend on a specific feature only available within a forge, it should work for all the forges we support.

rodrigoalvesvieira commented 11 years ago

Ok. So I'll look at the Google Code API and the SourceForge one to see if I can fetch projects by author in those forges.

fernandocastor commented 11 years ago

It would be nice if we structured our own internal API for accessing forges in two parts: (i) forge-independent (core functionality); and (ii) forge-dependent. The first one would support information retrieval based on functionality that is available on all forges, for example, to download files based on the use of a version control system. Of course that would require forge-specific code, but Groundhog would have its own representation of these forge-independent aspects and they would simply access wrappers for these forge-specific APIs by means of a uniform interface. The second part would be forge-dependent features. This would be necessary to access features such as Gists that are not widely supported. Clear?
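The two-part split described above could be sketched roughly as follows. This is only an illustration under assumptions: every name here (`Forge`, `SupportsGists`, `ProjectInfo`, the `Fake*` wrappers) is hypothetical and not taken from Groundhog's code, and the wrappers return canned data instead of calling any real forge API.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

/** Forge-independent view of a project (hypothetical, not Groundhog's real model). */
final class ProjectInfo {
    final String forge;
    final String owner;
    final String name;

    ProjectInfo(String forge, String owner, String name) {
        this.forge = forge;
        this.owner = owner;
        this.name = name;
    }
}

/** Part (i): core operations that every supported forge must offer. */
interface Forge {
    String name();
    List<ProjectInfo> projectsByUser(String username);
}

/** Part (ii): an optional capability that only some forges have, e.g. Gists. */
interface SupportsGists {
    List<String> gistIdsByUser(String username);
}

/** Stand-in GitHub wrapper; returns canned data, no network access. */
final class FakeGitHub implements Forge, SupportsGists {
    public String name() { return "github"; }
    public List<ProjectInfo> projectsByUser(String username) {
        return Arrays.asList(new ProjectInfo(name(), username, "demo-project"));
    }
    public List<String> gistIdsByUser(String username) {
        return Arrays.asList(username + "-gist-1");
    }
}

/** Stand-in SourceForge wrapper; implements only the core part. */
final class FakeSourceForge implements Forge {
    public String name() { return "sourceforge"; }
    public List<ProjectInfo> projectsByUser(String username) {
        return Arrays.asList(new ProjectInfo(name(), username, "demo-project"));
    }
}

final class ForgeDemo {
    /** Forge-independent code: works with any Forge through the uniform interface. */
    static List<ProjectInfo> collect(List<Forge> forges, String username) {
        List<ProjectInfo> all = new ArrayList<>();
        for (Forge forge : forges) {
            all.addAll(forge.projectsByUser(username));
        }
        return all;
    }

    /** Forge-dependent code: only touches forges that expose the capability. */
    static List<String> collectGists(List<Forge> forges, String username) {
        List<String> all = new ArrayList<>();
        for (Forge forge : forges) {
            if (forge instanceof SupportsGists) {
                all.addAll(((SupportsGists) forge).gistIdsByUser(username));
            }
        }
        return all;
    }

    public static void main(String[] args) {
        List<Forge> forges = Arrays.asList(new FakeGitHub(), new FakeSourceForge());
        System.out.println(collect(forges, "gustavopinto").size() + " projects");
        System.out.println(collectGists(forges, "gustavopinto"));
    }
}
```

With this shape, a feature such as "download projects by username" lives in the core interface and works on every forge, while Gist access is requested only from forges that declare it.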

rodrigoalvesvieira commented 11 years ago

Sorry, @fernandocastor, I admit I'm only reading this now. I only noticed your message while browsing the issues.

And, well, I actually think that is just the way things work in Groundhog. That's apparently how Flávio developed the tool. I mean, there's the macro part, which is more abstract, and then things get more specific at the level of each forge: in the forge-specific search, code download, and checkout implementations.

I probably haven't figured out what you really mean, so I'm going to discuss this topic with you, @gustavopinto, and @dnr2 in our next meeting. Hopefully I'll get everything :)

rodrigoalvesvieira commented 11 years ago

And again, sorry. I'll pay closer attention to issue updates from now on.

rodrigoalvesvieira commented 11 years ago

I looked into this functionality on our three supported forges and found that it can be done in GitHub, Google Code [1], and SourceForge [2].

fernandocastor commented 11 years ago

That's great!

What if we wanted to add Bitbucket? What would need to change?

rodrigoalvesvieira commented 11 years ago

Well @fernandocastor,

Basically, I think we'd only need to create new Search and Crawl classes for BitBucket. Since Git is one of the SCMs BitBucket supports and we already support it, there shouldn't be any further problems downloading the code and performing git checkouts on it.
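A rough sketch of why that could be a small change, assuming search is per forge while downloading is per SCM. All names below (`ForgeSearch`, `CodeDownloader`, `ForgePlanner`, and the `*Sketch` classes) are hypothetical and are not Groundhog's actual classes; the only thing added for BitBucket is one new search class, and the existing Git downloader is reused.

```java
import java.util.HashMap;
import java.util.Map;

/** Hypothetical per-forge search: knows where a user's project lives. */
interface ForgeSearch {
    String scm();                                  // SCM hosted by the forge, e.g. "git"
    String cloneUrl(String user, String project);
}

/** Hypothetical per-SCM downloader, shared by every forge that hosts that SCM. */
interface CodeDownloader {
    String checkoutCommand(String cloneUrl);
}

final class GitDownloader implements CodeDownloader {
    public String checkoutCommand(String cloneUrl) {
        return "git clone " + cloneUrl;
    }
}

final class SearchGitHubSketch implements ForgeSearch {
    public String scm() { return "git"; }
    public String cloneUrl(String user, String project) {
        return "https://github.com/" + user + "/" + project + ".git";
    }
}

/** The only new class needed for BitBucket in this sketch. */
final class SearchBitbucketSketch implements ForgeSearch {
    public String scm() { return "git"; }
    public String cloneUrl(String user, String project) {
        return "https://bitbucket.org/" + user + "/" + project + ".git";
    }
}

final class ForgePlanner {
    private final Map<String, ForgeSearch> searches = new HashMap<>();
    private final Map<String, CodeDownloader> downloaders = new HashMap<>();

    void registerForge(String forgeName, ForgeSearch search) {
        searches.put(forgeName, search);
    }

    void registerScm(String scm, CodeDownloader downloader) {
        downloaders.put(scm, downloader);
    }

    /** Builds the checkout command for a project; fails for unknown forges or SCMs. */
    String plan(String forgeName, String user, String project) {
        ForgeSearch search = searches.get(forgeName);
        if (search == null) {
            throw new IllegalArgumentException("Unsupported forge: " + forgeName);
        }
        CodeDownloader downloader = downloaders.get(search.scm());
        if (downloader == null) {
            throw new IllegalArgumentException("Unsupported SCM: " + search.scm());
        }
        return downloader.checkoutCommand(search.cloneUrl(user, project));
    }

    public static void main(String[] args) {
        ForgePlanner planner = new ForgePlanner();
        planner.registerScm("git", new GitDownloader());
        planner.registerForge("github", new SearchGitHubSketch());
        planner.registerForge("bitbucket", new SearchBitbucketSketch()); // the one-line addition
        System.out.println(planner.plan("bitbucket", "someuser", "someproject"));
    }
}
```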

By the way, I gave a quick glance at BitBucket and saw that they have an API. I don't know how good it is just yet but it already sounds way better than having to crawl the HTML source code of the search results on these forges :p

fernandocastor commented 11 years ago

Take a look at Issue #31.

gustavopinto commented 11 years ago

anything is better than parsing HTML code :shipit:

gustavopinto commented 11 years ago

Sorry :-(

rodrigoalvesvieira commented 11 years ago

Here is a sample API endpoint for listing all public projects of a specific user:

https://api.github.com/users/rodrigoalvesvieira/repos
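As a minimal sketch of how that endpoint could be used for several usernames at once, the class below builds one URL per user and includes a helper that fetches the raw JSON with the JDK's `java.net.http` client (Java 11+). The class and method names are hypothetical, the username check assumes GitHub logins contain only letters, digits, and hyphens, and the response is returned unparsed. GitHub paginates this listing, so a real crawler would also need to follow the later pages.

```java
import java.io.IOException;
import java.net.URI;
import java.net.http.HttpClient;
import java.net.http.HttpRequest;
import java.net.http.HttpResponse;
import java.util.ArrayList;
import java.util.List;

/** Hypothetical helper; not part of Groundhog. */
final class UserReposEndpoint {
    static final String API_ROOT = "https://api.github.com";

    /** Builds the "list public repos of a user" URL shown in the comment above. */
    static URI reposUri(String username) {
        // Assumption: logins consist of letters, digits, and hyphens only.
        if (username == null || !username.matches("[A-Za-z0-9-]+")) {
            throw new IllegalArgumentException("Invalid username: " + username);
        }
        return URI.create(API_ROOT + "/users/" + username + "/repos");
    }

    /** One URL per username, in the order given. */
    static List<URI> reposUris(String... usernames) {
        List<URI> uris = new ArrayList<>();
        for (String username : usernames) {
            uris.add(reposUri(username));
        }
        return uris;
    }

    /** Fetches the raw JSON body of the first page; performs a real network call. */
    static String fetch(URI uri) throws IOException, InterruptedException {
        HttpClient client = HttpClient.newHttpClient();
        HttpRequest request = HttpRequest.newBuilder(uri)
                .header("Accept", "application/vnd.github+json")
                .GET()
                .build();
        HttpResponse<String> response =
                client.send(request, HttpResponse.BodyHandlers.ofString());
        if (response.statusCode() != 200) {
            throw new IllegalStateException("Unexpected status " + response.statusCode());
        }
        return response.body();
    }

    public static void main(String[] args) {
        // Only prints the URLs; call fetch(uri) to actually hit the API.
        for (URI uri : reposUris("gustavopinto", "fernandocastor", "rodrigoalvesvieira")) {
            System.out.println(uri);
        }
    }
}
```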

dnr2 commented 11 years ago

People, is anyone working on this issue? I believe this feature is being implemented in #33.

@gustavopinto I saw you implementing this feature in #33, so I will close this issue. If anyone has a problem with that, please feel free to reopen it.