spgroup / groundhog

A framework for crawling GitHub projects and raw data and extracting metrics from them
http://spgroup.github.io/groundhog
GNU General Public License v2.0
15 stars, 10 forks

-in option to search from file #3

Closed rodrigoalvesvieira closed 11 years ago

rodrigoalvesvieira commented 11 years ago

We must implement, in the Options class, an option for searching projects based on the contents of a given file, so that the project names can be passed in bulk rather than individually. Take this as an example:

Currently we do:

java -jar groundhog.jar -forge github -out metrics phonegap-facebook-plugin facebook-android-sdk spring-social-facebook

This will look for projects whose names match phonegap-facebook-plugin, facebook-android-sdk, or spring-social-facebook.

But this approach can be tedious and hard to repeat as the number of projects gets larger.

So, the -in option would make our lives easier by making Groundhog read the search terms from a JSON file, like this:

This file is called projects.txt:

phonegap-facebook-plugin
facebook-android-sdk
spring-social-facebook

and then:

java -jar groundhog.jar -forge github -out metrics -in projects.txt
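
A minimal sketch of how the proposed -in option could load its search terms, assuming the one-name-per-line layout of projects.txt shown above. The class and method names (ProjectListReader, readTerms) are hypothetical and are not taken from Groundhog's Options class:

```java
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper, not part of Groundhog's actual code.
class ProjectListReader {

    // Reads one search term per line, trimming whitespace and skipping blank lines.
    static List<String> readTerms(Path file) throws IOException {
        List<String> terms = new ArrayList<>();
        for (String line : Files.readAllLines(file, StandardCharsets.UTF_8)) {
            String term = line.trim();
            if (!term.isEmpty()) {
                terms.add(term);
            }
        }
        return terms;
    }

    public static void main(String[] args) throws IOException {
        // args[0] stands in for the value that would follow -in.
        for (String term : readTerms(Paths.get(args[0]))) {
            System.out.println(term);
        }
    }
}
```

The resulting list could then be handed to the same code path that currently receives the project names given directly on the command line.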

gustavopinto commented 11 years ago

It is an interesting feature but, IMHO, we should first fix #9 before starting this one.

rodrigoalvesvieira commented 11 years ago

Yes, agreed. I guess we should discuss the migration to JCommander ASAP. Will talk to @fernandocastor this week and see.

rodrigoalvesvieira commented 11 years ago

Development/debugging pro-tip for the SearchGitHub class:

Log the searchUrl variable in the getProjects() method; then you can more easily grab an API URL and see the content for yourself, if needed. Example (search term: github api): https://api.github.com/legacy/repos/search/github+api?start_page=1&language=java

Another example (search term: cepfacil): https://api.github.com/legacy/repos/search/cepfacil?start_page=1&language=java
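
For reference, the two URLs above follow a pattern that can be sketched roughly as below. This is only an illustration inferred from those examples, not the actual body of getProjects(); the class and method names (SearchUrlSketch, buildSearchUrl) are made up:

```java
import java.io.UnsupportedEncodingException;
import java.net.URLEncoder;

// Hypothetical sketch; the real SearchGitHub class may build its URL differently.
class SearchUrlSketch {

    // Builds a search URL shaped like the examples in this thread.
    static String buildSearchUrl(String term, int page, String language) {
        String encoded;
        try {
            // Form-style encoding turns spaces into '+', as in "github+api".
            encoded = URLEncoder.encode(term, "UTF-8");
        } catch (UnsupportedEncodingException e) {
            throw new IllegalStateException("UTF-8 is always supported", e);
        }
        return "https://api.github.com/legacy/repos/search/" + encoded
                + "?start_page=" + page + "&language=" + language;
    }

    public static void main(String[] args) {
        // Log the URL so it can be pasted into a browser or curl.
        System.out.println(buildSearchUrl("github api", 1, "java"));
    }
}
```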


gustavopinto commented 11 years ago

What if we parsed regular expressions in the project name?

For example: java -jar groundhog ... facebook* twitter*

rodrigoalvesvieira commented 11 years ago

hmm, I don't understand :(

gustavopinto commented 11 years ago

I mean, what if the user wants to download all Facebook-related projects? Wouldn't it be nice if I could pass facebook* as a parameter and have it download all projects whose names start with facebook?
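
A rough sketch of the matching this would need, assuming `*` is meant as a wildcard for "any run of characters" (read as a strict regular expression, facebook* would instead mean "faceboo" followed by zero or more "k"). NameGlob, toPattern and filter are hypothetical names, not existing Groundhog code:

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.regex.Pattern;

// Hypothetical helper illustrating wildcard matching on project names.
class NameGlob {

    // Converts a pattern where '*' matches any run of characters into a regex;
    // every other character is matched literally.
    static Pattern toPattern(String glob) {
        String[] parts = glob.split("\\*", -1); // -1 keeps trailing empty parts
        StringBuilder regex = new StringBuilder();
        for (int i = 0; i < parts.length; i++) {
            if (i > 0) {
                regex.append(".*");
            }
            regex.append(Pattern.quote(parts[i]));
        }
        return Pattern.compile(regex.toString());
    }

    // Keeps only the names that match the whole pattern.
    static List<String> filter(String glob, List<String> names) {
        Pattern pattern = toPattern(glob);
        List<String> matches = new ArrayList<>();
        for (String name : names) {
            if (pattern.matcher(name).matches()) {
                matches.add(name);
            }
        }
        return matches;
    }

    public static void main(String[] args) {
        List<String> names = Arrays.asList(
                "phonegap-facebook-plugin", "facebook-android-sdk", "spring-social-facebook");
        System.out.println(filter("facebook*", names)); // prints [facebook-android-sdk]
    }
}
```

On the command line the pattern would probably need quoting ('facebook*'), since most shells try to expand an unquoted * against local file names first.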

rodrigoalvesvieira commented 11 years ago

Ah, yeah. Got it.

dnr2 commented 11 years ago

Using regular expressions to search for projects is also related to issue #19 because you could search for all projects in a repository using the regular expression '*'.

dnr2 commented 11 years ago

People, I believe it would be better if the JSON file provided all the command line arguments instead of only the names of the projects. @fernandocastor talked about this feature in #17, so I propose that we close this issue, which is about passing project names through a file, and create another issue about passing all command line arguments. What do you think?

fernandocastor commented 11 years ago

Agreed.

dnr2 commented 11 years ago

As I said, this issue will be replaced by #33, so I'm closing this one.