Open dnr2 opened 11 years ago
Sorry, maybe I missed the point. Is this json the file that will be provided to java -jar groundhog.jar ... -in projects.json?
If so, will it be the user's responsibility to create this file? Or will groundhog somehow create it? I think it would be very difficult for me to create a file like this by hand, which could be a barrier to adopting groundhog.
:question:
Yes @gustavopinto, that was the initial idea. We thought that in the future, or even now, groundhog may require too many parameters to be passed on the command line, and that it would be tedious for the user to type every single parameter each time they open a new console/terminal. Moreover, some terminals limit the size of the command line [1] (although the original -in option was already solving this). The json file would be a solution to both problems.
Another advantage is that whenever we want to create a new parameter that may take many arguments (like searching for projects by usernames) it would be easy to adapt this json file to the new requirements.
I also think that this json wouldn't be so difficult to create/understand (we could also provide a sample json file). Besides, the user will still be able to use the traditional command line parameters, so anyone that is not familiar with json or the -in input format will still be capable of using groundhog.
But I understand that this format is not so user friendly. So we could change it or, maybe, discard this idea and close this issue. What do you guys think? @fernandocastor, @rodrigoalvesvieira
I think this is, at least currently, our best option to specify the search parameters. Your arguments just strengthened this impression, @dnr2. I don't think using json is an obstacle as it would be if we employed XML or a domain-specific language. As for the parameters, we should focus on the ones we already know and create our system so that it is extensible.
hmm.. ok! great arguments! They really changed my opinion :wink:
Have you already started implementing this issue?
Not yet @gustavopinto, I normally assign myself to an issue whenever I start implementing it.
oops
I changed the json format a bit in the search attribute.
{
  "forge": "github",
  "dest": "C:/groundhog/dest",
  "out": "C:/groundhog/metrics",
  "datetime": "2012-07-01_12_00",
  "nprojects": 30,
  "nthreads": 4,
  "outputformat": "csv",
  "search": {
    "projects": ["rails", "bootstrap"],
    "username": "gustavopinto"
  }
}
But I'm wondering whether projects and username are independent or related. For example, are rails and bootstrap projects created by gustavopinto? Or does this file mean that I want to download rails and bootstrap and also download all projects created by gustavopinto?
For me, it'd mean: "download the 'rails' and 'bootstrap' projects from the user 'gustavopinto'". Anything else looks very confusing to me.
Ok. Another question: are projects and username required? If the user does not pass the projects attribute, will it download all projects created by 'gustavopinto'? Or will it simply not work?
Moreover, could I pass more than one username?
It would work. Adding both projects and username is just a way of narrowing the search (reducing the number of possible results). Providing only projects should download them regardless of the username, and providing only username should download all projects created by that user, as you mentioned.
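To make that rule concrete, here is a minimal Python sketch (not groundhog's actual code). It assumes each project is a dict with hypothetical "name" and "owner" fields, and the catalog is made-up toy data. A criterion that is omitted simply does not filter anything.

```python
from typing import Optional

def matches(project: dict,
            projects: Optional[list] = None,
            username: Optional[str] = None) -> bool:
    """Return True if the project satisfies every criterion that was given."""
    # An omitted criterion (None) places no restriction on the result.
    if projects is not None and project["name"] not in projects:
        return False
    if username is not None and project["owner"] != username:
        return False
    return True

# Toy catalog, invented for illustration only.
catalog = [
    {"name": "rails", "owner": "someone-else"},
    {"name": "bootstrap", "owner": "someone-else"},
    {"name": "groundhog", "owner": "gustavopinto"},
    {"name": "rails", "owner": "gustavopinto"},
]

by_name = [p for p in catalog if matches(p, projects=["rails", "bootstrap"])]
by_user = [p for p in catalog if matches(p, username="gustavopinto")]
narrowed = [p for p in catalog
            if matches(p, projects=["rails", "bootstrap"], username="gustavopinto")]
```

With this toy data, by_name keeps three projects, by_user keeps two, and narrowed keeps only the rails project owned by gustavopinto.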
Agreed! I think the same way as @rodrigoalvesvieira. Nevertheless, we should consider that the user may want to make different kinds of searches at once. E.g., I may want to search for both (projects related to: groundhog, created by the user: gustavopinto) AND (projects related to: bootstrap, created by the user: dnr2). We could provide this functionality by modifying the structure of the JSON (possibly creating an array of searches), but this may become a bit complicated for the user.
I agree with @dnr2 in that we should provide some kind of operator for users to specify both ANDs and ORs. To me, the simplest answer would be to think about username and projects as specifying sets of projects and multiple items would always have an AND semantics for items within a search clause. For example:
"search": { "projects": ["rails", "bootstrap"], "username":"gustavopinto" }
would mean "download projects rails and bootstrap created by the user gustavopinto". What if we want to download every project by user gustavopinto and, at the same time, projects named rails and bootstrap? We could specify two different search clauses in the same JSON file:
"search": { "projects": ["rails", "bootstrap"] }
"search": { "username":"gustavopinto" }
This would have an OR semantics, instead of AND and would get a considerably larger number of projects. A relevant question in this case is: what if there are multiple projects named "rails"? Do we download them all? Moreover, what other kinds of options are we interested in supporting? For example, do we need to support a search where the user wants to download only projects FORKED by the user "gustavopinto"? Would that be required to answer any of those RQs?
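One caveat with repeating the "search" key inside a single JSON object: many parsers keep only one of the duplicates. The Python sketch below (an illustration, not groundhog's parser) shows that the standard json module keeps the last "search", and that an object_pairs_hook can collect all of them. The helper name collect_searches is hypothetical. The clauses are written without a trailing comma, which strict parsers reject.

```python
import json

text = """
{
  "search": {"projects": ["rails", "bootstrap"]},
  "search": {"username": "gustavopinto"}
}
"""

# Default behaviour: the later "search" silently replaces the earlier one.
plain = json.loads(text)

def collect_searches(pairs):
    """object_pairs_hook that gathers every "search" value into one list.

    The hook is called for every JSON object, nested ones included, so a
    nested key named "search" would be wrapped in a list as well.
    """
    result = {}
    for key, value in pairs:
        if key == "search":
            result.setdefault("search", []).append(value)
        else:
            result[key] = value
    return result

collected = json.loads(text, object_pairs_hook=collect_searches)

print(plain)      # only the username clause survives
print(collected)  # both clauses, in file order
```

An array of clauses under a single "search" key would avoid depending on how a given parser treats duplicate names.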
What do you think of this solution?
The last commit enables groundhog to use AND and OR semantics (only through the json file). In the future we can add more parameters, such as is_fork or watchers.
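For readers following the thread, the semantics discussed here (AND inside a search clause, OR across clauses) could be sketched in Python as below. This is not the code from the commit. The project fields, the toy catalog, and the way is_fork is evaluated are all assumptions made for illustration.

```python
# Each supported key maps to a predicate over (project, expected value).
# New parameters would be added as new entries; "is_fork" is a guess at
# how such a parameter might behave.
PREDICATES = {
    "projects": lambda project, names: project["name"] in names,
    "username": lambda project, user: project["owner"] == user,
    "is_fork": lambda project, flag: project.get("is_fork", False) == flag,
}

def clause_matches(project, clause):
    """AND: every key present in the clause must hold.

    An empty clause matches every project; an unknown key raises KeyError.
    """
    return all(PREDICATES[key](project, value) for key, value in clause.items())

def select(catalog, clauses):
    """OR: keep a project if at least one clause matches it."""
    return [p for p in catalog if any(clause_matches(p, c) for c in clauses)]

# Toy catalog, invented for illustration only.
catalog = [
    {"name": "rails", "owner": "gustavopinto", "is_fork": True},
    {"name": "rails", "owner": "someone-else"},
    {"name": "bootstrap", "owner": "dnr2"},
    {"name": "groundhog", "owner": "gustavopinto"},
]

either = select(catalog, [{"projects": ["rails", "bootstrap"]},
                          {"username": "gustavopinto"}])
both = select(catalog, [{"projects": ["rails", "bootstrap"],
                         "username": "gustavopinto"}])
forks = select(catalog, [{"username": "gustavopinto", "is_fork": True}])
```

Two separate clauses (either) select all four toy projects, while a single combined clause (both) selects only the rails project owned by gustavopinto.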
whoa! :+1:
Cool!! =D
@gustavopinto, Is this issue already implemented?
If the answer is yes, then we should close it...
This issue was labeled as 'continuous'. So, it may change during the groundhog evolution, and thus, we should keep it open.
In issue #3 we started implementing the -in option, which allows the user to specify a json file containing the names of the projects to be downloaded and analysed by Groundhog. Now we are going to improve this -in feature so that the json file provides not only the names of the projects, but also the major arguments that would otherwise be passed through the command line.
We will have to define the structure of the json file, as well as decide which arguments it will contain. The list of current arguments is (extracted from the Option class):
Therefore I believe that a good structure for the json file would be more or less like:
P.S.: @rodrigoalvesvieira argued that it would be better to omit some arguments such as "nthreads" (only allow it in the command line itself) because this json file should only provide information concerning the projects and the searching, not the details of the computation, but we can discuss his point of view.
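If we went with that split, one possible shape is sketched below in Python. It is only an illustration: the -nthreads flag name, its default of 1, and the choice to ignore (rather than reject) "nthreads" in the file are all assumptions, not existing groundhog behaviour.

```python
import argparse
import json

# Keys assumed to describe the computation rather than the search.
CLI_ONLY = {"nthreads"}

def build_settings(argv, file_text):
    """Merge json-file settings with command-line-only settings.

    Returns (settings, ignored), where ignored lists the CLI-only keys
    that were found in the file and dropped.
    """
    parser = argparse.ArgumentParser()
    parser.add_argument("-nthreads", type=int, default=1)  # hypothetical flag
    args = parser.parse_args(argv)

    file_settings = json.loads(file_text)
    ignored = sorted(CLI_ONLY.intersection(file_settings))
    settings = {k: v for k, v in file_settings.items() if k not in CLI_ONLY}
    settings["nthreads"] = args.nthreads
    return settings, ignored

settings, ignored = build_settings(
    ["-nthreads", "4"],
    '{"forge": "github", "nthreads": 8, "nprojects": 30}',
)
```

Here the file's "nthreads" value is dropped and reported in ignored, and the value from the command line is used instead.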