sailuh / kaiaulu

An R package for mining software repositories
http://itm0.shidler.hawaii.edu/kaiaulu
Mozilla Public License 2.0

Expose parse_gitlog() Git parameters to the project configuration file #194

Open carlosparadis opened 1 year ago

carlosparadis commented 1 year ago

parse_gitlog() passes several flags to Git alongside --numstat, and these may alter the behavior of the code. We should expose them in the project configuration file, and also generalize the notebook to account for more than one branch.

See: https://github.com/sailuh/kaiaulu/issues/184#issuecomment-1525478233 for details

carlosparadis commented 1 year ago

A simple first step Nicole pointed out is to parameterize the hardcoded "master" branch in:

https://github.com/sailuh/kaiaulu/blob/6ff61e558cb2b2a658dc90c93649d41ab3f30022/R/parser.R#L532
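To make the idea concrete, here is a minimal sketch of what parameterizing the branch could look like. It is written in Python purely for illustration and is not Kaiaulu's R code: the function name `build_git_log_args` and the one-element default flag list are assumptions, and the real flag list in R/parser.R is not reproduced here.

```python
from typing import List, Optional

# Assumption: only --numstat is listed; the real defaults live in R/parser.R.
DEFAULT_FLAGS = ["--numstat"]


def build_git_log_args(git_dir: str,
                       branch: str = "master",
                       flags: Optional[List[str]] = None) -> List[str]:
    """Assemble the argument vector for `git log` on a single branch.

    `branch` replaces the hardcoded "master" string; `flags` overrides
    the default flag list when given.
    """
    if not branch or branch.startswith("-"):
        # A leading dash would be read by git as an option, not a branch.
        raise ValueError(f"invalid branch name: {branch!r}")
    chosen = list(DEFAULT_FLAGS if flags is None else flags)
    return ["git", f"--git-dir={git_dir}", "log", *chosen, branch]
```

Keeping "master" as the default value would leave existing calls unchanged while letting a configuration file supply another branch.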

carlosparadis commented 1 year ago

Quoting Nicole here:

Regarding the branch flag for perceval in parse_gitlog(): as you said, i realized that this function throws an error if you did not check out the branch you would like to analyse in your local git repo. but in case the desired branch is checked out, it only extracts the information from this branch (e.g. if you specify apache/apr branch "evenset" and checked this out, it only inspects this branch and not more). to make this work, the branch must be added as a flag in the calls with and without regexp filtering (https://github.com/sailuh/kaiaulu/blob/master/R/parser.R#L530-L538).

Git Checkout for Git Log

Of the above my understanding is that:

  1. When we clone a repo, we normally end up with just the master branch checked out.
  2. If we run parse_gitlog() on a branch we did not check out, the command throws an error.
  3. We must therefore ensure the user a) checks out the right branch with git_checkout, and b) passes that same branch to parse_gitlog().

In Nicole's example, a branch parameter would replace the hardcoded "master" string, and you would pass the eventset branch. According to the quote, the resulting git log table then contains only the commits made to that branch.

Git Checkout for Dependencies

My original reason for warning about git_checkout was actually a different one. Kaiaulu has parse_gitlog but also parse_dependencies. The former reads the .git folder with Perceval. The latter reads the src folder with Depends. The interface to the DV8 tool (DV8.R), which analyzes architectural flaws, requires information originating from both.

What I originally meant was that, beyond making sure they correctly specify the branch parameter (item 3 above), users must also call git_checkout in Kaiaulu. Otherwise their parsed .git will describe one branch and their src another. This would silently produce erroneous results that are hard to detect.
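A guard along these lines could turn that silent mismatch into a loud failure. This is a Python sketch for illustration only, not an existing Kaiaulu function: the names `current_branch` and `assert_same_branch` are hypothetical. The only Git call used is `git rev-parse --abbrev-ref HEAD`, which prints the checked-out branch name, or `HEAD` when the working tree is in a detached state.

```python
import subprocess


def current_branch(repo_path: str) -> str:
    """Return the branch currently checked out in the working tree."""
    result = subprocess.run(
        ["git", "-C", repo_path, "rev-parse", "--abbrev-ref", "HEAD"],
        check=True, capture_output=True, text=True)
    return result.stdout.strip()


def assert_same_branch(gitlog_branch: str, checked_out: str) -> None:
    """Fail loudly if the git log branch and the src checkout differ."""
    if checked_out == "HEAD":
        # rev-parse prints "HEAD" when no branch is checked out.
        raise RuntimeError("working tree is in a detached HEAD state")
    if gitlog_branch != checked_out:
        raise RuntimeError(
            f"git log was parsed for {gitlog_branch!r} but the src folder "
            f"is checked out at {checked_out!r}")
```

A notebook would call `assert_same_branch(branch, current_branch(repo_path))` before parsing dependencies, so the analysis stops instead of mixing two branches.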

The DV8 Notebook probably needs to be updated to reflect that, as does any other Notebook associated with this kind of analysis.

Multi-Branch Analysis

I think one more thing we need to account for is a user who wants to look at more than one branch. Can we specify more than one branch in a single Perceval call, or would two or more calls to parse_gitlog() be required? Since Perceval does not seem to state which branch a commit originates from, it may be preferable to call parse_gitlog() once per branch, add a column with the branch name to each output table, and rbind the tables afterwards.
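The per-branch approach can be sketched as follows. This is illustrative Python, not Kaiaulu code: `parse_one` is a hypothetical stand-in for a single-branch parse_gitlog() call, and each row is modeled as a plain dictionary rather than a data.table.

```python
from typing import Callable, Dict, Iterable, List

Commit = Dict[str, str]


def parse_branches(branches: Iterable[str],
                   parse_one: Callable[[str], List[Commit]]) -> List[Commit]:
    """Parse each branch separately and stack the results into one table.

    Every row gains a "branch" column naming the branch it was read from,
    which is the information the parser output itself does not carry.
    """
    rows: List[Commit] = []
    for branch in branches:
        for commit in parse_one(branch):
            # Copy the row so the parser's own output is left untouched.
            rows.append({**commit, "branch": branch})
    return rows
```

One consequence to keep in mind: a commit reachable from two branches appears once per branch in the stacked table, so downstream counts would need to deduplicate on the commit hash where that matters.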

Other Tools

The last consideration is whether we will ever want to replace parse_gitlog() with another tool for parsing the git log, and how that would look in the project configuration file.
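One possible shape, sketched in Python with the configuration modeled as a nested dictionary (as it would look after loading the project file): the tool name sits next to the branch and flag settings, so swapping the parser would only change one key. Every key name here (`version_control`, `gitlog`, `tool`, `branch`, `flags`) and every default value is an assumption for illustration, not the current Kaiaulu configuration schema.

```python
from typing import Any, Dict

# Assumed defaults, for illustration only.
DEFAULTS: Dict[str, Any] = {
    "tool": "perceval",
    "branch": ["master"],
    "flags": ["--numstat"],
}


def gitlog_settings(config: Dict[str, Any]) -> Dict[str, Any]:
    """Merge the (hypothetical) gitlog section of a config with defaults."""
    section = config.get("version_control", {}).get("gitlog", {})
    settings = {**DEFAULTS, **section}
    # Accept a single branch name as shorthand for a one-element list.
    if isinstance(settings["branch"], str):
        settings["branch"] = [settings["branch"]]
    else:
        settings["branch"] = list(settings["branch"])
    settings["flags"] = list(settings["flags"])
    bad = [flag for flag in settings["flags"] if not flag.startswith("-")]
    if bad:
        raise ValueError(f"not a flag: {bad}")
    return settings
```

A project that sets nothing gets the current behavior, while one that lists several branches or extra flags overrides only the keys it names.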

Other Flags

We also have to consider the other flags besides the branch:

https://github.com/sailuh/kaiaulu/blob/6ff61e558cb2b2a658dc90c93649d41ab3f30022/R/parser.R#L518-L535

For example, Wolfgang also seems to have deferred figuring out the implications of some of these flags, such as -C and -M:

https://github.com/siemens/codeface/blob/e6640c931f76e82719982318a5cd6facf1f3df48/codeface/VCS.py#L637-L652
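One implication that is easy to show: -M asks Git to detect renames and -C to detect copies, and with them a --numstat path can come back in a rename form such as `src/{old => new}/f.c` or `old.c => new.c` rather than as a single path. Any code that treats that column as a plain file path is affected. The Python sketch below splits such a path into its old and new names. The function name `split_numstat_path` is hypothetical, and the sketch ignores quoted paths and file names that themselves contain " => ".

```python
import re
from typing import Tuple

# Matches the compact form git prints for a partial rename: {old => new}
_BRACE = re.compile(r"\{(.*?) => (.*?)\}")


def split_numstat_path(path: str) -> Tuple[str, str]:
    """Return (old_path, new_path) for one --numstat path column."""
    match = _BRACE.search(path)
    if match:
        prefix, suffix = path[:match.start()], path[match.end():]
        # An empty side, as in "src/{ => lib}/a.c", leaves a double slash.
        old = (prefix + match.group(1) + suffix).replace("//", "/")
        new = (prefix + match.group(2) + suffix).replace("//", "/")
        return old, new
    if " => " in path:
        old, new = path.split(" => ", 1)
        return old, new
    # No rename or copy was reported for this file.
    return path, path
```

Whether the file-level tables should key on the old path, the new path, or both is the kind of decision that exposing -M and -C in the configuration would force us to make explicit.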