smontanari / code-forensics

A toolset for code analysis and report visualisation

Multi repo support #70

Open a-bendoraitis opened 2 years ago

a-bendoraitis commented 2 years ago

Right now the rootPath configuration expects a single path where a git repository is located. I have multiple git repositories, and it would be very useful to gather data from all of them.

Maybe I overlooked a configuration option and this is already possible? I could also contribute a pull request that turns the rootPath gulpfile option into an array, but that's a breaking change, so I might just fork the project for myself.

smontanari commented 2 years ago

I can try to interpret your request in two ways:

  1. You're after some form of parallelisation, i.e. you want to run the same analysis in parallel on multiple repos to speed up your investigative process.
  2. You're after analyses that perform data mining over commits from multiple code bases.

Option 1 is just a matter of running commands in parallel, which can be achieved in a variety of ways without necessarily changing code-forensics.

Option 2 is a completely different beast. Supporting analyses across multiple repositories is an ambitious goal, but unfortunately it might require more than just accepting an array of root paths. The idea would be to apply the same algorithms of the existing analyses to commit data gathered from multiple projects, to infer potential issues caused by hidden couplings between the corresponding code bases. The problem, though, is that all the current analyses are based on the data collected by running git log commands, and, as far as I know, you cannot run git log across multiple repos at once. So the work would require collecting and somehow merging commit information from different repositories. Something to think about, but not straightforward to implement.
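To make the "collect and merge" idea concrete, here is a minimal sketch, not something code-forensics does today. It assumes a log layout with a `--<hash>--<date>--<author>` header per commit followed by `--numstat` lines. The function names (`readLog`, `parseLog`, `mergeCommits`) and the idea of prefixing each file path with a repository name are hypothetical, chosen only to keep paths from different repos distinct after merging.

```javascript
const { execFileSync } = require('node:child_process');

// Assumed log layout: one "--<hash>--<date>--<author>" header line per commit,
// followed by numstat lines of the form "<added>\t<deleted>\t<file>".
const LOG_ARGS = ['log', '--numstat', '--date=short', '--pretty=format:--%h--%ad--%aN'];

// Run git log inside one repository and return its raw text output.
function readLog(repoPath) {
  return execFileSync('git', LOG_ARGS, {
    cwd: repoPath,
    encoding: 'utf8',
    maxBuffer: 1 << 28,
  });
}

// Parse raw log text into commit records (hypothetical shape),
// prefixing every file path with the repository name.
function parseLog(repoName, logText) {
  const commits = [];
  let current = null;
  for (const line of logText.split('\n')) {
    const header = line.match(/^--([0-9a-f]+)--(\d{4}-\d{2}-\d{2})--(.*)$/);
    if (header) {
      current = { repo: repoName, hash: header[1], date: header[2], author: header[3], files: [] };
      commits.push(current);
    } else if (current && line.trim() !== '') {
      const [added, deleted, file] = line.split('\t');
      if (file !== undefined) {
        current.files.push({ added, deleted, file: `${repoName}/${file}` });
      }
    }
  }
  return commits;
}

// Merge per-repository commit lists into a single list, newest first.
function mergeCommits(commitLists) {
  return commitLists.flat().sort((a, b) => b.date.localeCompare(a.date));
}
```

Something like `mergeCommits(['repoA', 'repoB'].map((dir) => parseLog(dir, readLog(dir))))` would then yield one commit stream. Whether the existing analyses could consume such a stream unchanged is exactly the open question raised above.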

a-bendoraitis commented 2 years ago

I'm looking at the second way. For my use case, I don't really care about hidden couplings. I just want to run the analysis on different repos and present all the data in a single report. For example, in hot spots each repo could have its own blob, side by side. That's why I'm thinking about a rootPaths array; I don't need anything too complicated.

I tried merging my repos into one, preserving all of the logs, but I couldn't get it to work.
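The side-by-side layout described above could be sketched roughly as follows, assuming each repository has already been analysed separately and produced a tree of files with revision counts. The node shape (`name`, `children`, `revisions`) and both function names are hypothetical and are not the report format code-forensics actually emits.

```javascript
// Hypothetical node shape: { name, children } for folders and
// { name, revisions } for files. Illustrative only.

// Wrap per-repository trees under one synthetic root so that each
// repository becomes its own top-level "blob" in a single hierarchy.
function combineRepoTrees(treesByRepo) {
  return {
    name: 'all-repositories',
    children: Object.entries(treesByRepo).map(([repoName, tree]) => ({
      name: repoName,
      children: tree.children || [],
    })),
  };
}

// Sum the revisions below a node, e.g. to size each repository blob.
function totalRevisions(node) {
  if (!node.children) return node.revisions || 0;
  return node.children.reduce((sum, child) => sum + totalRevisions(child), 0);
}
```

This only places independent results next to each other; it does not look for couplings across repositories.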

smontanari commented 2 years ago

> present all the data in single report

That is not possible for the same reasons I described above, i.e. the reports are pretty much a data mining exercise over the information contained in git log outputs, and such outputs contain data that is relative to one repository only.

This sort of feature has been on my mind for some time, because I do understand its potential benefit, especially as we move towards more distributed codebases. However, it'd require my full attention to assess its feasibility and necessary code changes, and unfortunately I don't have much time now.

I'm not going to close this issue for the moment, but only so I can see it here as a reminder of a desirable feature.