reposense / RepoSense

Contribution analysis tool for Git repositories
https://reposense.org
MIT License

Assigning lines to most contributing author #2198

Open SkyBlaise99 opened 5 months ago

SkyBlaise99 commented 5 months ago

What feature(s) would you like to see in RepoSense

With reference to issue #944, authors are now assigned full or partial credit for a line based on how much they contributed to it. However, giving partial credit to the last author is less meaningful than assigning the line to another author who contributed more.

If possible, describe the solution

We check the contributions of all authors to a line and assign the line to the author with the highest contribution. The amount of contribution is measured by the originality score (normalized edit distance), which essentially reflects how much was changed.
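
A minimal sketch of that selection step, in Python for brevity. It assumes the originality score is the Levenshtein distance divided by the length of the longer version, and that a line's history is available as `(author, text)` pairs ordered oldest first. The function names and the history format are illustrative only and are not taken from the RepoSense codebase.

```python
from collections import defaultdict


def levenshtein(a: str, b: str) -> int:
    """Classic dynamic-programming edit distance between two strings."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        curr = [i]
        for j, cb in enumerate(b, start=1):
            cost = 0 if ca == cb else 1
            curr.append(min(prev[j] + 1, curr[j - 1] + 1, prev[j - 1] + cost))
        prev = curr
    return prev[-1]


def originality_score(old: str, new: str) -> float:
    """Edit distance normalized to [0, 1] (assumed normalization)."""
    longest = max(len(old), len(new))
    return levenshtein(old, new) / longest if longest else 0.0


def most_contributing_author(history):
    """history: non-empty list of (author, line_text), oldest first.

    Returns the author with the highest summed score and the score map.
    """
    totals = defaultdict(float)
    previous = ""  # the line did not exist before its first version
    for author, text in history:
        totals[author] += originality_score(previous, text)
        previous = text
    return max(totals, key=totals.get), dict(totals)


history = [("alice", "hello world"), ("bob", "hello world!")]
winner, scores = most_contributing_author(history)
print(winner, scores)
```

Here `alice` wrote the whole line and `bob` only appended one character, so the line goes to `alice` even though `bob` touched it last. Ties resolve to whichever author appeared first in the history, which is an arbitrary choice in this sketch.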

As a result, every author would by default receive full credit for the lines assigned to them, since they are the ones who contributed the most to those lines. Partial credit would then be given only when an annotated author claims a line and annotated author != most contributing author. This is also easier to explain, since we now know that the annotated author is not the one who contributed the most.
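
The credit rule described above could be sketched roughly as follows. `LineCredit` and `assign_credit` are hypothetical names used for illustration, and the sketch assumes an annotation always wins ownership of the line, with only the credit level changing.

```python
from typing import NamedTuple, Optional


class LineCredit(NamedTuple):
    author: str
    full_credit: bool


def assign_credit(top_contributor: str,
                  annotated_author: Optional[str] = None) -> LineCredit:
    """Decide who owns a line and whether the credit is full or partial."""
    # No annotation, or the annotation agrees with the analysis: full credit.
    if annotated_author is None or annotated_author == top_contributor:
        return LineCredit(top_contributor, True)
    # The annotated author claims a line someone else contributed most to.
    return LineCredit(annotated_author, False)


print(assign_credit("alice"))
print(assign_credit("alice", annotated_author="bob"))
```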

Additional context

I have made an attempt on this branch. The algorithm can be exploited: an author can repeatedly delete and add back the same lines of code to inflate their contribution value. This could potentially be fixed by taking the difference between the first and final commits made by that author, so that the intermediate 'delete & add' commits are not counted (just an idea; I haven't tested it yet).
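
To illustrate the exploit and the proposed fix, here is a rough, untested-in-RepoSense sketch. It uses `1 - difflib.SequenceMatcher.ratio()` as a stand-in for the originality score, and it assumes the author's commits are not interleaved with other authors' commits (otherwise the first-to-final difference would also absorb other people's changes). All function names are hypothetical.

```python
from difflib import SequenceMatcher


def change_amount(old: str, new: str) -> float:
    """Stand-in for the originality score: 0.0 = identical, 1.0 = unrelated."""
    return 1.0 - SequenceMatcher(None, old, new).ratio()


def naive_contribution(history, author):
    """Sum the change made by every commit of `author` (exploitable)."""
    total = 0.0
    previous = ""
    for who, text in history:
        if who == author:
            total += change_amount(previous, text)
        previous = text
    return total


def net_contribution(history, author):
    """Compare the text before the author's first commit with the text
    after their final commit, ignoring everything in between."""
    indices = [i for i, (who, _) in enumerate(history) if who == author]
    if not indices:
        return 0.0
    first, last = indices[0], indices[-1]
    before = history[first - 1][1] if first > 0 else ""
    after = history[last][1]
    return change_amount(before, after)


# mallory deletes and re-adds alice's line twice without changing it.
history = [
    ("alice", "return a + b"),
    ("mallory", ""),
    ("mallory", "return a + b"),
    ("mallory", ""),
    ("mallory", "return a + b"),
]
print(naive_contribution(history, "mallory"))  # inflated by the churn
print(net_contribution(history, "mallory"))    # churn cancels out
```

With per-commit summation the churn accumulates, while the first-to-final comparison sees no net change for `mallory` and leaves `alice` with the full contribution.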

Moving forward

By assigning lines to the most contributing author, we will have a map containing the contribution values of all the authors. It is possible to go a step further and assign every author a percentage contribution based on that map. A drawback is that the frontend would need substantial changes to support assigning lines to multiple authors and displaying them visually in the report, both individually and when the repo groups are merged.
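
Turning such a map into percentages is a simple normalization. The sketch below assumes the map is a plain dictionary from author name to a non-negative contribution value. `contribution_shares` is a hypothetical helper, not an existing RepoSense function.

```python
def contribution_shares(scores):
    """Convert raw contribution values into percentage shares per author."""
    total = sum(scores.values())
    if total <= 0:
        # Nothing was contributed, so nobody gets a share.
        return {author: 0.0 for author in scores}
    return {author: 100.0 * value / total for author, value in scores.items()}


print(contribution_shares({"alice": 3.0, "bob": 1.0}))
```

In this example `alice` would receive 75% of the line and `bob` 25%, instead of the whole line going to `alice`.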

SkyBlaise99 commented 5 months ago

Probably suitable as an FYP project, but anyone is welcome to try it out.