initial generators git flow reference cases

Cross-linking Discourse: https://discourse.sourcecred.io/t/reference-cases-for-credit-attribution/53/2

Create a prototype of the infrastructure needed to generate the simple graphs for these reference cases.
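As a rough sketch of what such infrastructure could look like, the following Python builds toy graphs for the two spam reference cases listed below: a seeded "normal" small project, plus optional layers that add 100 identical spam issues or a spam comment on every issue and PR. Everything here (the `Graph` class, the `(kind, id)` node tuples, the edge labels, the user names) is hypothetical and is not SourceCred's actual graph API. Seeding the baseline lets the with-spam and without-spam variants share the same underlying project, which is what the A/B comparison needs.

```python
import random
from dataclasses import dataclass, field


@dataclass
class Graph:
    """Toy contribution graph (hypothetical; not SourceCred's graph API).

    Nodes are (kind, id) tuples and edges are (src, dst, label) triples.
    """
    nodes: set = field(default_factory=set)
    edges: list = field(default_factory=list)
    text: dict = field(default_factory=dict)  # optional body text per node

    def add_edge(self, src, dst, label):
        self.nodes.update([src, dst])
        self.edges.append((src, dst, label))


def normal_project(seed=0, n_issues=10, n_pulls=5):
    """Baseline: a few users open issues and submit pulls that reference them."""
    rng = random.Random(seed)
    users = [("user", name) for name in ("alice", "bob", "carol")]
    g = Graph()
    for i in range(n_issues):
        g.add_edge(rng.choice(users), ("issue", i), "authors")
    for p in range(n_pulls):
        g.add_edge(rng.choice(users), ("pull", p), "authors")
        # Each pull references one randomly chosen existing issue.
        g.add_edge(("pull", p), ("issue", rng.randrange(n_issues)), "references")
    return g


def with_spam_issues(g, spammer="mallory", count=100, start_id=1000):
    """Spam case 1: one user opens `count` issues that all have identical text."""
    for i in range(count):
        issue = ("issue", start_id + i)
        g.add_edge(("user", spammer), issue, "authors")
        g.text[issue] = "identical spam text"
    return g


def with_spam_comments(g, spammer="mallory"):
    """Spam case 2: one user appends a spam comment to every issue and pull."""
    # sorted() materializes the targets before the graph is mutated below.
    targets = sorted(n for n in g.nodes if n[0] in ("issue", "pull"))
    for k, target in enumerate(targets):
        comment = ("comment", k)
        g.add_edge(("user", spammer), comment, "authors")
        g.add_edge(comment, target, "comments_on")
    return g


# A/B variants built from the same seeded baseline.
baseline = normal_project(seed=1)
spam_issues = with_spam_issues(normal_project(seed=1))
spam_comments = with_spam_comments(normal_project(seed=1))
```

Running the same cred computation over `baseline` and each spam variant, then comparing the spammer's share, would be one way to carry out the A/B test.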

@decentralion suggested the following reference cases:

A reference case containing a normal pattern of development for a small project (a mix of people creating issues, submitting pulls, etc.) and one person who spam-added 100 issues with identical text. We will include versions with and without the spam for A/B testing.
Same as above, except that one person added a spam comment to the end of every legitimate issue and PR.
A codebase with two people. One works on file A, one works on file B. Do we have a way to query “who has cred in file A” and get the right answer?
A tiny codebase consisting of a few commits from two authors. One author writes only implementation; the other writes only docstrings. How does the cred differ? Do we have a way to filter for “documentation cred” vs “implementation cred”?
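For the file-level and documentation-vs-implementation cases, a query only works if each contribution carries the attribute being filtered on. The sketch below assumes a hypothetical `Commit` record labeled with a path and a kind, and uses each author's share of changed lines as a crude placeholder for cred; it ignores graph structure entirely and is not SourceCred's cred algorithm. It only shows the shape of the “who has cred in file A” and “documentation cred” queries.

```python
from collections import Counter
from dataclasses import dataclass


@dataclass(frozen=True)
class Commit:
    """Hypothetical commit record for a synthetic reference case."""
    author: str
    path: str
    kind: str   # assumed labels: "implementation" or "documentation"
    lines: int  # lines changed, used here as a crude weight


def cred_share(commits, path=None, kind=None):
    """Share of changed lines per author, optionally filtered by path or kind.

    Placeholder for a real cred computation: it just normalizes line counts.
    """
    totals = Counter()
    for c in commits:
        if path is not None and c.path != path:
            continue
        if kind is not None and c.kind != kind:
            continue
        totals[c.author] += c.lines
    total = sum(totals.values())
    return {a: n / total for a, n in totals.items()} if total else {}


# Two people, each working on a separate file.
two_files = [
    Commit("alice", "a.py", "implementation", 40),
    Commit("bob", "b.py", "implementation", 30),
    Commit("alice", "a.py", "implementation", 10),
]

# One author writes implementation, the other writes docstrings.
impl_and_docs = [
    Commit("alice", "lib.py", "implementation", 50),
    Commit("bob", "lib.py", "documentation", 20),
    Commit("alice", "lib.py", "implementation", 30),
]
```

Here `cred_share(two_files, path="a.py")` credits only alice, and `cred_share(impl_and_docs, kind="documentation")` credits only bob. In a real repository a single commit often mixes docstrings and code, so assigning one `kind` per commit is a simplifying assumption.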

The following reference cases need further discussion, as it is not immediately clear how they would be reflected in the dependency graph of SourceCred as currently defined.

A codebase with two coders. One coder adds code; the other comes afterwards and changes the indentation in every file.
A codebase with three people. One works on file A, one works on file B, the third then renames file A to file C. If we ask “who has cred in file C” do we get a coherent answer?

Note that the results of experiments on how cred distributes over these reference cases will be shared on Discourse, both for discussion and, more generally, as an experiment in how research workflows can bridge GitHub and Discourse.


Yeah, neither of the two cases that need further discussion can be represented right now. For the rename case, we don't have a way to represent file identity. The indentation case is even harder, since it requires looking at the specific code within each file in addition to file-level identity.
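To illustrate what file-level identity would mean for the rename case, the sketch below gives each file a stable ID that survives renames, so a query on file C also reaches contributions made while it was still called file A. The event format and function names are hypothetical. Rename events are supplied explicitly here; in practice they would have to come from somewhere such as git's similarity-based rename detection (e.g. `git log --follow`). Crediting the renamer with one contribution is also an arbitrary choice made for illustration, not a settled position.

```python
from collections import Counter


def resolve_file_ids(events):
    """Assign each file a stable ID that is carried across renames.

    `events` is a chronological list of ("touch", author, path) or
    ("rename", author, old_path, new_path) tuples (hypothetical format).
    Returns (contributions, current) where contributions is a list of
    (author, file_id) pairs and current maps each live path to its file ID.
    """
    current = {}
    next_id = 0
    contributions = []
    for event in events:
        if event[0] == "touch":
            _, author, path = event
            if path not in current:
                current[path] = next_id
                next_id += 1
            file_id = current[path]
        elif event[0] == "rename":
            _, author, old_path, new_path = event
            # The file keeps its ID; only the path pointing at it changes.
            file_id = current.pop(old_path)
            current[new_path] = file_id
        else:
            raise ValueError(f"unknown event type: {event[0]!r}")
        contributions.append((author, file_id))
    return contributions, current


def who_has_cred_in(path, events):
    """Count contributions to the file currently located at `path`."""
    contributions, current = resolve_file_ids(events)
    file_id = current[path]
    return Counter(author for author, fid in contributions if fid == file_id)


events = [
    ("touch", "alice", "A"),
    ("touch", "bob", "B"),
    ("rename", "carol", "A", "C"),
]
```

With identity tracking, `who_has_cred_in("C", events)` includes both alice and carol; keying cred on the path string alone would credit only carol for file C and leave alice's work attached to a path that no longer exists.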

sourcecred / research #22