Open ojarjur opened 7 years ago
Some things that would be important here:
I think one way of achieving this would be to have a flag specifying the author whose comments should be pulled. If no such author is given, then the user is prompted for each new comment to decide if it should be included or excluded.
It might be worthwhile to make the work flow be something like "Pull new comments from the user
My thinking on this issue has evolved recently. My previous suggestion has always felt too cumbersome and limited to me, and now I think I may know a better way to handle this.
The core idea is that there should be a graph defining how separate remote repositories are linked together. The git appraise pull
command would use this graph to pull review metadata not just from the remote repository, but also from all of the linked fork
repositories.
This graph should be stored in the repository itself so that third-parties can pull review metadata from multiple forks at once. That allows the network of developers to continue collaborating on changes even if the "authoritative" remote repository stops being updated.
A repository owner could add a fork by running a command like git appraise add-fork <URL>
. By doing that they would in effect be giving the owner of that other repository permission to request reviews or add comments. If someone else thinks that a repository owner is being too restrictive in adding forks, then they could add those forks to their own repository and encourage others to use their repository as a remote.
To make this work we would need to make sure the following properties hold:
I do not think we need a way to audit incoming requests/comments before adding them, since we now have a way to edit comments without losing the original comment.
However, we do need to gracefully handle multiple, unrelated review requests for the same commit. Right now we track all of the requests, but treat the last one as the "current" request with all previous ones being obsolete. In this new model we need to be able to distinguish between multiple, semantically different reviews on the same commit. That is needed so that one forker cannot hide another forker's review request by shadowing it with their own.
We might do this by adding a parent
field to review requests similar to how we have a parent
field for comments. With that we could distinguish between editing an existing review request versus requesting a separate review based on the same commit.
We should probably also add something like a reviewRemote
field to review requests, to specify which remote holds the updates to the reviewRef
. That field should probably not be changeable by edits to the request.
There are a number of things about this approach that I like:
git appraise pull
command.git appraise add-fork
command.git appraise remove-fork
command to remove a fork if its owner is being abusive.Regarding authenticating metadata; I think we should try to imitate the model git uses for GPG signing commits.
This would mean adding a -S
flag to all of the git appraise
subcommands that generate review metadata (e.g. review requests, comments, etc). That metadata would then be signed prior to being written. We would use the user's GPG key defined in the user.signingkey
config setting to generate this signature. Additionally, the generated git-notes commits would be signed with that same key.
When displaying reviews/comments/etc, we would include information about whether or not the metadata was signed and whether or not the signature could be verified.
We would also want to add a --verify-signatures
flag to the git appraise pull
command which would cause the merge to fail if any of the incoming git-notes commits or review metadata were not verifiably signed. This would correspond to the same, existing flag in the git merge
command. The git appraise pull
command should also take a -S
flag to cause the git-notes merge commit to be signed. If the --verify-signatures
flag is provided then the -S
flag should be implied.
To sign a metadata entry, we could do something like the following:
signature
field to the various JSON schemas.gpgsignature
.Verifying the signature would then involve running that process in reverse.
This would mean that signing changes and checking signatures would be an optional feature that each individual community could decide for itself whether or not to adopt.
It would also mean that this feature would be orthogonal to the rest of the work described in this issue. I will probably split off a separate issue to track it.
For storing the graph of forks, I am now leaning towards a design based on storing the remotes
section from the git config in the repository itself. This would let remote repositories define their respective remotes (which I will call forks
to avoid confusion). Cloners could then use that information to reconstruct the graph as it would be if the remote repository was updated by pulling from all of the forks.
The list of forks would be stored as a git commit. The files in that commit's trees would be named after the names of the forks. The contents of each file would list the fork's URLs and fetch specs. By using a git commit, we can track the history of the list of forks, which seems like a valuable property to have.
When someone runs git appraise pull
against a remote, the tool will first fetch this forks-commit. It will then read the fetch-specs for each fork and convert them into a corresponding fetch spec for the local repository.
If the fork fetch spec ends in something starting with refs/heads/
, then that prefix will be replaced with refs/remotes/<REMOTE>/
. Otherwise, we will add a prefix of the form refs/forks/<REMOTE>/
.
Thus, if your remote is named <REMOTE>
and has a fork named <EXAMPLE>
with a fetch spec of +refs/heads/*:refs/remotes/<EXAMPLE>/*
, then your local repository will fetch from <EXAMPLE>
's URL with a fetch spec of +refs/heads/*:refs/forks/<REMOTE>/refs/remotes/<EXAMPLE>/*
.
We'll then need to make the tool append git notes from all refs of the form refs/remotes/<REMOTE>/refs/notes/*/devtools/...
into the corresponding refs/notes/<REMOTE>/devtools/...
ref prior to merging that ref into the refs/notes/devtools/...
ref.
I really like where it's going. Thanks! And clearly, using PGP as the way to auth the comment is the way to go.
I have a question regarding importing comments from a fork. I'm not sure I agree with the "only import comments from the owner of the fork". I think that external contributors might have interesting point of view. The manually validating each comment for import could be painful. Though, in the CLI, showing a list of commenters on the fork, with a number of comments by contributors could give some idea if you would authorize comments from contributors X or Y when they are not the fork owner.
@dolanor Thanks for chiming in. I appreciate the feedback.
I do want external contributors to be able to comment, and I want them to be able to comment on any review (not just the ones they request).
My thinking about how to do this was that each external contributor should have their own, personal fork for which they are listed as an owner. The tool would then fetch and merge comments from each external contributor, but only from their fork.
For example, consider the scenario where there are two forks, one named "alice" with an owner email of "alice@example.com" and another named "bob" with an owner email of "bob@example.com".
In this scenario, anyone pulling from the forks would fetch all comments from both the fork "alice" and the fork "bob". However, they would only merge comments from the "alice" fork if the author was "alice@example.com", and they would only merge comments from the "bob" fork if the author was "bob@example.com".
This way, we validate that comments are coming from the fork that they are supposed to come from, and we can do that validation automatically (without manual user intervention).
Would that address your concern about letting external contributors comment, or is there another scenario that is missing?
Yes it would. One question though: if alice forked your repo and bob commented on alice repo, could it be possible to automatically discover and fetch bob's comments, even if you only knew alice's repo in the first place?
@dolanor The design (for which I have an initial implementation in this branch) does not rely on the user knowing all the forks.
Instead, the forks are stored in a special ref in the repository and can be pushed/pulled with automatic merging.
So, for example, if I have a repository at https://github.com/ojarjur/example, and Alice creates a fork of that repository at https://git.example.com/alice/example, then I would first have to run:
git appraise fork add -o alice@example.com alice https://git.example.com/alice/example
git appraise push
Then any third party could clone my repo at https://github.com/ojarjur/example, and run git appraise pull
to see reviews and comments pushed by Alice to her repository. This happens without that third-party needing to know anything about Alice's fork; the tool automatically pulls down everything it needs to know from the main repo.
If Alice gives Bob permission to push to her repository, then any reviews or comments he pushes as "bob@example.com" will be ignored because he was not listed as one of the owners of that fork.
Instead, Bob would need to set up his own repository (let's say he uses https://git.example.com/bob/example), and then convince me to add his fork to the list stored in the main repo.
However, once I add his fork, anyone will then start picking up his reviews and comments without needing to do any manual steps aside from running git appraise pull
against my repo.
Ok, I see. So the right to comment would not be transitive from one repo to another. Each owner of the repo needs to explicitely add a fork. But once that fork has been added, everybody fetching from the original repo would also get the fork's comments. Is that right?
In the case of
If ojarjur adds alice and bob's repos as forks, then anybody fetching reviews from upstream would get both alice's and bob's comments, right?
What if alice adds bob's repo to her own. Somebody fetching reviews from alice's repo would also get bob's comments, right?
But now, if ojarjur adds alice's fork, and alice adds bob's fork, does ojarjur gets bob's comment since alice added bob's fork? And in that case, would users of ojarjur project would also get bob's comments by fetching the review comments?
Ok, I see. So the right to comment would not be transitive from one repo to another. Each owner of the repo needs to explicitely add a fork. But once that fork has been added, everybody fetching from the original repo would also get the fork's comments. Is that right?
Yes.
In the case of
- ojarjur has github.com/ojarjur/example (let's call it upstream)
- alice has a fork of upstream in git.example.com/alice/example
- bob has a fork of upstream in git.example.com/bob/example
If ojarjur adds alice and bob's repos as forks, then anybody fetching reviews from upstream would get both alice's and bob's comments, right?
Yes
What if alice adds bob's repo to her own. Somebody fetching reviews from alice's repo would also get bob's comments, right?
Yes, but to clarify, in order for this to happen that person would have to:
git appraise pull aliceexample
.Assuming they did both steps, they will see bob's comments that he pushed to https://git.example.com/bob/example
But now, if ojarjur adds alice's fork, and alice adds bob's fork, does ojarjur gets bob's comment since alice added bob's fork?
No. The fetching from forks is also not transitive. So, if I pull from https://github.com/ojarjur/example, then I fetch from all of the forks listed there, but I do not pull forks of those forks.
And in that case, would users of ojarjur project would also get bob's comments by fetching the review comments?
No. The tree of forks is basically flat.
I wonder if the extra step required (because of no transitivity) would affect the amount of users to the repository. In a way, it looks like the OpenPGP web of trust to me, which I kind of like. But also, the PGP WoT never got so big. And I think UX and usability has been the biggest hinderance to the project. And in another way, most of use of git nowadays are centralized in 1 repo. As long as this repo adds all the forks, we're good. But then, it would make git-appraise centralized again.
Apart from that, I think the rest is good.
In a way, it looks like the OpenPGP web of trust to me, which I kind of like.
I agree, there are similarities.
And in another way, most of use of git nowadays are centralized in 1 repo. As long as this repo adds all the forks, we're good.
Yes. I actually hope there will eventually be some sort of a web UI that offers self-service support for adding forks.
But then, it would make git-appraise centralized again.
I don't think this would go all the way back to being centralized.
I think, at the most extreme, we would just wind up with a bunch of federated services. A repository may have just one host, but its forks could be on different hosts as long as the original host does not try to lock people in.
Apart from that, I think the rest is good.
Thanks for chiming in and offering your thoughts and feedback; I really appreciate that.
:wave: Hi all!
Is this
To sign a metadata entry, we could do something like the following:
- Add a signature field to the various JSON schemas.
- Initially fill in that field with a placeholder value like gpgsignature.
- Serialize the JSON object as a single line string.
- Generate a detached GPG signature of this string.
- Encode the generated signature as a single-line JSON string.
- Replace the placeholder value with the encoded signature.
- Write out this final value as the actual git-notes entry.
Still a thing which y'all are interested in adding to git-appraise
?
@pittma Yes, it is, and contributions for it would definitely be welcome.
It is orthogonal to the rest of this work, so I should really split out a separate issue to track it.
I will do that right now.
@pittma I created https://github.com/google/git-appraise/issues/89 to track GPG signing separately.
This is meant for situations where an outside contributor requests a pull from their repository to an upstream repository.
In that scenario, the outside contributor can pull reviews from the upstream repository, and can push their review metadata to their repository. However, the maintainers of the upstream repository probably do not want to pull review metadata for all reviews from that outside contributor's repository, but do want to pull review metadata for that one request.
I imagine a scenario like the following:
upstream
) is hosted somewhere.fork
).git request-pull
or something like a GitHub pull request) from their fork into the upstream repository.git appraise pull <upstream>
to fetch the review request and comments from the upstream repository. They then respond to the comments and rungit appraise push <fork>
to push their comments to their fork.git appraise pull --only-comments <review> <fork>
to pull just the comments for that review from the fork, and then they rungit appraise push <upstream>
to push the combined review metadata to the upstream repository.Since the upstream only really cares about the comments from the contributor (the review requests will be different because the review refs will be different, and the maintainers may want to abandon the review), I'm thinking this can all be supported by adding a flag to the
git appraise pull
subcommand that tells it to only pull the notes from therefs/notes/devtools/discuss
ref, and to only merge in the notes for a specified review.