greenrd / topgit

TopGit is now maintained at
https://github.com/mackyle/topgit
Other
95 stars 26 forks source link

[design] .topmsg / .topdeps pollute commit history #38

Open aspiers opened 9 years ago

aspiers commented 9 years ago

.topmsg and .topdeps represent SCM meta-data, and so I believe it's a fundamental design mistake to include them in the commit history:

Having said that, I can hazard a guess as to why this design decision was made: in order to adhere to the third stated tenet:

(iii) TopGit is specifically designed to work in a distributed environment. You can have several instances of TopGit-aware repositories and smoothly keep them all up-to-date and transfer your changes between them.

This is clearly a worthy goal, but I think the current implementation is the wrong way of going about it. (However the original author's comments on this decision are encouraging.) In fact topgit already stores other meta-data such as refs/top-bases/* outside the commit history. There are alternate approaches worth considering, e.g.

None of these will get fully and automatically transferred across remotes during using of normal git push / pull / fetch, but it wouldn't take much effort to fill in the gaps.

aspiers commented 9 years ago

Now that I think about it more, it seems to me that adding a tag to the base of the topic branch seems like the cleanest solution. This would solve #39 for free, since the tag could be lightweight or annotated. Of course there would need to be some way of tying the base tag to the topic branch, but this could easily be done by adopting a naming standard for the tag, just like topgit already does with refs/top-bases.

greenrd commented 9 years ago

One example is that it prevents creation of a github pull request directly from a t/foo topic branch.

The idea is that you would use tg export to create a cleaned-up branch for one or more topic branches, each of which would become a single commit in the cleaned-up branch, and then create a pull request for that.

aspiers commented 9 years ago

@greenrd But what if I don't want to squash my topic into a single commit? That's generally a bad idea if the topic contains multiple logical changes. And it also requires an extra step, which is not ideal for the UX.

imz commented 9 years ago

IMO, it's a very sensible and useful suggestion.

  1. If the meta-information was out of the trees, external Git-based tools that are not aware of topgit would be happy. For example, the maintainers of ALT distro work on packages in Git-based gear-repos. One workflow with gear is to use Git-branches for generating patches which are put into .src.rpm. I'd like to emphasize that here the patches are not a meaningful independent thing which is intentionally prepared and sent upstream, but rather a thing internal to the workflow. Package maintainers would collaborate on branches instead (and use TopGit to communicate with upstream, of course; here the repo represents work-in-progress, not necessarily prepared for sending upstream). But due to the extra meta-stuff from TopGit, these branches can't be cleanly fed into gear.
  2. It must be simpler to rewrite the descriptions or update dependencies (say, after a renaming), if they were external to the trees, Then rewriting a description or renaming a branch wouldn't cause rewriting the commit history, which is nice. (Not to say about how cumbersome it is to rewrite multiple commits to update all meta-files, but how easy it is to update the info at one place.)
aspiers commented 9 years ago

@imz In reply to your points:

  1. Fantastic - thanks for the info which seems to prove that my thinking is along the right lines!
  2. If we used tags, it would be trivial to rewrite the topic branch description simply by deleting and recreating the annotated tag for that topic's base commit.

There's still the question of where to track a topic's dependencies though. I wonder if it shouldn't be inside the text of the annotated tag because

  1. clean separation of the description (unstructured text) from the dependency list (structured list of strings) will make the implementation programmatically cleaner, and
  2. it should probably be possible to change the dependencies and description independently of each other ... or should it? How often is the topic description likely to change without the dependencies changing, or vice-versa? Having said that, it probably makes sense to store a topic's dependencies and description via the same mechanism for consistency, so that there is a single unified workflow for reading / writing / sharing them. And maybe the practical benefits in simplicity of storing in the same object outweigh the architectural uncleanliness?

I've had some further thoughts on the possible locations for storing this metadata:

Currently I'm leaning towards tags. I'm still undecided whether it should be one or two tags per topic though. It could be one tag by treating any line which starts with the magic string Dependency: as a dependency declaration. This would have the advantage that you could find out everything you need to know about a topic's metadata (i.e. description and dependencies) simply by viewing the contents of its base tag.

Thoughts welcome.

aspiers commented 9 years ago

I am working on this issue this week as a Hack Week project.

I'm still undecided whether it should be one or two tags per topic though.

I think one tag per topic keeps it clean and simple, and it seems to me advantageous to be able to simultaneously view a topic's description and its dependencies (if both exist). For example, reading the description might make you realise that the dependency list is wrong. And I can't really see any disadvantages other than having to extract the dependencies from the tag text.

aspiers commented 9 years ago

Another advantage of keeping the description and dependencies coupled together in the same chunk of metadata is that they can't accidentally get separated, e.g. when transferring between remotes, or if something goes wrong in the code. So it's clearly more robust.

However there is a disadvantage to using tags, which is that normal tags occupy a single namespace (refs/tags/*) which spans all remotes, so fetching tags from a remote would automatically pollute the local repo with remote topic branch metadata. I think it would be better to keep this metadata isolated per remote, just like normal branches are, in order to maintain the third goal stated in this issue's original description above. So currently I'm experimenting with the idea of manually creating tag objects via git mktag, and then associating them with the refs/topic-bases/* namespace via git update-ref (maybe better to use refs/topic-bases to distinguish new-style topic base tags from old-style refs/top-bases/* refs). This seems to work so far. It needs a tweak to gitk to help it understand tags which exist outside the refs/tags/* namespace, but this should be easy.

Whilst thinking about all this, further comparisons of the status quo (which relies on .topdeps and .topmsg at the beginning of the topic branch) with a tag-based approach occurred to me. Firstly, both are slightly vulnerable to loss of metadata. Currently you could lose .topdeps and .topmsg e.g. via a rebase -i which accidentally drops the commit. Similarly tags could accidentally be deleted, although this is harder if they live in refs/topic-bases/*.

A disadvantage of the status quo is that rewriting the metadata requires rebasing the whole branch, which changes all the SHA1s. This reinforces my believe that metadata should be kept separate from content! I'll update the issue description to include this.

greenrd commented 9 years ago

The idea is that changes to the metadata are tracked in the git history just like any other changes. How would you merge changes to the metadata if not?

aspiers commented 9 years ago

One immediate observation is that tracking changes to metadata is supposed to be handled by the reflog, not the normal log.

But the key question here is, what are the use cases for tracking changes to the topic metadata, or being able to merge changes to the metadata?

For the latter, I can imagine that if two or more developers are collaborating on a topic and are separately editing the same .topmsg, it might occasionally be useful to be able to merge the differences if the .topmsg is intended to be used as a cover letter or pull request description, but I think the many significant disadvantages listed above outweigh this small benefit. After all, it's not even currently possible to merge commit messages, and without that, the ability to merge topic descriptions is somewhat dubious. The need to merge .topdeps seems even more tenuous, since it should be trivial to do this manually, and anyway it's not safe for this to happen automatically because there is no guarantee that the dependencies are equivalent across remotes.

Please let me know if I'm missing something. Thanks!

greenrd commented 9 years ago

it might occasionally be useful to be able to merge the differences if the .topmsg is intended to be used as a cover letter or pull request description, but I think the many significant disadvantages listed above outweigh this small benefit.

But it would still be a loss of functionality compared to the previous version of topgit. (Not to mention that you would need some way of converting topgit repositories to the new format.)

The need to merge .topdeps seems even more tenuous, since it should be trivial to do this manually

But it makes sure it happens.

, and anyway it's not safe for this to happen automatically because there is no guarantee that the dependencies are equivalent across remotes.

Well that is what a merge semantically means - if you pull in someone else's changes and they have added a new dependency, that should be added to your topic branch as well. That's how TopGit's distributed nature works. We only support adding dependencies at the moment, so a merge would only add new dependencies, and therefore we don't officially support rejecting new dependencies (except by using git reset to undo the merge). I think there should be a post-commit hook to run tg update etc. if .topdeps has been changed.

aspiers commented 9 years ago

On Wed, Apr 15, 2015 at 03:59:28AM -0700, Robin Green wrote:

it might occasionally be useful to be able to merge the differences if the .topmsg is intended to be used as a cover letter or pull request description, but I think the many significant disadvantages listed above outweigh this small benefit.

But it would still be a loss of functionality compared to the previous version of topgit.

Not really, because I'm going to keep it backwards-compatible via a new topmsg.metadata config variable :-) If it's set to files (the default) then nothing changes. If it's set to tags then the metadata is stored as tags instead. In this case yes you will lose a small amount of functionality (tracking / merging metadata), but on the flip side, all the problems listed above will vanish. I suspect that for a lot of people, the tags mode will be preferable.

(Not to mention that you would need some way of converting topgit repositories to the new format.)

Yes, I am hoping to add that later on. Shouldn't be hard.

The need to merge .topdeps seems even more tenuous, since it should be trivial to do this manually

But it makes sure it happens.

, and anyway it's not safe for this to happen automatically because there is no guarantee that the dependencies are equivalent across remotes.

Well that is what a merge semantically means - if you pull in someone else's changes and they have added a new dependency, that should be added to your topic branch as well.

Understood, but what if that dependency topic is present but different in both remotes? Then you've introduced a local dependency on the wrong thing. The only way to solve this would be to recursively merge all topics in the dependency tree, and that sounds pretty hard to handle cleanly, e.g. how would you rollback the whole tree if something went wrong during the middle of the process?

That's how TopGit's distributed nature works.

Sure - I definitely want to avoid breaking the existing support for distributed setup. That's why I'm experimenting with tags in the refs/topic-bases/* namespace, rather than in refs/tags/*.

We only support adding dependencies at the moment, so a merge would only add new dependencies, and therefore we don't officially support rejecting new dependencies (except by using git reset to undo the merge).

Thanks for the info - I hadn't looked at tg depend before. Should be pretty easy to extend this (and even easier when it only requires rewriting a tag).

I think there should be a post-commit hook to run tg update etc. if .topdeps has been changed.

Why? We don't automatically run tg update if the contents of an existing dependency gets changed, so why automatically run it if a new dependency is introduced.

greenrd commented 9 years ago

Why? We don't automatically run tg update if the contents of an existing dependency gets changed, so why automatically run it if a new dependency is introduced.

Because that is what tg depend add does. (Actually, it also checks for dependency loops, which a post-commit should probably also do.) If a new dependency comes in via a merge, I think it should behave the same as if it comes in from tg add. Admittedly, it would trigger tg update to merge in the latest changes from all the dependencies, recursively, not just the new ones - but that is what tg depend add does, too, so the "defect" - if it is a defect - is already there.

Another possible counterargument is that if it's a fast-forward (or indeed a commit resulting from running tg depend add!), the relevant tg update for the new dependencies has already been done. So maybe my proposed post-commit hook should only run if HEAD is a merge commit.

aspiers commented 9 years ago

Why? We don't automatically run tg update if the contents of an existing dependency gets changed, so why automatically run it if a new dependency is introduced.

Because that is what tg depend add does.

Ahh, OK. But in that case, there is no risk of the user being surprised by tg update being triggered, because they've invoked a tg command. In contrast, the idea of things happening automagically via post-commit hooks makes me uncomfortable, because a) it violates the Principle of Least Surprise, and b) there may be use cases where the user doesn't want this to happen.

(Actually, it also checks for dependency loops, which a post-commit should probably also do.) If a new dependency comes in via a merge, I think it should behave the same as if it comes in from tg add.

Here you are referring to a normal git merge of (say) a remote fork of the topic branch, right? And BTW I assume you meant tg depend add not tg add? Again, I think it might be safer if this kind of behaviour only happened via tg subcommands, e.g. tg merge which was a topic-aware wrapper for git merge.

Admittedly, it would trigger tg update to merge in the latest changes from all the dependencies, recursively, not just the new ones - but that is what tg depend add does, too, so the "defect" - if it is a defect - is already there.

Another possible counterargument is that if it's a fast-forward (or indeed a commit resulting from running tg depend add!), the relevant tg update for the new dependencies has already been done. So maybe my proposed post-commit hook should only run if HEAD is a merge commit.

Sounds reasonable. Thanks for all the info - this is greatly helping my understanding, even though we have gone off on quite a tangent ;-)

Going back to the focus of this issue, I am pretty sure that there are many developers (myself included) who would find topgit very useful purely for managing local-only topic branches, without any need to publish them and merge changes from other forks of the same topic branch. And in this case, the more lightweight approach offered by using tags instead of .topmsg / .topdeps offers the substantial advantages, as explained in this issue's description. So as long as my changes retain backwards-compatibility via a dual-mode approach (ideally with a way to easily convert between the two), I hope you will consider merging them once they're ready.

hrj commented 9 years ago

@aspiers

I am not sure about the technical details of this design, but the use-case you describe matches mine:

I am pretty sure that there are many developers (myself included) who would find topgit very useful purely for managing local-only topic branches, without any need to publish them and merge changes from other forks of the same topic branch.

I just want to manage local patch sets. I and my team haven't yet reached the point where we would find remote collaboration on patch-sets useful.

If this design helps with this use-case, I am willing to try it out. Do you have an implementation somewhere?

aspiers commented 8 years ago

@hrj Sorry for the very slow reply! My work will always be found at https://github.com/aspiers/topgit/branches but I can't remember what state I last left it in. I am hoping to return to this at some point soon though.