josh-project / josh

Just One Single History
https://josh-project.github.io/josh/
MIT License
1.48k stars 55 forks source link

Roundtripping broken when raw commit includes custom metadata (e.g. as GitButler adds) #1377

Closed bjeanes closed 1 month ago

bjeanes commented 2 months ago

In https://github.com/josh-project/josh/issues/596#issuecomment-2298514080, I discovered the reason my SHAs were not lining up, by finding a specific commit where the histories began to diverge (between original repo and imported-then-subpath-cloned repo).

In this case, it seems that the GitButler tool is injecting custom headers into the commit object, which are being elided when JOSH rewrites them:

❯ diff <(git show --pretty=raw 8f49df1dc) <(git show --pretty=raw 34d061bdc)
1c1
< commit 8f49df1dc0acf9389c7aa712847cc8309f01755d
---
> commit 34d061bdca18e243ccc8bfe53333037ce17500f9
6,7d5
< gitbutler-headers-version 2
< gitbutler-change-id baefc019-8ef0-40f2-9f1f-dc92b9984387

These appear to be added by the GitButler tool here https://github.com/gitbutlerapp/gitbutler/blob/6fdcf9fcc85b78fde9b449c914eaa1228c65d868/crates/gitbutler-commit/src/commit_headers.rs#L114-L121, introduced in https://github.com/gitbutlerapp/gitbutler/pull/4025.

Is it possible to support unknown headers or do new headers need to be "learned" case-by-case?

Would it be possible to update JOSH to support these headers? I don't even know what GitButler would do if the gitbutler-change-id was the same UUID for a different commit object, though.

bjeanes commented 1 month ago

Any thoughts on this one? Am I SOL or is this something conceivably fixable? I really want to convert our repos on my team to use JOSH and we have good buy-in for doing the work, but this issue is currently a blocker for doing it. With some pointers, I might be able to do the work to fix it, possibly...

LMG commented 1 month ago

Hi, I think the issue is that git2-rs doesn't seem to allow custom commit headers when creating commits. If you look at this function: https://github.com/josh-project/josh/blob/f0bc12c22cb4f51cefbdd9d4171c8de956bf1f69/josh-core/src/history.rs#L201 you can see that we create the commit using this function: https://docs.rs/git2/latest/git2/struct.Repository.html#method.commit_create_buffer . So it looks like the fix would be somewhat involved, with maybe a patch to this library?

bjeanes commented 1 month ago

Yes, but GitButler also is in Rust and using Git2, so there is fortunately pretty clear precedent to follow. The linked PR makes a lot of changes but only a very small part of that diff is the supporting code to write the commit using Git2 (by using the odb namespace to write the object directly)

I am working on a failing test case for this

bjeanes commented 1 month ago

My last comment is a bit gibberish because I wrote it from phone while distracted in other conversations. My apologies if it didn't make sense.


I agree that the fix is involved and that -- ideally -- a patch upstream might be necessary. However, git2 crate wraps libgit2 C library pretty faithfully here, and libgit2 does not expose the ability for custom headers in its own API either: https://libgit2.org/libgit2/#HEAD/group/commit/git_commit_create_buffer.

So I think an upstream fix is not a viable path forward.

On the other hand, GitButler creates its own buffer struct that supports arbitrary headers and writes it directly via libgit2's Object DB, just as JOSH does.

This is I think the most tactical starting point, but admittedly there may be some hairy edge cases that this doesn't support well enough, in the wild. I'd like to try and see if the existing JOSH tests still pass in this way.

Edit: I've opened a PR along these lines: #1387. A review or thoughts there would be appreciated.

bjeanes commented 1 month ago

Another observation is that JOSH is using String's for message, author, committer, etc. However, Git does not require UTF-8 for these fields, which is why GitButler uses bstr::BString for much of this. I'll continue just with String here, but if switching to BString can be done in a backwards-compatible way, that seems like another important compatibility issue (in the sense of "does JOSH work with my repository/history?")

Edit: That being said, it does appear that much of git2's API is using &str everywhere, so perhaps this point is just a nonstarter anyway.