Difficult-to-recove-from crash after `jj duplicate`.

ilyagr commented 1 year ago

I've had a problem similar to https://github.com/martinvonz/jj/issues/635, except this time I don't quite know how to recover. None of jj log, jj op log worked. I found that jj --at-operation @- op log works, but I don't know how to use that. Here is the repository state: jj-repo.tar.gz.

I do remember using jj duplicate at some point. I didn't do any explicit rebases, but I did use jj edit and tried to change a file.

I think jj should provide some way to recover from this.

An example of the error:

$ RUST_BACKTRACE=1 jj op log
thread 'main' panicked at 'called `Result::unwrap()` on an `Err` value: Other("Git commit '18a7eceb66e7d1eb757bc34ff018a9987fe4735d' already exists with different associated non-Git meta-data")', lib/src/store.rs:87:60
stack backtrace:
   0: rust_begin_unwind
             at /rustc/a31705549837622a3c3c27e98c9d8a6d820a984b/library/std/src/panicking.rs:584:5
   1: core::panicking::panic_fmt
             at /rustc/a31705549837622a3c3c27e98c9d8a6d820a984b/library/core/src/panicking.rs:142:14
   2: core::result::unwrap_failed
             at /rustc/a31705549837622a3c3c27e98c9d8a6d820a984b/library/core/src/result.rs:1785:5
   3: jujutsu_lib::store::Store::write_commit
   4: jujutsu_lib::commit_builder::CommitBuilder::write_to_repo
   5: jujutsu_lib::rewrite::rebase_commit
   6: jujutsu_lib::rewrite::DescendantRebaser::rebase_next
   7: jujutsu_lib::rewrite::DescendantRebaser::rebase_all
   8: jujutsu_lib::repo::MutableRepo::rebase_descendants
   9: jujutsu::cli_util::WorkspaceCommandHelper::snapshot
  10: jujutsu::cli_util::CommandHelper::workspace_helper
  11: jujutsu::commands::run_command
  12: jj::main
note: Some details are omitted, run with `RUST_BACKTRACE=full` for a verbose backtrace.

martinvonz commented 1 year ago

Does jj --no-commit-working-copy undo get you into a reasonable state for now?

ilyagr commented 1 year ago

It wasn't as easy as that, but your advice helped very much. jj --no-commit-working-copy op log works, but jj --no-commit-working-copy op undo wasn't sufficient. After trying a few options, jj --no-commit-working-copy op restore ee5fb restored me to a reasonable state.

Thanks!

ilyagr commented 1 year ago

Actually, the state it restored me to was still a bit problematic. The op restore --no-commit-working-copy command messed up the commit active at that operation log point. The contents of that commit was replaced with the contents of my working directory before I ran jj --no-commit-working-copy op restore ee5fb.

This is certainly less bad, though, as it can be recovered from using jj obslog.

martinvonz commented 1 year ago

Makes sense. --no-commit-working-copy actually means both that the working copy should not be snapshotted at the beginning of the operation and that it should not be updated at the end of the operation. That means that you should be able to really recover to that state by running jj op restore ee5fb (i.e. the same command again but without the option to not commit the working copy).

The root cause is #27, so that's what we should fix.

ilyagr commented 1 year ago

That means that you should be able to really recover to that state by running jj op restore ee5fb (i.e. the same command again but without the option to not commit the working copy).

Ah, that's a good idea, I didn't think of that. So, for future readers, the solution seems to be:

Find an operation id to restore to with jj --no-commit-working-copy op log
jj --no-commit-working-copy op restore ee5fb to get jj in a sane state.
The now-current commit's contents will be replaced by whatever is currently your working copy. If this is not desirable, try jj op restore ee5fb (which will lose whatever is currently in your working copy).

In my case, the reason this became an issue was that I used some git commands before step 1.

ilyagr commented 1 year ago

Another guise of this problem: with this state of the same repository, everything seems fine. However, running a rebase crashes consistently, and --no-commit-working-copy does not help with it.

$ RUST_BACKTRACE=1 jj --no-commit-working-copy rebase -b mergetool2 -d main
thread 'main' panicked at 'called `Result::unwrap()` on an `Err` value: Other("Git commit 'e3ec8474b2e9c3051130fabe0e5f98629c88e8d0' already exists with different associated non-Git meta-data")', lib/src/store.rs:87:60
stack backtrace:
   0: rust_begin_unwind
             at /rustc/a31705549837622a3c3c27e98c9d8a6d820a984b/library/std/src/panicking.rs:584:5
   1: core::panicking::panic_fmt
             at /rustc/a31705549837622a3c3c27e98c9d8a6d820a984b/library/core/src/panicking.rs:142:14
   2: core::result::unwrap_failed
             at /rustc/a31705549837622a3c3c27e98c9d8a6d820a984b/library/core/src/result.rs:1785:5
   3: jujutsu_lib::store::Store::write_commit
   4: jujutsu_lib::commit_builder::CommitBuilder::write_to_repo
   5: jujutsu_lib::rewrite::rebase_commit
   6: jujutsu_lib::rewrite::DescendantRebaser::rebase_next
   7: jujutsu_lib::rewrite::DescendantRebaser::rebase_all
   8: jujutsu_lib::repo::MutableRepo::rebase_descendants
   9: jujutsu::commands::run_command
  10: jj::main
note: Some details are omitted, run with `RUST_BACKTRACE=full` for a verbose backtrace.

martinvonz commented 1 year ago

The crash happens whenever two commits that initially differed in their change ID and commit timestamp (and only those two properties) are rewritten, because then it's very likely that their commit timestamps match. The rebase you mention presumably includes two such commits. In the initial comment in this issue, the working-copy commit had two such descendants, so whenever you modified files in the working copy and then ran any command, it would result in the crash. That's why --no-commit-working-copy helped in that case (it stops jj from amending the working-copy commit and rewriting the descendants).

ilyagr commented 1 year ago

Perhaps a workaround would be to have jj duplicate increase the timestamp of the commit by 1 (not sure if it will be 1 second or 1 millisecond)?

martinvonz commented 1 year ago

Perhaps a workaround would be to have jj duplicate increase the timestamp of the commit by 1 (not sure if it will be 1 second or 1 millisecond)?

Let's say you have this history:

B
|
A

You then run jj duplicate B and get this:

B2
|
| B
|/
A

Bumping the commit timestamp would help if you ran jj duplicate B within a second of rewriting B. That case is quite likely because any change in the working copy would rewrite B if it were the working-copy commit. However, it wouldn't help with the problem you reported in this issue, because the issue here is when A gets rewritten. When A gets rewritten (e.g. by rebasing it or by editing it in the working copy), both B and B2 will also get rewritten, which almost definitely will make their commit timestamps match.

ilyagr commented 1 year ago

Here's another possible workaround: when this problem occurs, jj can create an extra commit with a random UUID in the description, no content, and the same parents as the problematic commit. It can then rebase the problematic commit on top of the new commit.

I'm not sure if this should be automatic, or if jj should give the user a command to run to make it happen. In the latter case, the user could choose which of the equivalent commits to do this to.

ilyagr commented 1 year ago

I'm becoming more convince that the plan I outlined in the previous comment is a good idea. Any objections?

I would create a new type of BackendError that would be returned in this snippet of GitBackend::write_commit instead of BackendError::Other:

https://github.com/martinvonz/jj/blob/fb6181f212947fa17b2ebb317d07dacd30b68430/lib/src/git_backend.rs#L500-L507

I would catch this new type in MutableRepo::write_commit:

https://github.com/martinvonz/jj/blob/fb6181f212947fa17b2ebb317d07dacd30b68430/lib/src/repo.rs#L638-L642

In this special case, after failing to write one commit, MuableRepo::write_commit would try to write two commits instead:

a newly created empty commit with a long and detailed description explaining what's going on and what the user should do about it. It would also include a full change id in the description to make it unique. Its parents will be the parents of the commit we failed to write.

(The description would tell the user to modify the commit with change id XXX and then squash it)
the original commit we were trying to write, rebased on top of that.

WDYT about this plan, @martinvonz ? Will it confuse the Git backend to write a commit with the same tree that a commit it failed to write had?

martinvonz commented 1 year ago

WDYT about this plan, @martinvonz ?

I like the solution of having the Git backend bump the timestamp until it succeeds instead. Most users wouldn't even notice, and the changes can be done within the Git backend, so I think it's simpler for both the user and for us. What do you think about that? I know we talked about this on Discord some time back.

Will it confuse the Git backend to write a commit with the same tree that a commit it failed to write had?

That wouldn't be a problem. The Git backend doesn't care about relationships between objects at all, I think.

ilyagr commented 1 year ago

Oh, I thought we rejected that option for some reason. I think I'm OK with it.

It is theoretically possible it will cause somebody some subtle and difficult-to-understand problem if they rely on the timestamps for something. But it does seem unlikely.

ilyagr commented 1 year ago

To be extra-safe, we could have the git backend print an ugly message ("Changed the timestamp on commit ... to avoid two commits with the same id").

martinvonz commented 1 year ago

To be extra-safe, we could have the git backend print an ugly message ("Changed the timestamp on commit ... to avoid two commits with the same id").

I suspect it might be more annoying than it's worth. I almost never run into this problem, but I think you use jj duplicate a lot more, so you'll be a better judge of the useful:annoying ratio :)

martinvonz / jj

Difficult-to-recove-from crash after `jj duplicate`. #694