mhagger / git-imerge

Incremental merge for git
GNU General Public License v2.0

finish/simplify crashes with unicode error #114

Open pmj opened 8 years ago

pmj commented 8 years ago

This is my first time using git-imerge, so apologies if this is user error.

I tried to use git-imerge to rebase a very out-of-date feature branch onto a more recent upstream commit in the edk2 repository. Something along the lines of:

git checkout feature-branch
git imerge rebase <upstream commit hash>

After a bunch of manual merge interventions (the reason I'm trying imerge in the first place), the operation completed, but now the repo is a bit of a mess with lots of intermediate merge refs, and I'm left in a detached HEAD situation. I assumed this is expected (?), and tried running

git imerge finish

This unfortunately crashes with:

$ git imerge finish  
Traceback (most recent call last):
  File "/usr/local/bin/git-imerge", line 3926, in <module>
    main(sys.argv[1:])
  File "/usr/local/bin/git-imerge", line 3915, in main
    cmd_finish(parser, options)
  File "/usr/local/bin/git-imerge", line 3537, in cmd_finish
    merge_state.simplify(refname, force=options.force)
  File "/usr/local/bin/git-imerge", line 2703, in simplify
    self.simplify_to_rebase(refname, force=force)
  File "/usr/local/bin/git-imerge", line 2643, in simplify_to_rebase
    self._simplify_to_path(refname, (i1, 0), path, force=force)
  File "/usr/local/bin/git-imerge", line 2631, in _simplify_to_path
    create_commit_chain(base_sha1, path_sha1),
  File "/usr/local/bin/git-imerge", line 770, in create_commit_chain
    msg=get_log_message(metadata),
  File "/usr/local/bin/git-imerge", line 449, in get_log_message
    'git', 'cat-file', 'commit', commit,
  File "/usr/local/bin/git-imerge", line 107, in check_output
    output = communicate(process)[0]
  File "/usr/local/bin/git-imerge", line 239, in communicate
    output = None if output is None else output.decode(PREFERRED_ENCODING)
UnicodeDecodeError: 'ascii' codec can't decode byte 0xe2 in position 321: ordinal not in range(128)

I assume this is down to dodgy encoding in a commit message? I can't exactly start rewriting 2 years of history in a large, popular FOSS repository, so fixing the data isn't going to happen. I'm unfortunately terrible at Python, so I'm not going to be much use in fixing it, other than suggesting treating commits as binary data rather than text. And maybe printing the commit(s) being worked on when an exception occurs?

mhagger commented 8 years ago

It sounds like you were using imerge correctly. Yes, there must have been a byte \xe2 somewhere in the commit that it was trying to process, probably in either the log message or in the author/committer name. It appears that imerge was trying to decode this using the ascii encoding. Since \xe2 is not a valid ASCII character, :boom:

So one question is the one you posed: "Why doesn't imerge handle data as binary?" It probably should, though it's a pain to do so in Python, especially while retaining compatibility between Python 2 and Python 3. So that might be a nice long-term goal but is unlikely to happen soon.
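To illustrate what "handling data as binary" could look like, here is a minimal sketch, not imerge's actual code. It assumes the output of git cat-file commit consists of header lines, a blank line, and then the log message; the raw_commit value and the split_commit helper are made up for illustration.

```python
# Hypothetical raw output of `git cat-file commit <sha>`:
# header lines, one blank line, then the log message.
raw_commit = (
    b"tree 0000000000000000000000000000000000000000\n"
    b"author A U Thor <author@example.com> 1500000000 +0000\n"
    b"committer A U Thor <author@example.com> 1500000000 +0000\n"
    b"\n"
    b"Fix the parser\xe2\x80\x99s edge case\n"
)


def split_commit(raw):
    """Split raw commit bytes into (header bytes, message bytes).

    Nothing is decoded, so bytes such as 0xe2 cannot raise
    UnicodeDecodeError here.
    """
    headers, _, message = raw.partition(b"\n\n")
    return headers, message


headers, message = split_commit(raw_commit)
```

Because the message stays a bytes object, it could be passed back to git unchanged whatever encoding the original committer used.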

A simpler question, and one that might solve your current difficulties, is "Why is imerge using the ascii encoding on your system?" This comes from PREFERRED_ENCODING, which comes from the standard Python library function locale.getpreferredencoding(). This function accesses your environment to guess what locale to use.
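The failure can be reproduced outside imerge with a short sketch, assuming Python 3. The sample text is made up; it contains a right single quotation mark (U+2019), one of many characters whose UTF-8 encoding starts with the byte 0xe2. The try_decode helper is hypothetical.

```python
import locale

# The value the traceback's PREFERRED_ENCODING is derived from on this system.
preferred = locale.getpreferredencoding()
print("preferred encoding:", preferred)

# Made-up commit text; U+2019 encodes to the bytes e2 80 99 in UTF-8.
raw = "Author\u2019s fix".encode("utf-8")


def try_decode(data, encoding):
    """Return the decoded text, or None if the bytes are invalid in that encoding."""
    try:
        return data.decode(encoding)
    except UnicodeDecodeError:
        return None


ascii_result = try_decode(raw, "ascii")  # fails: 0xe2 is outside the ASCII range
utf8_result = try_decode(raw, "utf-8")   # succeeds
```

If the first line prints an ASCII-only encoding (Python may report it under a name such as ANSI_X3.4-1968 or US-ASCII), that would be consistent with the traceback above.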

Since the most common encoding used in Git metadata is UTF-8, I suggest that you try running the last command again as follows:

export LANG=en_US.utf8
git imerge finish

and see if that helps. (If not, see what you have to do on your system to affect the locale.)

Alternatively, you can edit the git-imerge script by hand and set PREFERRED_ENCODING manually near the top of the script, for example:

PREFERRED_ENCODING = 'UTF-8'
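Hard-coding UTF-8 still fails if a commit contains bytes that are not valid UTF-8. One possible variation, which is not something imerge does as far as this thread shows and which assumes Python 3 only, is the standard surrogateescape error handler. It decodes any byte sequence and re-encodes it to the identical bytes. The sample bytes below are made up.

```python
# Made-up data: a lone Latin-1 byte (0xe9, invalid as UTF-8) followed by
# a valid UTF-8 sequence (e2 80 99).
raw = b"caf\xe9 \xe2\x80\x99"

# Undecodable bytes become lone surrogate code points instead of raising.
text = raw.decode("utf-8", errors="surrogateescape")

# Encoding with the same handler restores the original bytes exactly.
round_tripped = text.encode("utf-8", errors="surrogateescape")
```

The decoded string may contain lone surrogates, so printing it or writing it out with a strict encoder can still raise an error. That makes it suitable for passing data through, not for display.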

If none of that helps, please tell us what repository you are working on and what SHA-1 you are trying to merge so that we can reproduce the problem.