sifive / wit

Workspace Integration Tool
Apache License 2.0
Add test for unicode branch names. #245

Open richardxia opened 4 years ago

richardxia commented 4 years ago

This adds a test case that demonstrates a bug when the following conditions line up:

  1. You have a Git repo with a branch containing a non-ASCII character
  2. Your shell's locale has a non-UTF-8 encoding
  3. You have a distribution of Python that falls back to a non-UTF-8 encoding when the locale-related environment variables (LANG, LC_ALL, LC_CTYPE) don't select a locale with a UTF-8 encoding

More concretely, I can deterministically break this on Ubuntu 16.04 using the distribution-provided Python 3. I, however, cannot reproduce on my MacBook because Apple's Python 3 defaults to UTF-8.

You can double check if you have a Python installation that is capable of reproducing the problem by looking at the output of the following command:

$ LANG=C python3 -c 'import locale; print(locale.getpreferredencoding(False))'

On Ubuntu 16.04, it returns ANSI_X3.4-1968 for me. On my MacBook, it returns UTF-8.
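The same check can be done from inside Python. The following is a minimal sketch, not part of wit; the helper name `is_utf8_encoding` is hypothetical. It relies on `codecs.lookup()` to resolve encoding aliases, so the `ANSI_X3.4-1968` name reported on Ubuntu is recognized as plain ASCII.

```python
import codecs
import locale


def is_utf8_encoding(name=None):
    """Return True if `name` (default: the locale's preferred encoding) is UTF-8.

    Hypothetical helper for illustration only; it is not part of wit.
    """
    if name is None:
        name = locale.getpreferredencoding(False)
    # codecs.lookup() resolves aliases, so "ANSI_X3.4-1968" maps to "ascii"
    # and spellings such as "UTF8" map to "utf-8".
    return codecs.lookup(name).name == "utf-8"


if __name__ == "__main__":
    # Prints the encoding Python would use for text-mode pipes,
    # and whether that encoding is UTF-8.
    print(locale.getpreferredencoding(False), is_utf8_encoding())
```

Running this script with `LANG=C` on an affected installation should report a non-UTF-8 encoding, which is the precondition for the failure described here.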

The more detailed error message is:

  File "/usr/lib/python3.5/runpy.py", line 184, in _run_module_as_main
    "__main__", mod_spec)
  File "/usr/lib/python3.5/runpy.py", line 85, in _run_code
    exec(code, run_globals)
  File "/opt/sifive/wit/lib/wit/__main__.py", line 11, in <module>
    main()
  File "/opt/sifive/wit/lib/wit/main.py", line 65, in main
    create(args)
  File "/opt/sifive/wit/lib/wit/main.py", line 126, in create
    update(ws, args)
  File "/opt/sifive/wit/lib/wit/main.py", line 308, in update
    ws.checkout(packages)
  File "/opt/sifive/wit/lib/wit/workspace.py", line 201, in checkout
    package.checkout(self.root)
  File "/opt/sifive/wit/lib/wit/package.py", line 149, in checkout
    self.repo.checkout(self.revision)
  File "/opt/sifive/wit/lib/wit/gitrepo.py", line 248, in checkout
    proc_ref = self._git_command("show-ref")
  File "/opt/sifive/wit/lib/wit/gitrepo.py", line 293, in _git_command
    cwd=cwd)
  File "/usr/lib/python3.5/subprocess.py", line 695, in run
    stdout, stderr = process.communicate(input, timeout=timeout)
  File "/usr/lib/python3.5/subprocess.py", line 1072, in communicate
    stdout, stderr = self._communicate(input, endtime, timeout)
  File "/usr/lib/python3.5/subprocess.py", line 1754, in _communicate
    self.stdout.encoding)
  File "/usr/lib/python3.5/subprocess.py", line 976, in _translate_newlines
    data = data.decode(encoding)
UnicodeDecodeError: 'ascii' codec can't decode byte 0xe2 in position 113387: ordinal not in range(128)

The issue is that we're setting universal_newlines=True in our subprocess.run() calls, which opens the child's pipes in text mode and decodes its output using the encoding reported by locale.getpreferredencoding(False). If Git prints a byte outside the ASCII range (here 0xe2, which is the lead byte of a three-byte sequence in UTF-8), then Python in an ASCII locale will blow up trying to decode it into a Unicode string.
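One possible way around this, sketched below, is to capture raw bytes and decode them explicitly instead of relying on the locale. This is an illustration under assumptions, not necessarily the fix this project will adopt: `run_utf8`, `ascii_decode_fails`, and the sample ref name are hypothetical, and the sketch assumes Git's output is UTF-8.

```python
import subprocess
import sys

# Hypothetical ref line; U+2603 encodes to the UTF-8 bytes e2 98 83,
# so the first non-ASCII byte is 0xe2, as in the traceback.
SAMPLE = "refs/heads/snow-\u2603\n".encode("utf-8")


def ascii_decode_fails(data):
    """Return True if decoding `data` as ASCII raises UnicodeDecodeError."""
    try:
        data.decode("ascii")
    except UnicodeDecodeError:
        return True
    return False


def run_utf8(cmd, cwd=None):
    """Run `cmd`, capture raw bytes, and decode them as UTF-8.

    Omitting universal_newlines keeps the pipes in binary mode, so the
    locale's preferred encoding is never consulted.
    """
    proc = subprocess.run(cmd, stdout=subprocess.PIPE,
                          stderr=subprocess.PIPE, cwd=cwd)
    return (proc.returncode,
            proc.stdout.decode("utf-8"),
            proc.stderr.decode("utf-8"))


if __name__ == "__main__":
    # A child Python process stands in for Git and writes the sample bytes.
    child = "import sys; sys.stdout.buffer.write({!r})".format(SAMPLE)
    code, out, err = run_utf8([sys.executable, "-c", child])
    print(ascii_decode_fails(SAMPLE), code, out.encode("utf-8") == SAMPLE)
```

On Python 3.6 and later, subprocess.run() also accepts an encoding argument that could serve the same purpose, but the traceback above comes from Python 3.5, where that argument is not available.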

jackkoenig commented 4 years ago

Nice test branch name 🙂