UnicodeDecodeError in assert_nondirty

lansman commented 8 years ago

Windows 7, Python 2.7.10, bumpversion 0.5.3

[...]: bumpversion patch
<...>
122 files updated, 2 files merged, 64 files removed, 0 files unresolved
(branch merge, don't forget to commit)
Traceback (most recent call last):
  File "C:\Python27\ArcGIS10.2\lib\runpy.py", line 162, in _run_module_as_main
    "__main__", fname, loader, pkg_name)
  File "C:\Python27\ArcGIS10.2\lib\runpy.py", line 72, in _run_code
    exec code in run_globals
  File "C:\Python27\ArcGIS10.2\Scripts\bumpversion.exe\__main__.py", line 9, in <module>
  File "C:\Python27\ArcGIS10.2\lib\site-packages\bumpversion\__init__.py", line 897, in main
    vcs.assert_nondirty()
  File "C:\Python27\ArcGIS10.2\lib\site-packages\bumpversion\__init__.py", line 174, in assert_nondirty
    b"\n".join(lines)))
UnicodeDecodeError: 'ascii' codec can't decode byte 0xf1 in position 77: ordinal not in range(128)

peritus commented 8 years ago

Thanks and good find!

Could you attach the result of running

git status

for this project? I suspect you have a non-ASCII filename that causes this error, but I'd like to know for sure. If that's the case, it should definitely be fixed in bumpversion.

lansman commented 8 years ago

peritus, I don't have git, I use Mercurial <3

hg status shows no non-ASCII filenames at the moment, although I've changed the project's state since I posted this bug report. There is a non-ASCII file in my project, but I doubt it was in the hg status output when I got the bug.

peritus commented 8 years ago

@lansman Yes, of course. hg status output containing non-ASCII characters could also trigger this, I suppose.
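For illustration, here is a minimal sketch of how a single non-ASCII byte in status output produces this kind of error. The `lines` list and the filename in it are made up for the example, not taken from the reporter's project, and the explicit `decode("ascii")` stands in for whatever implicit conversion Python 2 performs inside bumpversion.

```python
# Hypothetical status output; 0xf1 is "ñ" in Latin-1 / cp1252.
lines = [b"M setup.py", b"? ma\xf1ana.txt"]


def try_ascii(data):
    """Return the decoded text, or the error message if decoding fails."""
    try:
        return data.decode("ascii")
    except UnicodeDecodeError as exc:
        return str(exc)


# Joining the byte strings works; decoding the result as ASCII does not.
message = try_ascii(b"\n".join(lines))
print(message)
```

The printed message names the same byte (0xf1) as the traceback in the report, only at a different position.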

lansman commented 8 years ago

I've just checked the problem and yes, there are non-ASCII filenames in the lines list in bumpversion/__init__.py

There is a simple solution for that: replace the content of line 174, from

b"\n".join(lines)))

to

b"\n".join(str(l).encode('string_escape') for l in lines)))
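The `string_escape` codec used in the replacement line only exists on Python 2. As a rough sketch of the same idea that runs on both Python 2 and Python 3, the hypothetical helper `escape_non_ascii` below (not part of bumpversion) rewrites every byte above 127 as a `\xNN` escape before the lines are joined. The sample `lines` are invented for the example.

```python
def escape_non_ascii(data):
    """Return text in which every byte >= 0x80 is shown as a \\xNN escape."""
    # bytearray yields integers on both Python 2 and Python 3.
    return "".join(
        chr(b) if b < 128 else "\\x%02x" % b
        for b in bytearray(data)
    )


# Hypothetical status lines; the second one contains a Latin-1 "ñ" (0xf1).
lines = [b"M setup.py", b"? ma\xf1ana.txt"]
report = "\n".join(escape_non_ascii(line) for line in lines)
print(report)
```

Unlike `string_escape`, this sketch leaves backslashes and control characters untouched, which is probably acceptable for an error message but is a simplification.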

And to avoid Unicode problems everywhere, always use the following.

1. When you want to convert a Unicode string to a byte string that contains only ASCII:

s_uni = u'\u0430\u0431\u0432' # first three letters of the Russian alphabet

s_uni.encode('ascii') # UnicodeEncodeError, because their code points are far greater than 127

s_uni.encode('unicode_escape').encode('ascii')  # OK, unicode_escape gives us byte string with codes <= 127

2. When you want to convert a byte string (which MAY contain bytes with codes > 127, e.g. the strings returned by various built-in Python OS and network functions) to a byte string that contains only ASCII (<= 127):

s_uni = u'\u0430\u0431\u0432' # first three letters of the Russian alphabet
# cp1251 - 1-byte ANSI encoding of the Russian alphabet
s_1251 = s_uni.encode('cp1251')  # OK

s_1251.encode('ascii') # UnicodeDecodeError, because s_1251 byte string has bytes with codes > 127

s_1251.encode('string_escape').encode('ascii') # OK, string_escape gives us byte string with codes <= 127
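The two recipes above are written for Python 2. On Python 3 the `string_escape` codec is gone and `bytes` has no `.encode()` method, so the closest equivalents look different. The following is a sketch under the assumption of Python 3.5 or later (where `backslashreplace` is accepted when decoding), not something taken from bumpversion itself.

```python
s_uni = "\u0430\u0431\u0432"  # first three letters of the Russian alphabet

# Recipe 1: text -> ASCII-only bytes. unicode_escape already returns bytes.
escaped = s_uni.encode("unicode_escape")

# Recipe 2: arbitrary bytes -> ASCII-only text. Bytes above 127 become
# \xNN escapes instead of raising UnicodeDecodeError.
s_1251 = s_uni.encode("cp1251")
safe = s_1251.decode("ascii", errors="backslashreplace")

print(escaped)
print(safe)
```

Both results contain only ASCII characters, so they can be mixed into an error message without triggering another encoding error.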

peritus / bumpversion

UnicodeDecodeError in assert_nondirty #114