Open jnareb opened 11 months ago
Will prepare a release in the upcoming days :+1:
Unfortunately, commit 2771a878f7bc6619e625feb4dbad3427f57f5237 does not fully solve the problem of c-style quoted filenames.
It lets unidiff parse a patch with quoted filenames, but unidiff then reproduces those filenames in their original quoted form. Shouldn't unidiff decode such a filename to str if possible, and to bytes if not (e.g. when the name is invalid UTF-8)?
All the commit does is make unidiff able to strip the "a/" or "b/" prefix from filenames even when they are in their c-quoted form.
Here is some slightly ugly code that actually tries to decode a c-quoted filename; not tested on Python 2
def decode_c_quoted_str(text):
    """C-style name unquoting

    See unquote_c_style() function in 'quote.c' file in git/git source code
    https://github.com/git/git/blob/master/quote.c#L401

    This is a subset of escape sequences supported by C and C++
    https://learn.microsoft.com/en-us/cpp/c-language/escape-sequences

    :param str text: string which may be c-quoted
    :return: decoded string
    :rtype: str
    :raises ValueError: on a malformed escape sequence
    """
    # TODO?: Make it a global variable
    escape_dict = {
        'a': '\a',  # Bell (alert)
        'b': '\b',  # Backspace
        'f': '\f',  # Form feed
        'n': '\n',  # New line
        'r': '\r',  # Carriage return
        't': '\t',  # Horizontal tab
        'v': '\v',  # Vertical tab
    }

    quoted = len(text) >= 2 and text.startswith('"') and text.endswith('"')
    if not quoted:
        return text  # only quoted names carry escape sequences

    text = text[1:-1]  # remove quotes
    buf = bytearray()
    escaped = False  # TODO?: switch to state = 'NORMAL', 'ESCAPE', 'ESCAPE_OCTAL'
    oct_str = ''

    for ch in text:
        if not escaped:
            if ch != '\\':
                # encode instead of ord(): a literal non-ASCII character
                # does not fit into a single byte
                buf.extend(ch.encode())
            else:
                escaped = True
                oct_str = ''
        elif '0' <= ch <= '7':
            oct_str += ch
            if len(oct_str) == 3:
                byte = int(oct_str, base=8)  # byte in octal notation
                if byte > 255:  # a first digit above 3 overflows a byte
                    raise ValueError(f'Invalid octal escape sequence \\{oct_str} in "{text}"')
                buf.append(byte)
                escaped = False
                oct_str = ''
        elif oct_str:
            # an octal escape must have exactly three digits
            raise ValueError(f'Incomplete octal escape sequence \\{oct_str} in "{text}"')
        elif ch in ('"', '\\'):
            buf.append(ord(ch))
            escaped = False
        elif ch in escape_dict:
            buf.append(ord(escape_dict[ch]))
            escaped = False
        else:
            raise ValueError(f'Unexpected character \'{ch}\' in escape sequence when parsing "{text}"')

    if escaped:
        raise ValueError(f'Unfinished escape sequence when parsing "{text}"')

    # decodes as UTF-8; raises UnicodeDecodeError if the bytes are not valid UTF-8
    return buf.decode()
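For the "str if possible, bytes if not" behaviour asked about above, here is a rough, more compact sketch of how it might look. It is only an illustration, not unidiff's API: the names `unquote_filename`, `_replace_escape`, `_SIMPLE_ESCAPES` and `_ESCAPE_RE` are made up for this example, and it assumes octal escapes are always exactly three digits with the first digit in 0-3, which is what git emits.

```python
import re

# Hypothetical helper names; none of this exists in unidiff.
_SIMPLE_ESCAPES = {
    b'a': b'\a', b'b': b'\b', b'f': b'\f', b'n': b'\n',
    b'r': b'\r', b't': b'\t', b'v': b'\v', b'"': b'"', b'\\': b'\\',
}
# A backslash followed by three octal digits, or by any single byte.
# The trailing "?" also catches a lone backslash at the end of the name,
# so that it can be reported as an error instead of being left in place.
_ESCAPE_RE = re.compile(rb'\\([0-3][0-7]{2}|.?)', re.DOTALL)


def _replace_escape(match):
    seq = match.group(1)
    if len(seq) == 3:  # three octal digits encode one raw byte
        return bytes([int(seq, 8)])
    if seq in _SIMPLE_ESCAPES:
        return _SIMPLE_ESCAPES[seq]
    raise ValueError('invalid escape sequence: %r' % match.group(0))


def unquote_filename(text):
    """Return str if the unquoted name is valid UTF-8, bytes otherwise."""
    if len(text) < 2 or not (text.startswith('"') and text.endswith('"')):
        return text  # not c-quoted, nothing to decode
    raw = _ESCAPE_RE.sub(_replace_escape, text[1:-1].encode('utf-8'))
    try:
        return raw.decode('utf-8')
    except UnicodeDecodeError:
        return raw  # fall back to the raw bytes of the name


print(unquote_filename('"a/zaz\\303\\263\\305\\202\\304\\207.txt"'))
print(unquote_filename('"a/bad\\377name"'))
```

The first call should give a str with the UTF-8 bytes decoded, the second one bytes, because a lone 0xFF byte is not valid UTF-8. Whether a mixed str/bytes return type is acceptable for unidiff is a design question for the maintainers.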
I was wondering why unidiff fails on changes to files with filenames that include characters outside 7-bit ASCII, and it turns out that the latest release v0.7.5 does not include commit 2771a87 (Support quoted filenames, 2023-06-02).
Could we please get a new release with this fix included?
Thanks in advance.