Unicode escape sequences are not preserved

ikamensh / flynt

A tool to automatically convert old string literal formatting to f-strings

MIT License


Unicode escape sequences are not preserved #196

Open linuxdaemon opened 6 months ago

linuxdaemon commented 6 months ago

This seems related to #55 and #104, but both of those are closed as completed and the issue is still present on v1.0.1.

Example

flynt -tj -s "'\u2122'.join(('a', 'b'))"

Results in: "a™b" instead of: "a\u2122b"

This seems to also occur with octal values:

flynt -tj -s "'\40'.join(('a', 'b'))"

returns: "a b"

ikamensh commented 3 months ago

This is a known and unfortunate limitation. Once Python parses your code, which I need to do to get the abstract syntax tree, it's no longer possible to determine whether a character was written as an escape sequence or as the literal character. It might be possible to read the file as bytes, find the location of each expression in the file, and check there whether the character was written as an escape sequence (I think), so there could be two fixes:

1. Just detect the use of escape sequences and raise ConversionRefused, so that flynt skips the expression (easier).
2. Actually preserve Unicode escape sequences where present.
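
To illustrate the limitation and the first (easier) option, here is a minimal sketch using only the standard library. It is not flynt's actual code: `ESCAPE_RE`, `has_escape` and `literals_with_escapes` are hypothetical names, and the idea of checking each literal's original source text via `ast.get_source_segment` is an assumption about how the detection could work. A tool could refuse to convert (for flynt, presumably by raising ConversionRefused) whenever the helper reports a match.

```python
import ast
import re

# Hypothetical pattern: an escaped backslash (captured in group 1 so it can be
# ignored), or a \N{...}, \uXXXX, \UXXXXXXXX, \xXX or octal escape.
ESCAPE_RE = re.compile(
    r"\\(?:(\\)|N\{[^}]+\}|u[0-9a-fA-F]{4}|U[0-9a-fA-F]{8}"
    r"|x[0-9a-fA-F]{2}|[0-7]{1,3})"
)


def has_escape(segment: str) -> bool:
    """True if the literal's source text contains a numeric or named escape."""
    # Matches whose group 1 is set are plain escaped backslashes, not escapes
    # we care about. Raw strings are not special-cased, so r'\u2122' is also
    # flagged; that is conservative (it only causes a skip).
    return any(m.group(1) is None for m in ESCAPE_RE.finditer(segment))


def literals_with_escapes(source: str) -> list:
    """Return the source text of every string literal that uses an escape."""
    found = []
    for node in ast.walk(ast.parse(source)):
        if isinstance(node, ast.Constant) and isinstance(node.value, str):
            # The AST only stores the decoded value; the original spelling
            # has to be recovered from the source text itself.
            segment = ast.get_source_segment(source, node)
            if segment and has_escape(segment):
                found.append(segment)
    return found


escaped = r"'\u2122'.join(('a', 'b'))"
literal = "'™'.join(('a', 'b'))"

# After parsing, both spellings produce identical trees.
print(ast.dump(ast.parse(escaped)) == ast.dump(ast.parse(literal)))

# The source text still distinguishes them.
print(literals_with_escapes(escaped))
print(literals_with_escapes(literal))
```

The second option (actually preserving the escapes) would need more than this check, since the replacement text would have to be built from the original source segments rather than from the decoded values.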