Unicode escapes, and especially non breaking spaces

ikamensh / flynt

A tool to automatically convert old string literal formatting to f-strings


Unicode escapes, and especially non breaking spaces #55

Closed Nodd closed 1 year ago

Nodd commented 4 years ago

flynt replaces Unicode escapes, such as non-breaking spaces, with the literal characters:

"a\xA0%s" % a

becomes:

f"a {a}"

The space between a and {a} is still a non-breaking space, but a human reader can no longer see the difference. Is it possible to keep Unicode escapes as-is in files?

Note that this happens for all Unicode escapes (I tried with \xB0, which becomes °), but it's especially annoying with non-breaking spaces. Also, using Unicode escapes in code is the developer's choice; there can be various reasons for it, and flynt should not touch them.

Thanks!

ikamensh commented 4 years ago

Hey Nodd, thanks for the bug report.

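Here is an example showing why this is difficult:

import ast

# Raw string, so the parser sees the escape exactly as it appears in a source file.
s_in = r"""x = 'a\xA0%s' % a"""

# The AST stores the decoded value; the \xA0 escape is already a real character.
s = ast.parse(s_in).body[0].value.left.value
print(f"{s}")
# Re-decoding the UTF-8 bytes as Latin-1 does not bring the escape back either.
print(bytes(s, encoding='utf-8').decode('latin-1'))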

Output:

a %s
aÂ %s

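I am not sure how to recover the original text from the AST. Flynt doesn't work by direct string replacement; instead, it parses parts of the code into an AST, manipulates it, and converts it back into source code. The escape is lost in the parsing step, because the AST only stores the decoded string value.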

I will keep the issue open but no ETA on when I can solve this.
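
One possible way to get at the text as written, sketched here as an assumption rather than as what flynt does, is `ast.get_source_segment` from the standard library (Python 3.8+). It uses the position information stored on the node to slice the original source, so the escape survives. The variable names below are illustrative only.

```python
import ast

# Raw string so the backslash escape reaches the parser unchanged,
# as it would when the code is read from a .py file.
source = r"""x = 'a\xA0%s' % a"""

# The left operand of the % expression is the string constant node.
node = ast.parse(source).body[0].value.left

parsed_value = node.value  # escapes already decoded by the parser
original_text = ast.get_source_segment(source, node)  # literal as written

print(repr(parsed_value))
print(original_text)
```

This only recovers the original literal; deciding how to carry its escapes into a generated f-string is a separate problem.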

ikamensh commented 4 years ago

Also, skipping strings that contain escaped Unicode does not appear to be trivial: https://stackoverflow.com/questions/45651565/how-to-determine-if-a-string-is-escaped-unicode
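
A rough heuristic for skipping such strings, given as a sketch and not as flynt's implementation, is to look at the raw tokens instead of the parsed values. The standard `tokenize` module yields string literals exactly as written, so a backslash in the token text signals an escape. The helper name `strings_with_escapes` is hypothetical, and the check is deliberately conservative: it also flags raw strings and harmless escapes such as `\n`.

```python
import io
import tokenize


def strings_with_escapes(source):
    """Return string tokens whose source text contains a backslash."""
    tokens = tokenize.generate_tokens(io.StringIO(source).readline)
    return [
        tok.string
        for tok in tokens
        if tok.type == tokenize.STRING and "\\" in tok.string
    ]


# Raw string so the escape stays in the sample source text.
sample = r'''
x = "a\xA0%s" % a
y = "plain %s" % b
'''

flagged = strings_with_escapes(sample)
print(flagged)
```

A tool could leave the flagged literals untouched and convert only the rest.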

Nodd commented 4 years ago

I guess I'll have to replace the characters afterwards, thanks for looking!
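
A minimal sketch of such a post-processing step, assuming the only character of interest is the non-breaking space (U+00A0). The function name `escape_nbsp` is made up for illustration. Note the caveats: it rewrites every occurrence in the file, including those in comments, and inside raw strings the inserted `\xa0` would be taken literally, so the result should be reviewed.

```python
def escape_nbsp(source):
    """Replace literal non-breaking spaces with the visible \\xa0 escape."""
    return source.replace("\u00a0", "\\xa0")


# Simulated flynt output: the f-string contains a literal non-breaking space.
converted = 'f"a\u00a0{a}"'

restored = escape_nbsp(converted)
print(restored)

# The restored literal still evaluates to the same string value.
a = "b"
result = eval(restored)
```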