python / cpython

The Python programming language
https://www.python.org

untokenize of specially crafted escaped characters does not round trip properly #125821

Open asottile opened 4 hours ago

asottile commented 4 hours ago

Bug report

Bug description:

this small program does not roundtrip through tokenize / untokenize -- it appears to mishandle the escaped quote as a \N{NAMED ESCAPE}

bar = 1
print(f"{bar} \"{{SNOWMAN}} {{foo}}")

this is what it produces after a round of untokenization:

$ python3 t.py t3.py 
bar = 1
print(f"{bar} \"{ SNOWMAN}} {{foo}}")
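
the t.py script used above is not included in the report. a minimal sketch of an equivalent round-trip check, assuming it simply feeds the file through `tokenize.generate_tokens` and `tokenize.untokenize` (the helper names `roundtrip` and `roundtrips_cleanly` are hypothetical, chosen for illustration):

```python
import io
import sys
import tokenize


def roundtrip(source: str) -> str:
    """Tokenize `source` and rebuild text from the full token stream."""
    # generate_tokens takes a readline callable that returns str lines.
    tokens = tokenize.generate_tokens(io.StringIO(source).readline)
    # With full 5-tuples, untokenize tries to reproduce the original
    # spacing from each token's start/end positions.
    return tokenize.untokenize(tokens)


def roundtrips_cleanly(source: str) -> bool:
    """Return True when untokenize reproduces the input exactly."""
    return roundtrip(source) == source


if __name__ == "__main__" and len(sys.argv) > 1:
    # Assumed usage: python3 roundtrip_check.py some_file.py
    with open(sys.argv[1], encoding="utf-8") as f:
        src = f.read()
    sys.stdout.write(roundtrip(src))
    if not roundtrips_cleanly(src):
        sys.exit("round trip changed the source")
```

running this on the two-line program above should print the altered f-string on affected versions; on a version without the bug the output would match the input.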

annoyingly, tokenize_rt suffers from a different but related bug, which is why I was investigating this to begin with. as an aside, the handling of curly braces in 3.12+ tokenization is a huge pain!

CPython versions tested on:

CPython main branch

Operating systems tested on:

Linux

asottile commented 4 hours ago

cc @pablogsal ecf16ee50e42f979624e55fa343a8522942db2e7