isaacg1 / pyth

Pyth, an extremely concise language. Try it here:
https://pyth.herokuapp.com/

Can’t safely put NUL or CR bytes inside a double-quoted string #186

Open andersk opened 8 years ago

andersk commented 8 years ago

Inside a double-quoted string, Pyth translates CR (\r) to LF (\n). NUL bytes (\000) seem to work unless followed by a digit 0–7, because Pyth translates them to \0 instead of \000.
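The NUL case can be reproduced in plain Python, independent of Pyth. The sketch below is only an illustration: `naive_escape_nul`, `padded_escape_nul`, and `evaluate_literal` are hypothetical helper names, not functions from the Pyth source. It shows why writing NUL as `\0` is ambiguous when an octal digit follows, and why the three-digit form `\000` is not.

```python
import ast


def naive_escape_nul(text):
    # Hypothetical escaper: writes each NUL as the short escape, backslash + "0".
    return text.replace("\x00", "\\0")


def padded_escape_nul(text):
    # Hypothetical escaper: writes each NUL as the full three-digit octal escape.
    return text.replace("\x00", "\\000")


def evaluate_literal(body):
    # Evaluate `body` as the contents of a double-quoted Python string literal.
    return ast.literal_eval('"' + body + '"')


if __name__ == "__main__":
    original = "\x0012"  # NUL followed by the characters '1' and '2'
    # The short escape merges with the following digits into one octal escape.
    print(repr(evaluate_literal(naive_escape_nul(original))))
    # The padded escape uses up all three octal digits, so '1' and '2' survive.
    print(repr(evaluate_literal(padded_escape_nul(original))))
```

An octal escape in a Python string literal takes at most three digits, so the padded form cannot absorb the characters that follow it.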

$ printf '"\r"' | xxd
00000000: 220d 22                                  "."
$ printf '"\r"' | pyth -d /dev/stdin
==================== 3 chars =====================
"
"
==================================================
imp_print("\n")
==================================================

$ printf '"\00012"' | xxd
00000000: 2200 3132 22                             ".12"
$ printf '"\00012"' | pyth -d /dev/stdin
==================== 5 chars =====================
"12"
==================================================
imp_print("\012")
==================================================

isaacg1 commented 8 years ago

I've fixed the null byte issue, but the CR issue seems to be introduced by Python. I'll need to investigate more for that one.

andersk commented 8 years ago

If you replace open(file_or_string, encoding='iso-8859-1') with open(file_or_string, encoding='iso-8859-1', newline=''), then Python will stop translating \r and \r\n to \n. Of course, you may then need to teach Pyth to keep accepting \r and \r\n in various other places where newlines are significant, to keep Mac and Windows users happy.

(It may be cleaner, but more work, to open in binary mode and use bytes everywhere?)
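The difference between the read modes can be checked with a small standalone script. This is a sketch rather than Pyth's actual loading code: `read_source` and `demo` are hypothetical names, and the temporary file stands in for the program file. Only the `open(..., encoding='iso-8859-1', newline='')` call is taken from the suggestion above.

```python
import os
import tempfile


def read_source(path, preserve_newlines):
    # newline=None (the default) enables universal-newline translation;
    # newline='' returns line endings to the caller untranslated.
    newline = '' if preserve_newlines else None
    with open(path, encoding='iso-8859-1', newline=newline) as f:
        return f.read()


def demo():
    # Write the three bytes: double quote, CR, double quote.
    with tempfile.NamedTemporaryFile(delete=False) as tmp:
        tmp.write(b'"\r"')
        path = tmp.name
    try:
        translated = read_source(path, preserve_newlines=False)
        preserved = read_source(path, preserve_newlines=True)
        # Binary mode never translates line endings and yields bytes.
        with open(path, 'rb') as f:
            raw = f.read()
    finally:
        os.remove(path)
    return translated, preserved, raw


if __name__ == "__main__":
    for value in demo():
        print(repr(value))
```

With the default mode the CR comes back as LF; with `newline=''` or binary mode it is preserved, which is why any code that splits the program on newlines would then have to handle `\r` and `\r\n` itself.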

vendethiel commented 8 years ago

\r hasn't been used on Mac for a while now

andersk commented 8 years ago

There are similar issues with \ followed by NUL or LF or CR.

\␀ generates imp_print("␀"), which fails with ValueError: source code string cannot contain null bytes

\␊ or \␍ fails with IndexError: string index out of range
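The NUL failure can be reproduced without Pyth, since Python refuses to compile source text that contains a raw NUL. The sketch below uses plain `print` as a stand-in for `imp_print`; `compiles` and `emit_print` are hypothetical names, and building the literal with `repr()` is just one possible way to keep raw control characters out of generated code, not necessarily the approach Pyth takes. It does not address the `IndexError` case.

```python
def compiles(source):
    # Report whether Python accepts `source` as a module.
    # A raw NUL is rejected with ValueError on older Python 3 releases;
    # newer releases are assumed here to raise SyntaxError instead.
    try:
        compile(source, "<generated>", "exec")
    except (ValueError, SyntaxError):
        return False
    return True


def emit_print(char):
    # Hypothetical code generator: repr() writes control characters as
    # escape sequences, so the generated source itself stays printable.
    return "print(" + repr(char) + ")"


if __name__ == "__main__":
    raw_source = 'print("\x00")'      # contains an actual NUL character
    safe_source = emit_print("\x00")  # contains the four characters \x00
    print(compiles(raw_source))
    print(repr(safe_source), compiles(safe_source))
```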