mako.cmd.cmdline() reads template as binary but writes as text resulting in garbled newlines on Windows

ni-balexand commented 1 year ago

The issue here can be demonstrated by running the following script on Windows:

import mako.cmd
import sys

with open("template_file", "w") as f:
    print("Line 1", file=f)
    print("Line 2", file=f)
    print("Line 3", file=f)

sys.argv.append("--output-file")
sys.argv.append("rendered_file")
sys.argv.append("template_file")

assert sys.argv[1:] == ["--output-file", "rendered_file", "template_file"]

# pass sys.argv off to mako
mako.cmd.cmdline()

# awkward \r\r\n EOL sequences is what is written to disk
with open("rendered_file", "rb") as f:
    rendered = f.read()
assert rendered == b'Line 1\r\r\nLine 2\r\r\nLine 3\r\r\n'

# If you were to open/read in text mode it shows up as two newlines
with open("rendered_file", "rt") as f:
    rendered = f.read()
assert rendered == 'Line 1\n\nLine 2\n\nLine 3\n\n'

It's odd that read_file() in util.py opens / reads in binary mode but then cmdline() in cmd.py opens / writes in text mode. The whole point of using text mode should be to convert EOLs on Windows, but it does not do this correctly.

zzzeek commented 1 year ago

hi -

why would you not simply fix the file to have proper CR/NL combinations? (edit: OK I thought that was the input file, it's the output)

this part:

 assert rendered == b'Line 1\r\r\nLine 2\r\r\nLine 3\r\r\n'

looks wrong?

I'd really like to use Python's text mode here and not go back to a Python 2-ish approach.

zzzeek commented 1 year ago

I guess that written file is the bug. I'm not really understanding the problem yet. I dont want to write "b" at all.

sqla-tester commented 1 year ago

Mike Bayer has proposed a fix for this issue in the main branch:

switch to textmode in all cases https://gerrit.sqlalchemy.org/c/sqlalchemy/mako/+/4450

zzzeek commented 1 year ago

see above gerrit. does not work yet, as we have some super legacy BOM stuff I'd have to figure out and/or just remove.

the "reading bytes" thing is strictly python 2. if it's causing problems, it's out. I want to use the most modern Python features

zzzeek commented 1 year ago

OK I understand the issue now. here is the fix:

diff --git a/mako/cmd.py b/mako/cmd.py
index 93a6733..deb8b95 100755
--- a/mako/cmd.py
+++ b/mako/cmd.py
@@ -86,12 +86,12 @@ def cmdline(argv=None):

     kw = dict(varsplit(var) for var in options.var)
     try:
-        rendered = template.render(**kw)
+        rendered = template.render_unicode(**kw)
     except:
         _exit()
     else:
         if output_file:
-            open(output_file, "wt", encoding=output_encoding).write(rendered)
+            open(output_file, "wt", newline="", encoding=output_encoding).write(rendered)
         else:
             sys.stdout.write(rendered)

we have no need for text mode to convert newlines so we turn it off.

can you confirm this fixes all issues on your end? thanks

ni-balexand commented 1 year ago

Yes, that fixes the issue.

I get the confusion now. Opening as text vs binary has several effects including:

whether read/write operates with str or bytes types
any necessary encoding / decoding
newline conversion

Opening in text mode with newline="" will turn off newline conversion while leaving the character encoding alone, and continuing to use the str type for read/write operations. It looks like render_unicode will return a str type just as render did.

ni-balexand commented 1 year ago

I don't see any specific reason why we would need to change render to render_unicode as far as the newline problem I'm looking at (reverting back to just render still looks fine).

Do you think changing to render_unicode is necessary for other string encoding reasons?

zzzeek commented 1 year ago

render() returns either bytes or string while render_unicode always returns string.

zzzeek commented 1 year ago

right it's not strictly related but the way it is is technically wrong. eventually typing will be added here and it will change anyway

sqlalchemy / mako

mako.cmd.cmdline() reads template as binary but writes as text resulting in garbled newlines on Windows #376