Keep original line endings when reading expectation files

xymaxim commented 7 months ago

Changes in this PR

Since Python by default opens files in the universal newlines mode (newline=None), line ending characters are translated. Which is not good for us. So, let's preserve original line endings while writing and reading expectation files.

zaufi commented 7 months ago

Could you please elaborate and gimme some more details and/or an example? I just can't understand what problem you're trying to solve %)

xymaxim commented 7 months ago

Could you please elaborate and gimme some more details and/or an example? I just can't understand what problem you're trying to solve %)

While I was working on #24, I noticed that regular tests (not from the previous issue) with \r\n would fail:

    def test_sample_out(capfd, expected_out):
        print('Hello Africa!\r\n', end='')
        stdout, stderr = capfd.readouterr()
>       assert expected_out == stdout
E       AssertionError: assert
E       The test output doesn't equal to the expected
E       (from `/tmp/pytest-of-ms/pytest-109/crlf_test0/crlf_test/test_sample_out.out`):
E       ---[BEGIN actual output]---
E       Hello Africa!↵
E       ---[END actual output]---
E       ---[BEGIN expected output]---
E       Hello Africa!↵
E       ---[END expected output]---

Here’s a quick demo of what’s happening:

# test: 00000000: 410d 0a42                             A..B
#                   \r \n
with open("test", "w") as f:
    f.write("A\r\nB")

with open("test", "r") as f:
    print(repr(f.read()))  # “A\nB”

While this works as desired:

with open("test", "r", newline=””) as f:
    print(repr(f.read()))  # “A\r\nB”

zaufi commented 7 months ago

What makes me worried is that this will ruin (almost) everything! Indeed, for example, I have a trivial output matching test print('Hello Africa!') with the obviously trivial expectation. Whatever OS I use, thanks to Git and the auto CRLF option everything works fine on all platforms! -- cuz during checkout git will replace EOL(s) in the expectations file to the native format and Python's TextIO will handle \n properly...

If I needed smth system-specific, I add a f'-{platform.system()}' suffix to the expectation filename and told Git via .gitattributes to set whatever CRLF style I needed over system-dependent expectations.

Obviously, this patch will make this way broken...

I've taken a closed look at the test added by the #24 ... it looks incorrect to me %) makepatternfile() of the fixture behaves similarly to the pytester.makefile and does not write EOL at the last (and in this case the only) text line! So, strictly speaking, that test never checks for EOL styles...

xymaxim commented 7 months ago

What makes me worried is that this will ruin (almost) everything! Indeed, for example, I have a trivial output matching test print('Hello Africa!') with the obviously trivial expectation. Whatever OS I use, thanks to Git and the auto CRLF option everything works fine on all platforms! -- cuz during checkout git will replace EOL(s) in the expectations file to the native format and Python's TextIO will handle \n properly...

Hmm, indeed, that’s something I hadn’t considered. My initial thoughts were that expectation texts should be treated as immutable, without any changes, and files are just an intermediate state of them, but, yes, Git couldn't be excluded from the story.

xymaxim commented 7 months ago

If I needed smth system-specific, I add a f'-{platform.system()}' suffix to the expectation filename and told Git via .gitattributes to set whatever CRLF style I needed over system-dependent expectations.

I recently came across a binary file with an ASCII text header that needed to be parsed, and the header contains Windows-style line endings. The imaginary test case (just for an example) for some header extract function would be to output the header as is. This is not a platform-specific case.

As I understand, on Unix, it’s not possible to have a pattern with CRLF symbols right now (even if we add tests/data/expected/crlf_test.out -text to .gitattributes) because of how Python converts newlines during reading (CRLF -> LF).

So, the only way is to read it in a binary mode?

xymaxim commented 7 months ago

I've taken a closed look at the test added by the https://github.com/zaufi/pytest-matcher/pull/24 ... it looks incorrect to me %) makepatternfile() of the fixture behaves similarly to the pytester.makefile and does not write EOL at the last (and in this case the only) text line! So, strictly speaking, that test never checks for EOL styles...

Let's discuss it in #24, I'll answer there.

zaufi commented 7 months ago

As I understand, on Unix, it’s not possible to have a pattern with CRLF symbols right now (even if we add tests/data/expected/crlf_test.out -text to .gitattributes) because of how Python converts newlines during reading (CRLF -> LF).

I didn't get it... why not? The other question is if you want to make this test run on all platforms (w/ different native EOLs) and a test's input data file is "static" (in terms of EOL style in its beginning (I'm thinking about some "self-extract" archive w/ a shell script at the beginning %) obviously u need to preserve EOLs in the pattern file added to VCS (git I guess. so .gitattributes could help) and during the test tell to Python to use specific EOL style on read from file...

What could go wrong here? %)

xymaxim commented 7 months ago

The problem is that the test from the PR doesn't pass without the proposed changes because of the missing CR symbol. This may look unexpected and confusing for users, since an expectation file actually contains the symbol, but not the fixture.

zaufi / pytest-matcher

Keep original line endings when reading expectation files #25

Changes in this PR