Ignoring trailing newlines at the end of snapshot files

drjasonharrison commented 1 year ago

I saw this issue, https://github.com/joseph-roitman/pytest-snapshot/issues/61 where a linting rule was adding a newline at the end of a file.

I have a similar problem where my editor is configured to make sure that all text files have an ending newline. I also have a pre-commit hook (end-of-file-fixer) which does the same thing, but it is easier to define an exception for that.

Is there a way to override, configure, or redefine the snapshot file loader to strip whitespace at the end of the file?

ThiefMaster commented 8 months ago

IMHO snapshot files should be excluded from any kind of linters/formtters - they are data, not code.

Anyway, I'd like to have an option to include a trailing linebreak in any such file as well, and to ignore (or rather expect) it when comparing test data with a snapshot.

joseph-roitman commented 8 months ago

Here are my ideas on the issue.

Solution 1 - Add a newline yourself

One solution is to add newlines yourself to the value being snapshotted. For example

def ensure_newline(s: str):
    return s if s.endswith('\n') else s + '\n'

snapshot.assert_match(ensure_newline(value_to_snapshot), 'snapshot_file.txt')

The benefit of this solution is that it doesn't require a change to the library and it is very clear to the user what is going on.

Problem 1.1

The first problem with this solution is that it could take the user a while to understand why the test started failing and this issue could be frustrating to debug.

Solution 1.1 - Helpful error messages

The solution to this is to add a help message, for example: AssertionError: The value matches the snapshot, but the snapshot contains a trailing newline. This may be caused by linters and formatters modifying snapshot files. To workaround this issue, ...

Problem 1.2

The second problem is that it is feels a bit messy to add ensure_newline(value_to_snapshot) to every snapshot test. If the value never has a trailing newline, this could be simplified to

snapshot.assert_match(value_to_snapshot + '\n', 'snapshot_file.txt')  # Add a newline to avoid linter issues

though this code would not be clear without a comment describing why the newline is added.

Solution 2 - Always add a newline to snapshot files

This solution would not work since linters could remove double-newlines at ends of files. Another solution could be to add '\nend_of_pytest_snapshot\n' but that is both ugly and a breaking change.

Solution 3 - Silently ignore newline issues

pytest_snapshot could decide to always ignore newline issues (and possibly add a newline to the end of the file by default if there is none). The benefit of this solution is that users won't need to worry about this issue. Also, adding a newline to the end of the file has many benefits (for the same reasons that linters add newlines).

pytest_snapshot already does something similar to this to workaround the git newline issue. It forbids the \r character in strings and all types of newlines in snapshot files are converted to \n when reading the snapshot files.

Problem 3.1

The problem with this solution is that this behavior is somewhat surprising and could hide a change in value_to_snapshot that the user wanted to test for. If this happened, it could frustrate the user and cause them to lose trust in the library.

Solution 4 - Add option to ensure newlines in snapshot files.

There are many ways to do this, each with upsides and downsides.

Solution 4.1

snapshot.assert_match(value_to_snapshot, 'snapshot.txt', add_newline=True)
snapshot.assert_match_dir(dict_to_snapshot, 'snapshot_dir', add_newlines=True)

The benefit of this is that it is clear to the user that this line wont test newline issues. The downside is that it is a bit verbose.

Solution 4.2

pytest --snapshot-add-newline

pytest "--snapshot-add-newline=*.yml,*.json,*.xml"

One major downside is that the user would need to remember to run pytest with the same flag each time. Another downside is that this could be a footgun that could lead to problem 3.1.

Solution 4.3

snapshot.set_config(add_newlines=True)
...
snapshot.assert_match(value_to_snapshot, 'snapshot.txt')

This has quite a few benefits.

This can be called once per test function. Since it is in the same test function, there is less chance of problem 3.1 arising.
Users can decide to put this in a test fixture at the test module or test package level.

Additional thoughts

We need to think of how the chosen solution would interact with other potential features of the library such as user-defined serialization/deserialization. We also need to think about how the solution affects snapshot testing bytes.

In any case, no matter what we decide, it is a good idea to add helpful error messages (solution 1.1) for both str and bytes. That would reduce the issue greatly. If anyone knows the any commonly used tools that automatically add newlines, they could be added to the error message.

joseph-roitman / pytest-snapshot