mitsuhiko / insta

A snapshot testing library for rust
https://insta.rs
Apache License 2.0
2.26k stars 102 forks source link

Binary snapshots (in-memory) #610

Closed lasernoises closed 1 month ago

lasernoises commented 2 months ago

The biggest difference to the previous attempt in #489 is that the snapshot contents are now kept in-memory instead of being written into a temporary file directly.

For SnapshotContents I decided to make it an enum with two variants like in the previous attempt. I first attempted to change it to three variants so it would also replace the SnapshotKind enum but I realized that a lot of logic is still shared between inline and file text snapshots. That approach would definitely also work though and it's perhaps also a question of how much the logic between the two is expected to diverge in the future.

In runtime::assert_snapshot I replaced the combination of the refval and new_snapshot_value with an enum. In that case it is a three variant enum and some From impls for two value tuples for the macros. Maybe it would be better to make it a two variant enum there too for consistency and keep the ReferenceValue enum for the text case. That might be a bit weird because both binary and non-inline text snapshot have a name and that would then be in two different enums.

I should probably also add some tests in cargo-insta. I'll do that as soon as possible, but I need to figure out how the testing there works.

max-sixty commented 2 months ago

For SnapshotContents I decided to make it an enum with two variants like in the previous attempt. I first attempted to change it to three variants so it would also replace the SnapshotKind enum but I realized that a lot of logic is still shared between inline and file text snapshots. That approach would definitely also work though and it's perhaps also a question of how much the logic between the two is expected to diverge in the future.

FWIW I think the ontology is quite an interesting case — FileText & FileBinary share some properties, but FileText & InlineText also share some completely different properties. So there's no perfect hierarchy.

The proposed structure seems reasonable (as above I'd like to see if we can reduce the number of data structures a bit, but I don't think there are any easy large clean-ups)

max-sixty commented 2 months ago

To sketch out what I think we need to do to merge:

max-sixty commented 2 months ago

Re additional tests: we should ensure we cover named snapshots. Possibly I'm making a mistake but this doesn't compile atm:

#[test]
fn test_binary_snapshot_named() {
    insta::assert_binary_snapshot!("txt", "name", b"test".to_vec());
}

I'm thinking of whether we can retain more of the structure of existing familiar macros, rather than using different parameters in the form of $extension, $name, $value. (developers of the library can undervalue the benefits of consistency because we're so "in" it, while most folks reading tests would highly value consistency; particularly for macros, which don't easily expose the names of their parameters)

Not confident, but one possible way to retain $name, $value is to push $extension into the $name param:

I think this would be reasonably simple? On the downside, name.tar.gz looks like a file name, and we don't exactly get that file name; and it's not possible to name a snapshot beginning with *.


Edit: another downside with the currently proposed code — it's not possible to have a debug_expr without a name, which is different from the other macros.

lasernoises commented 2 months ago

You're right, the named binary snapshots were broken. I've now fixed them and also changed how the macros work in in the text case. The change is not necessarily an obvious win, because it introduces a little bit of weirdness with an ignored parameter, but I do think I makes it a bit clearer what is going on and it requires a little bit less From magic.

Syntax wise we have a lot of options for how to pass the extensions and I'm not quite sure what the best one is. We could possibly even use ident tokens or something like that, but I suppose that's not optimal either because it would cause weirdness around keywords.

max-sixty commented 2 months ago

Yes I think on reflection I do quite like the idea of using "*.jpg" to mean "unnamed" — it means binary snapshots follow quite closely to the original snapshots. (but you should decide what you think is best + @mitsuhiko should review! I will ping him a DM when the PR is fully ready to ensure the review doesn't linger.)

lasernoises commented 2 months ago

Public API wise: I'm assuming the _macro_supportmodule is not considered part of the public API?

What I'm a bit more worried about is the internals module. I am changing some things that are exposed in there. The docstring does imply that it's primarily for documentation. So that might imply that it's not part of the stable API?


  • We should decide whether we mark as experimental. If we don't mark it as experimental, we're stuck with the API, and in some ways insta has more stringent backward-compatibility constraints than other libraries, since it needs to work with older versions of itself in the form of cargo-insta)

I think older versions of cargo-insta will definitely not be able to support binary snapshots. I think that's simply not solvable.

What should still work is working with text snapshots produced by the new version of insta in an old version cargo-insta and also new versions of cargo-insta working with old versions of insta. What would happen in the latter case is that cargo-insta would insert snapshot_type: text into the metadata, which should not be a huge issue since the older version of insta will just ignore that. But we could also just not insert that field in the text case in the new version in general.

I think the only one of those we can reasonably test in an integaration test is working with an a snapshot that looks like the old format (meaning just missing the snapshot_type: text.

I assume marking it as experimental would mean marking it as such in the doc comment of the assert_binary_snapshot macro?


Yes I think on reflection I do quite like the idea of using *.jpg to mean "unnamed" — it means binary snapshots follow quite closely to the original snapshots. (but you should decide what you think is best + @mitsuhiko should review! I will ping him a DM when the PR is fully ready to ensure the review doesn't linger.)

If we do the name thing like this:

assert_binary_snapshot!("*.jpg", b"test".to_vec());
// or like this for the named case:
assert_binary_snapshot!("some_name.jpg", b"test".to_vec());

that would mean we'd need to parse that string at runtime because I don't think the contents of a string literal can be parsed by a macro_rules macro.

I do sort of like the idea of using ident tokens to make it look something like this:

// the . token at the beginning would just be to make it more obvious that it's an extension
assert_binary_snapshot!(.jpg, "name", b"test".to_vec());

This could even be extended to the name like this:

assert_binary_snapshot!(name.jpg, b"test".to_vec());

But I don't think that's a good idea because that would make the name quite inconsistent with the other macros. Also you couldn't use keywords like match as a name then (or maybe if we made the macro rather complex).

I think with just the extension the keyword issue would be unlikely as I can't think of a file extension that is also a rust keyword.

Other options:

assert_binary_snapshot!("name"."jpg", b"test".to_vec());
assert_binary_snapshot!(*."jpg", b"test".to_vec());

assert_binary_snapshot!("name".jpg, b"test".to_vec());

I'm not super happy about any of those approaches tbh, so I think I'll leave it as it is for now and then we can see what @mitsuhiko's opinion is once he has time to review.

max-sixty commented 2 months ago

What I'm a bit more worried about is the internals module. I am changing some things that are exposed in there. The docstring does imply that it's primarily for documentation. So that might imply that it's not part of the stable API?

Yes, this is hidden and so not part of the public API (somewhat by convention). We only have it public so cargo-insta can access it.

I think older versions of cargo-insta will definitely not be able to support binary snapshots. I think that's simply not solvable.

For sure — my point re older cargo-insta versions is: if we release this in 1.41 and then realize we should have done it differently, we can't break the 1.41 format, unless we've marked it as experimental. And we're a bit more constrained than other apps because things like pending snapshots are an interface between cargo-insta & insta, which themselves can be on different versions...

that would mean we'd need to parse that string at runtime because I don't think the contents of a string literal can be parsed by a macro_rules macro.

Yes, but I think that's fine? The existing approach for text snapshots — Autoname — is at runtime too, albeit the macro knows at compile time which path it'll take.

I would be less keen on the ident approach — I don't see much benefit and it's less consistent.

max-sixty commented 2 months ago

@mitsuhiko I'll DM you in the next few days about this PR. The things we want from you:

mitsuhiko commented 2 months ago

Sorry for ghosting this for so long. These PRs always require actual brainpower to be put to it, so they tend to slip down the list automatically when I have limited capacity for it. So first of all: definitely on board with having this.

I will give this some more deeper thoughts but here are some very immediate questions that came up. The most obvious one is this one:

assert_binary_snapshot!("some_name.jpg", b"test".to_vec());

This was rejected above there because the string requires parsing at runtime. Is this a problem? Because at least in theory this makes the most sense from a user experience point of view and all the other stuff proposed seems a bit odd.

lasernoises commented 2 months ago
assert_binary_snapshot!("some_name.jpg", b"test".to_vec());

This was rejected above there because the string requires parsing at runtime. Is this a problem? Because at least in theory this makes the most sense from a user experience point of view and all the other stuff proposed seems a bit odd.

I just would have preferred to make it a compile time error if the . was missing. We could also interpret it just as an extension in that case, but I don't think that's a good idea.

I totally agree that all of the other options I layed out are a bit weird and not great. Also given that this is a testing library I think checking things at runtime is much less of an issue than for other libraries so I suppose the thing about not being able to check the format of the string at compile time is more of an annoyance than a real issue.

I'll give this a go tomorrow. I think for unnamed snapshots I'll just use ".jpg" instead of "*.jpg" because I think that makes it look less like a glob, which I think could make it a bit confusing when reading one of those for the first time. But that might just be me.

max-sixty commented 2 months ago

. I think for unnamed snapshots I'll just use ".jpg" instead of "*.jpg" because I think that makes it look less like a glob, which I think could make it a bit confusing when reading one of those for the first time.

I'd be OK with this. The downside is it prevents using a leading . as a file name (though technically *.jpy prevents using a leading * as a file name!).

And the *.jpy looks sufficiently different from a name. But also understand it looks like a glob. Open to other ideas; could use a $ or _ or {name} etc.

lasernoises commented 2 months ago

Ok, I've implemented this. The way I did it is with a new struct and a From because that ended up seeming like the simplest way without having the parsing code in the macro. I thought I'd parse the string in assert_snapshot at first, but there we match in two different places for name and extension. But I can also do it in a different way if we don't want an extra type.

I'd be OK with this. The downside is it prevents using a leading . as a file name (though technically *.jpy prevents using a leading * as a file name!).

The way it works now an empty string is not allowed as a name since that is interpreted as wanting an auto name. Which I think is reasonable.

. is indeed also not possible as the name because it would be interpreted as the start of the extension. This would possibly also apply with the *.ext syntax because we'd probably then parse that as the start of the extension with empty-string as the name. Unless we did some special casing where the first character is always considered part of the name.

max-sixty commented 2 months ago

Great, thanks @lasernoises , I think the functionality is good!

From the list above: https://github.com/mitsuhiko/insta/pull/610#issuecomment-2369071150, we should add some docs here, I would suggest marking it as experimental. And if possible we could add a very short section on the website, and then I think we're good to go. (a couple of code clean-ups too)

(I'll let @mitsuhiko weigh-in if he has other feedback, but won't wait for another review unless he requests one...)

lasernoises commented 1 month ago

From the list above: #610 (comment), we should add some docs here, I would suggest marking it as experimental. And if possible we could add a very short section on the website, and then I think we're good to go. (a couple of code clean-ups too)

I hope I'll have some time tomorrow to add something to the website.

Regarding the experimental thing: Should I just add a doc comment to the macro that states that the feature is experimental? Also: if we consider the feature experimental, maybe it would make sense to not write the snapshot_kind metadata field for text snapshots yet?


On an unrelated note: I was trying to test if the feature for deleting unreferenced snapshots works correctly for binary snapshots, but I can't seem to get that to remove unreferenced snapshots at all, even on master. It might be an issue with my machine or something like that.

From the code I have a suspicion that once I get it to work it will not handle binary snapshots right yet. So it probably makes sense to wait with merging this until I've figured out what I'm doing wrong there.

max-sixty commented 1 month ago

Regarding the experimental thing: Should I just add a doc comment to the macro that states that the feature is experimental?

Yes great! And the changelog / basically anywhere we mention it. We can say something in the macro docstring like "we may make incompatible changes for the next couple of versions after 1.41"

maybe it would make sense to not write the snapshot_kind metadata field for text snapshots yet?

I think it's fine to add. I think maybe 20% chance we change something about the interface somewhere, but that piece seems quite robust.

I was trying to test if the feature for deleting unreferenced snapshots works correctly for binary snapshots, but I can't seem to get that to remove unreferenced snapshots at all, even on master

Whoops, this no longer works — not your machine! Will fix and add a test (test coverage gradually growing from a very low base! It did work a while ago, I guess before my recent changes)

Edit: now fixed with https://github.com/mitsuhiko/insta/pull/634

max-sixty commented 1 month ago

Let's add a couple of examples to the docs and then hit the big button. Thank you very much for all your persistence here, I know it was a long road. It's a great feature for insta!

lasernoises commented 1 month ago

The deleting of unreferenced snapshots does not work for binary snapshots yet. I'm working on fixing that now.

lasernoises commented 1 month ago

Now it works, but I'm not quite sure what the best error handling strategy there is. It needs to load the snapshot now to know whether it's binary and to get the extension. It still ignores errors when deleting the files like before. So I think it should probably also ignore errors when loading the snapshot and just skip it and maybe print a log message.

lasernoises commented 1 month ago

What I'm a bit more worried about is the internals module. I am changing some things that are exposed in there. The docstring does imply that it's primarily for documentation. So that might imply that it's not part of the stable API?

Yes, this is hidden and so not part of the public API (somewhat by convention). We only have it public so cargo-insta can access it.

I meant to reply to this earlier but forgot: I think you meant the _cargo_insta_support module. Because the internals module is actually not hidden. I think given that it's called internals we can probably justify changing things exposed in it. Maybe it would make sense to add a comment there that explicitly states that this is not guaranteed to be stable.

lasernoises commented 1 month ago

I've started adding a section for the website: https://github.com/mitsuhiko/insta-website/compare/main...lasernoises:insta-website:binary-snapshots?expand=1. I think I'll wait with opening a PR until this is merged.

max-sixty commented 1 month ago

I think you meant the _cargo_insta_support module. Because the internals module is actually not hidden. I think given that it's called internals we can probably justify changing things exposed in it. Maybe it would make sense to add a comment there that explicitly states that this is not guaranteed to be stable.

Yes v good point. Would be great to add the comment (but no need to slow down this PR ofc)

max-sixty commented 1 month ago

@lasernoises anything else? Or we merge? :)

lasernoises commented 1 month ago

I think from my point of view we can merge.

lasernoises commented 1 month ago

🎉🎉🎉

Thank you so much for your patience and the many reviews/comments!

max-sixty commented 1 month ago

Thank you!!