Should we use snapshots in tests

MCJOHN974 commented 9 months ago

Moved from one PR review, @akashin said:

I think we should re-consider this design decision and explore what our other options are: [Current] Generate WAT files from WAST files and run them through WAT -> ZKASM pipeline Generate ZKASM files directly from WAST files and don't commit WAT files to Git Don't commit ZKASM files to Git and instead pass generated ZKASM code directly to JS test runner We need to understand what are the pros and cons of each approach and see which one is the most appropriate for our goals.

I suggest discuss in this issue if we should migrate from snapshot pattern.

My first opinion:

In general, difference between using snapshots or not is if generated code is visible explicitly or not
I think it can be suitable to have some generated zkasm files, for example to play with some new zk runner
With snapshots we can easily see in PR review how generated code looks like, how it is changed
Storing generated files take some place on disk.
Currently zkasm data takes 0.85% of disk usage (6.8Mb) of wasmtime repo on my machine without counting target directory.

Generally, I see here tradeoff between commit some data explicitly or generate-check-drop in CI. Even if we will have 10 times more tests, du of zkasm data will be 70Mb, which is I think acceptable, so I think there should be some other reasons to hide some data.

MCJOHN974 commented 9 months ago

One new cons of snapshot testing from PR discussion (thanks @ akashin and @ mooori):

if generated code is unstable it can be difficult to organize snapshot testing.

aborg-dev commented 9 months ago

Thank you for starting the discussion. I agree with the points that you mentioned so far.

I think the main challenge with generated files is the amount of noise they generate in the pull requests. Take for example a recent stack handling change. It touches over 12k lines of code and 734 files. Out of those, only 5 files are essential changes to Rust with less than 100 lines changed. The rest are generated files. And as we add more spec tests, we will only have more of those. The concrete downsides that I see are:

GitHub UI is not well suited for such big PRs and is sluggish to use
Reviewing those changes is also challenging, I don't really expect the reviewer to look over all 700 files - at most, they will look at a few of those files to see how generated code has changed. But looking at a specific file is tricky due to the previous point
Rebasing/merging PRs with generated files is tricky because they almost always result in merge conflicts
Sometimes, changes in generated files don't really matter as long as the underlying tests pass

Now, I'll admit that I've proposed this approach with snapshots in the first place and it clearly doesn't scale well with the increasing number of WAT/ZKASM files in the codebase. So it is time to revise it and see if we can come up with a better solution.

MCJOHN974 commented 9 months ago

GitHub UI is not well suited for such big PRs and is sluggish to use

Hm, I'm using just google chrome without plugins and have button "hide all files in directory", which help focus on human-written files during the review, as long as we keep generated files in one directory.

Reviewing those changes is also challenging, I don't really expect the reviewer to look over all 700 files - at most, they will look at a few of those files to see how generated code has changed. But looking at a specific file is tricky due to the previous point

Agree. But sometimemes it's nice to see 1-2-3 generated files and I don't think pick 3 files and commit them is a good option here.

Rebasing/merging PRs with generated files is tricky because they almost always result in merge conflicts

Btw, we have such problem in very high level -- almost all PR change total state.csv, and I often face merge conflict on this file. Maybe better find some more generic solution here, like githooks?

aborg-dev commented 9 months ago

Agree. But sometimemes it's nice to see 1-2-3 generated files and I don't think pick 3 files and commit them is a good option here. Btw, we have such problem in very high level -- almost all PR change total state.csv, and I often face merge conflict on this file. Maybe better find some more generic solution here, like githooks?

Could we use instead use something like workflow artifacts to store both generated ZKASM files and state.csv file and present a link to them during the PR review? This way the reviewer will have the opportunity to look at the latest generated files and we won't have other problems mentioned above.

nagisa commented 8 months ago

Could we use instead use something like workflow artifacts to store both generated ZKASM files and state.csv file and present a link to them during the PR review?

From experience, nobody will bother look at the changes in the artifacts. The generated files themselves aren’t really interesting by themselves – the changes to them are.

The current snapshots serve a primary purpose of being something skimmable to enable the ability to spot-check the changes in the generated code. I don’t think they are meant to be carefully looked through exhaustively every time. However moving the snapshots out of the diffs hurts both the skimmability and spotcheckability aspects of the setup.

That said, I agree that having 700 changed snapshots isn’t particularly tenable, but that's just the natural outcome of us choosing an approach that lets us move fast with a minimal amount of effort.

Perhaps part of the problem with it is that we’re making snapshots not of the entire file but rather one for each single function in the wasm spectest suite. In practice a function that adds up 1 + 2 is mot meaningfully different codegen wise from a function that adds up 1 + 0xFFFF_FFFF in most cases and it probably does not warrant having a snapshot for each of them at all.

cranelift filetest suite avoids this in part by:

Having runtests be a separate thing from snapshotted codegen tests;
Having all the snapshots show up in the test file itself, rather than as separate files;
Not striving to make each snapshot a codegen test an individually runnable full-blown with assertions and such.

Which brings me to a question: why is it that instead of augmenting filecheck to serve our needs (now that we're actually spending proper amount of time to set things up the right way,) we’re thinking of ways to improve the infra we handrolled as a shortcut? It sounds to me that the only valid end state is filecheck and any additional time we spend on the shortcut only makes sense if it actually gives a huge amount of value with very little investment in engineering resource.

nagisa commented 8 months ago

Just so that it doesn’t get lost: some discussion of what it would take to correctly integrate zkasm runtests into filecheck: https://bytecodealliance.zulipchat.com/#narrow/stream/217117-cranelift/topic/filetest.3A.20runtests.20that.20invoke.20external.20commands/near/402481325

MCJOHN974 commented 7 months ago

Resolved by https://github.com/near/wasmtime/pull/215

near / wasmtime

Should we use snapshots in tests #188