nix-community / rnix-parser

A Nix parser written in Rust [maintainer=@oberblastmeister]
MIT License
351 stars 42 forks source link

Verifying parser correctness #107

Open darichey opened 2 years ago

darichey commented 2 years ago

When making changes to the parser, I am always scared of regressions and properly handling edge cases in the Nix grammar. We want to ensure that rnix-parser parses Nix expressions correctly, but since Nix has no formal specification, the best we can ask in this regard is "does it do the same thing as the reference implementation?" I have built a tool to help us in answering that question: https://github.com/darichey/rnix-parser-tester.

There's a lengthy write-up if you're interested in the details, but the TLDR is you can feed it a Nix expression and it will tell you if rnix-parser and the reference implementation produced the same AST for that expression.

I have used it to verify the correctness of #96, #99, #101, #102, #103, and #106 by parsing all of nixpkgs. The tool provides a way to summarize the comparison results:

$ cargo run -- summary summary/f759fce.json summary/with-patches.json
...
== Summary ==
# equal before: 15924
# not equal before: 8135
# reference impl errors before: 1
# rnix-parser errors before: 1

# equal after: 24060
# not equal after: 0
# reference impl errors after: 1
# rnix-parser errors after: 0

# progressions: 8135
# regressions: 0

This tells us that before applying the patches, 8135 files in nixpkgs were parsed differently by rnix-parser. After applying the patches (and another one I'm still working on), all valid Nix files are parsed equivalently. (There are several caveats to these results which I cover here)

I wanted to make other contributors aware of the tool in case they would find it useful, and also ask if there would be any interest in somehow incorporating it into rnix-parser's testing (probably through CI). Thanks!

oberblastmeister commented 2 years ago

I wonder if the new typed api would allow you to get rid of your custom ast type. The only difference would be that you would need lots of unwraps, but you already do those unwraps anyway when constructing the custom ast

Ma27 commented 2 years ago

Very nice, thank you! :)

oberblastmeister commented 2 years ago

By the way @darichey please rebase those prs so I can merge them!

ncfavier commented 2 years ago

Maybe a silly question but: why don't we just expose the Nix parser as a library and use that in rnix-parser?

darichey commented 2 years ago

Do you mean as opposed to re-implementing the parser as has been done in rnix-parser? The primary reason (I would assume) is the desire for a fault-tolerant parser. It's very difficult to build nice tooling, especially editor tooling like rnix-lsp, around a parser which bails at any sign of trouble (which the reference Nix parser does, while rnix-parser does not)

darichey commented 2 years ago

I wonder if the new typed api would allow you to get rid of your custom ast type. The only difference would be that you would need lots of unwraps, but you already do those unwraps anyway when constructing the custom ast

Yeah, the new typed api is much nicer to use, but really has the same main usability problem for this use case: I'm only interested in grammatically-valid Nix, so all the Options throughout the tree are just noise. I thought about trying to augment the nodes macros to also generate an error-less AST, but I don't know if it would work or if that use case is common enough to warrant any changes here (I doubt it).

fogti commented 2 years ago

It's a bit unfortunate that rust doesn't allow GATs yet, which could make the "maybe we want Options" ergonomic, and without code duplication (macro-based code duplication still incurs a compile-time penalty)