nix-community / rnix-parser

A Nix parser written in Rust [maintainer=@oberblastmeister]
MIT License
351 stars 42 forks source link

Broken paths are silently skipped #146

Open tazjin opened 1 year ago

tazjin commented 1 year ago

Certain path literals are parse errors, most prominently paths with a trailing slash. Compare C++ Nix and Tvix (with rnix-parser):

# C++ Nix
nix-repl> ../../
error: path '../../' has a trailing slash

# rnix-parser (through tvix)
tvix-repl> ../../
error[E015]: failed to parse Nix code:
 --> [tvix-repl]:1:1
  |
1 | ../../
  | ^^^^^^ code ended unexpectedly while the parser still expected more

These are correctly turned into errors in both cases, however, rnix-parser seems to do something strange if the broken path occurs somewhere in the middle of an AST:

# rnix-parser, via tvix, with `--display-ast`
tvix-repl> import ../../
Ident(Ident(NODE_IDENT@0..6))
=> <<primop>> :: lambda

tvix-repl> [ 1 2 ../../ 3 ]
List(List(NODE_LIST@0..16))
=> [ 1 2 3 ] :: list

Compare with C++ Nix:

nix-repl> import ../../
error: path '../../' has a trailing slash

nix-repl> [ 1 2 ../../ 3 ] 
error: path '../../' has a trailing slash

I haven't manually investigated the actual structure produced by rnix-parser in this case yet; it must be in there somewhere (side note: is there a built-in way for some more pretty-printed representation of the AST, either in the crate (that I've missed) or that someone wrote?)

oberblastmeister commented 1 year ago

You can use {:#?} to get a more detailed pretty-printed representation

oberblastmeister commented 1 year ago

The ast is

Root(
    NODE_ROOT@0..14
      NODE_IDENT@0..6
        TOKEN_IDENT@0..6 "import"
      TOKEN_WHITESPACE@6..7 " "
      TOKEN_ERROR@7..13 "../../"
      TOKEN_WHITESPACE@13..14 "\n"
    ,
)

This is probably due to https://github.com/nix-community/rnix-parser/blob/f8056c4811b6f5196af64db14b89651e892ad98a/src/kinds.rs#L133 We just bump error tokens as trivia so they appear in the tree, but don't report them as errors.

tazjin commented 1 year ago

We just bump error tokens as trivia so they appear in the tree, but don't report them as errors.

Hmm, not sure I understand the implication of this. We'd have to do a pass over the entire tree and detect any error nodes to get "all" errors? Are there any things other than broken paths that exhibit this behaviour?

Right now in the Tvix compiler, we have an expectation that the root expression is "sane" when passing it over as long as rnix didn't report any errors.

oberblastmeister commented 1 year ago

Yeah we should probably be reporting these errors.