softdevteam / lrpar

Rust LR parser
Other
1 stars 0 forks source link

Tokens defined in lexer but not in grammar should refuse to generate a parse tree. #36

Closed jacob-hughes closed 6 years ago

jacob-hughes commented 6 years ago

If tokens are defined in the lexer, but not the grammar, lrpar displays a warning to the user but still attempts to parse the input.

Using lrpar to generate a parse tree from a source program which can be lexed to tokens undefined in the grammar, will lead either to non-termination, or cause lrpar to panic with varying backtraces.*

This can be especially problematic when using lrpar as a library. Calling parser::parse will attempt to parse the input without displaying the warning. This leaves the user with the possibility of a not immediately obvious panic or non-termination without necessarily knowing why.

Hence I can see two possible solutions to this, from a user experience perspective:

  1. Error exit when warning the user that the lexer contains tokens which are not defined in the grammar, and don't even attempt to parse the source program input. This behaviour would be similar(ish) to flex - bison, where tokens which are defined only in the lexer fail to compile.

  2. Panic when a lexeme which is not defined in the grammar is detected in the user source input.

Example usage:

lexer.l

hang HANG
the THE
dj DJ
[ \t\n\r]+ ;
\+   PLUS

parser.y

%start lines
%%
lines : lines line | line;
line : "HANG" "THE" "DJ";

input hang the dj + hang the dj +


* Interestingly, the way in which lrpar fails in this scenario appears to be non-deterministic, running the program with the same input several times yields a different kind of failure. Is this to do with how lrpar determines how to traverse the grammar again after encountering an error?

ltratt commented 6 years ago

What output / error do you see with this example?

jacob-hughes commented 6 years ago

Below are two example error messages when executing lrpar with the above inputs:

thread 'main' panicked at 'called `Option::unwrap()` on a `None` value', /checkout/src/libcore/option.rs:335:20
stack backtrace:
   0: std::sys::unix::backtrace::tracing::imp::unwind_backtrace
             at /checkout/src/libstd/sys/unix/backtrace/tracing/gcc_s.rs:49
   1: std::sys_common::backtrace::print
             at /checkout/src/libstd/sys_common/backtrace.rs:68
             at /checkout/src/libstd/sys_common/backtrace.rs:57
   2: std::panicking::default_hook::{{closure}}
             at /checkout/src/libstd/panicking.rs:381
   3: std::panicking::default_hook
             at /checkout/src/libstd/panicking.rs:397
   4: std::panicking::rust_panic_with_hook
             at /checkout/src/libstd/panicking.rs:577
   5: std::panicking::begin_panic
             at /checkout/src/libstd/panicking.rs:538
   6: std::panicking::begin_panic_fmt
             at /checkout/src/libstd/panicking.rs:522
   7: rust_begin_unwind
             at /checkout/src/libstd/panicking.rs:498
   8: core::panicking::panic_fmt                                                                                                                                                                                                                           
             at /checkout/src/libcore/panicking.rs:71                                                                                                                                                                                                      
   9: core::panicking::panic
             at /checkout/src/libcore/panicking.rs:51
  10: <core::option::Option<T>>::unwrap
             at /checkout/src/libcore/macros.rs:20
  11: <lrpar::parser::Parser<'a, TokId>>::lr_cactus
             at ./src/lib/parser.rs:258
  12: lrpar::kimyi::r3s_n
             at ./src/lib/kimyi.rs:327
  13: lrpar::kimyi_plus::recover::{{closure}}
             at ./src/lib/kimyi_plus.rs:98
  14: pathfinding::astar::astar_bag
             at /home/jake/.cargo/registry/src/github.com-1ecc6299db9ec823/pathfinding-0.2.4/src/astar.rs:226
  15: lrpar::kimyi_plus::recover
             at ./src/lib/kimyi_plus.rs:77
  16: <lrpar::parser::Parser<'a, TokId>>::lr
             at ./src/lib/parser.rs:165
  17: <lrpar::parser::Parser<'a, TokId>>::parse
             at ./src/lib/parser.rs:111
  18: lrpar::parser::parse_rcvry
             at ./src/lib/parser.rs:313
  19: lrpar::main                                                                                                                                                                                                                                          
             at src/main.rs:172                                                                                                                                                                                                                            
  20: __rust_maybe_catch_panic
             at /checkout/src/libpanic_unwind/lib.rs:101
  21: std::rt::lang_start
             at /checkout/src/libstd/panicking.rs:459
             at /checkout/src/libstd/rt.rs:58
  22: main
  23: __libc_start_main
  24: _start

thread 'main' panicked at 'assertion failed: `(left == right)`
  left: `3`,
 right: `4`', src/lib/kimyi.rs:435:16
stack backtrace:
   0: std::sys::unix::backtrace::tracing::imp::unwind_backtrace
             at /checkout/src/libstd/sys/unix/backtrace/tracing/gcc_s.rs:49
   1: std::sys_common::backtrace::print
             at /checkout/src/libstd/sys_common/backtrace.rs:68
             at /checkout/src/libstd/sys_common/backtrace.rs:57
   2: std::panicking::default_hook::{{closure}}
             at /checkout/src/libstd/panicking.rs:381
   3: std::panicking::default_hook
             at /checkout/src/libstd/panicking.rs:397
   4: std::panicking::rust_panic_with_hook
             at /checkout/src/libstd/panicking.rs:577
   5: std::panicking::begin_panic
             at /checkout/src/libstd/panicking.rs:538
   6: std::panicking::begin_panic_fmt
             at /checkout/src/libstd/panicking.rs:522
   7: lrpar::kimyi::apply_repairs
             at ./<panic macros>:7
   8: lrpar::kimyi_plus::rank_cnds::{{closure}}
             at ./src/lib/kimyi_plus.rs:257
   9: core::ops::function::impls::<impl core::ops::function::FnOnce<A> for &'a mut F>::call_once
             at /checkout/src/libcore/ops/function.rs:271
  10: <core::option::Option<T>>::map
             at /checkout/src/libcore/option.rs:398
  11: <core::iter::Map<I, F> as core::iter::iterator::Iterator>::next
             at /checkout/src/libcore/iter/mod.rs:1251
  12: <alloc::vec::Vec<T> as alloc::vec::SpecExtend<T, I>>::spec_extend
             at /checkout/src/liballoc/vec.rs:1843
  13: <alloc::vec::Vec<T> as alloc::vec::SpecExtend<T, I>>::from_iter
             at /checkout/src/liballoc/vec.rs:1826
  14: <alloc::vec::Vec<T> as core::iter::traits::FromIterator<T>>::from_iter
             at /checkout/src/liballoc/vec.rs:1713
  15: core::iter::iterator::Iterator::collect
             at /checkout/src/libcore/iter/iterator.rs:1301
  16: lrpar::kimyi_plus::rank_cnds
             at ./src/lib/kimyi_plus.rs:255
  17: lrpar::kimyi_plus::recover
             at ./src/lib/kimyi_plus.rs:143
  18: <lrpar::parser::Parser<'a, TokId>>::lr
             at ./src/lib/parser.rs:165
  19: <lrpar::parser::Parser<'a, TokId>>::parse                                                                                                                                                                                                            
             at ./src/lib/parser.rs:111                                                                                                                                                                                                                    
  20: lrpar::parser::parse_rcvry
             at ./src/lib/parser.rs:313
  21: lrpar::main
             at src/main.rs:172
  22: __rust_maybe_catch_panic
             at /checkout/src/libpanic_unwind/lib.rs:101                                                                                                                                                                                                   
  23: std::rt::lang_start                                                                                                                                                                                                                                  
             at /checkout/src/libstd/panicking.rs:459                                                                                                                                                                                                      
             at /checkout/src/libstd/rt.rs:58                                                                                                                                                                                                              
  24: main
  25: __libc_start_main
  26: _start
ltratt commented 6 years ago

OK, so this is an error in (irony of ironies) the error recovery mechanism (which is highly non-deterministic internally). I'll look into it and repotr back -- thanks!

ltratt commented 6 years ago

OK, I've chopped down the example, and while I don't get the backtrace you get above, I get an even more troubling error: the parser completes even though it hasn't consumed all the input! This could be a fun one!

ltratt commented 6 years ago

I'm working my way through this one (I got distracted with another bug that isn't exactly related to this PR, but which it made sense to fix before I started on this).

FWIW, the API does have a way to alert you to tokens defined in the lexer that aren't referenced by the parser:

let (missing_from_lexer, missing_from_parser) = lexerdef.set_rule_ids(&rule_ids);

If either missing_from_lexer or missing_from_parser are Some, then something's missing in the lexer or parser. If you run your lexer and grammar through lrpar's binary you can see this:

$ cargo run /tmp/hang.l /tmp/hang.y /tmp/hang/lrpar
    Finished dev [unoptimized + debuginfo] target(s) in 0.0 secs
     Running `target/debug/lrpar /tmp/hang.l /tmp/hang.y /tmp/hang`
Warning: these tokens are defined in the lexer but not referenced in the
grammar:
  PLUS

Of course, it still crashes after that point, which it really shouldn't ;) So hunting that is my next task.