ptal / oak

A typed parser generator embedded in Rust code for Parsing Expression Grammars
Apache License 2.0
142 stars 14 forks source link

Seamingly odd "error: Type mismatch between branches of the choice operator." #102

Closed tstorch closed 3 years ago

tstorch commented 7 years ago

I just started using oak. Probably I'm doing something wrong, but the following struck me as odd:

Code:

#![feature(plugin)]
#![plugin(oak)]

extern crate oak_runtime;
use oak_runtime::*;

grammar! my_grammar{
  quotation_mark = dquote / ("%22" > char_quotation_mark)
  dquote = ["\""] // %x22

  fn char_quotation_mark() -> char { '\"' }
}

fn main() {
  let state = my_grammar::parse_quotation_mark("%22".into_state());
  println!("{:?}", state.unwrap_data());
}

Output from from running cargo run

   Compiling oak_test v0.1.0 (file:///home/tstorch/src/oak_test)
error: Type mismatch between branches of the choice operator.
 --> src/main.rs:8:20
  |
8 |   quotation_mark = dquote / ("%22" > char_quotation_mark)
  |                    ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  |
note: char
 --> src/main.rs:8:20
  |
8 |   quotation_mark = dquote / ("%22" > char_quotation_mark)
  |                    ^^^^^^
note: char
 --> src/main.rs:8:36
  |
8 |   quotation_mark = dquote / ("%22" > char_quotation_mark)
  |                                    ^^^^^^^^^^^^^^^^^^^^^^

error: aborting due to previous error

error: Could not compile `oak_test`.

As both sides are char. This leaves me wondering a bit. Is this a bug or intentional behavior?

cargo and rust version:

> rustc --version
rustc 1.19.0-nightly (5b13bff52 2017-05-23)
> cargo --version
cargo 0.20.0-nightly (397359840 2017-05-18)

Update: If I do this it works "just fine":

#![feature(plugin)]
#![plugin(oak)]

extern crate oak_runtime;
use oak_runtime::*;

grammar! sum{
  quotation_mark = dquote / dquote_encoded
  dquote = "\"" > char_quotation_mark
  dquote_encoded = "%22" > char_quotation_mark

  fn char_quotation_mark() -> char { '\"' }
}

fn main() {
  let state = sum::parse_quotation_mark("%22".into_state());
  println!("{:?}", state.unwrap_data());
}

Somehow the bracket notation does not work well with functions returning a char... My current workaround is doing something like this in order to circumvent this:

a_to_f = ["A-F"] > char_id
fn char_id(c: char) -> char { c }
ptal commented 7 years ago

Hello! Thank you for reporting the bug and playing with Oak. Actually, it is due to the barrier between "Oak types" and "Rust types" where the type of an Oak expression such as ["\""] is never the same as the type returned by a Rust function... I should add a better type comparison where the char from Oak (which is Type::Atom) and the char from Rust (which is Type::Action(FunctionRetTy)) are the same. Though, implementing this feature might not that obvious because the underlying stream parsed by Oak might not always be char but, for example, a token type or byte.

So the "solution" for now would be to encapsulate the type of ["\""] with a function, just as you did.

tstorch commented 7 years ago

Thanks for the answer. I guess the workaround should have no performance impact if declared inline as the compiler could optimize it away (any experience with this?) As I really like oak I hope you keep up the good work!

ptal commented 7 years ago

I don't think the performance will be a problem but if you really need efficiency or needs to parse gigabytes of data, I'm not sure Oak is suited. For now, it's conceived to parse text mostly. I wish I had more time to work on it but I will quite busy in the next few months :-)

tstorch commented 7 years ago

Performance is at this point no issue. This question was just our of pure interest...

ptal commented 3 years ago

I looked into this issue. It is possible to improve the situation by converting Rust return types to Oak return types, e.g., Vec<char> to List<Atom>. However, we must do that generically since the underlying stream is not necessarily on char types. A solution is to analyze the type of the underlying stream. Anyway, I think the time/benefits ratio is not high enough right now, we'll see in the future if it's interesting, I'm closing this issue for now.