ptal / oak

A typed parser generator embedded in Rust code for Parsing Expression Grammars
Apache License 2.0

Better API interface of top-level functions and `ParseResult` #64

Closed: ptal closed this issue 9 years ago

ptal commented 9 years ago

Instead of:

 assert_eq!(calculator::parse_expression("7+(7*2)", 0).unwrap().data, 21);

We should provide a version without the `0` offset parameter:

assert_eq!(calculator::parse_expression("7+(7*2)").unwrap().data, 21);
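
One way this could look is a thin wrapper over an offset-taking function. The sketch below is only an illustration: `Parsed`, `parse_number_at` and `parse_number` are hypothetical names, not Oak's generated API, and the "grammar" is just an unsigned decimal integer.

```rust
/// Hypothetical result type: the parsed value plus the offset where parsing stopped.
#[derive(Debug, PartialEq)]
pub struct Parsed<T> {
    pub data: T,
    pub offset: usize,
}

/// Offset-taking entry point, analogous to the current two-argument function.
/// For illustration it only parses an unsigned decimal integer.
pub fn parse_number_at(input: &str, offset: usize) -> Result<Parsed<u32>, String> {
    // `get` returns None for an out-of-range or non-boundary offset instead of panicking.
    let rest = input
        .get(offset..)
        .ok_or_else(|| format!("invalid offset {}", offset))?;
    // ASCII digits are one byte each, so `len` is always a valid slice boundary.
    let len = rest.bytes().take_while(|b| b.is_ascii_digit()).count();
    if len == 0 {
        return Err(format!("expected a digit at offset {}", offset));
    }
    let data = rest[..len].parse::<u32>().map_err(|e| e.to_string())?;
    Ok(Parsed { data, offset: offset + len })
}

/// Convenience wrapper that always starts at the beginning of the input.
pub fn parse_number(input: &str) -> Result<Parsed<u32>, String> {
    parse_number_at(input, 0)
}
```

With this shape, callers that never need an offset write `parse_number("21").unwrap().data`, while generated code can keep calling the offset-taking variant internally.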
ptal commented 9 years ago

In nom (explanations taken from the nom README):

A parser combinator in Rust is basically a function which, for an input type I and an output type O, will have the following signature:

fn parser(input: I) -> IResult<I, O>;

IResult is an enumeration that can represent:

pub enum IResult<I,O> {
  Done(I,O),
  Error(Err),
  Incomplete(u32)
}

pub enum Err<'a> {
    Code(u32),
    Node(u32, Box<Err<'a>>),
    Position(u32, &'a [u8]),
    NodePosition(u32, &'a [u8], Box<Err<'a>>),
}
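
To make the signature concrete, here is a minimal sketch of a parser written in that style. The `IResult` and `Err` types below are simplified local stand-ins for the definitions quoted above (not nom's real types), and the `digit` function and the error code `0` are made up for the example.

```rust
/// Simplified stand-in for the quoted `Err` type (only the `Code` variant).
#[derive(Debug, PartialEq)]
pub enum Err {
    Code(u32),
}

/// Simplified stand-in for the quoted `IResult` type.
#[derive(Debug, PartialEq)]
pub enum IResult<I, O> {
    /// Success: the remaining input and the parsed value.
    Done(I, O),
    /// Failure with an error description.
    Error(Err),
    /// Not enough input to decide.
    Incomplete(u32),
}

/// Hypothetical parser: reads one ASCII digit from the front of `input`.
pub fn digit(input: &[u8]) -> IResult<&[u8], u8> {
    match input.split_first() {
        // Empty input: at least one more byte is required.
        None => IResult::Incomplete(1),
        Some((&b, rest)) if b.is_ascii_digit() => IResult::Done(rest, b - b'0'),
        // The error code 0 is an arbitrary value chosen for this sketch.
        Some(_) => IResult::Error(Err::Code(0)),
    }
}
```

The remaining input is returned alongside the value, so no explicit offset parameter is needed: the caller simply feeds the rest to the next parser.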
ptal commented 9 years ago

In rust-peg:

struct ParseState<'input> {
  max_err_pos: usize,
  expected: ::std::collections::HashSet<&'static str>,
  _phantom: ::std::marker::PhantomData<&'input ()>,
  $cache_fields
}

$cache_fields contains a std::collections::HashMap<usize, RuleResult<{}>> for each cached rule, with RuleResult defined as:

enum RuleResult<T> {
  Matched(usize, T),
  Failed,
}

Parsing functions have this interface:

fn $name<'input>(input: &'input str, state: &mut ParseState<'input>, pos: usize) -> RuleResult<$ret>

Basically, errors are reported using the farthest-failure technique: the parser remembers the furthest position at which a match failed. Leaf expressions are used to name what was expected at that position, for example: expected <character>, expected [a-Z-] or expected "a_keyword".
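
As a rough illustration of that technique, the sketch below keeps the two error fields of `ParseState` and updates them on each leaf failure. It is a simplification written for this discussion, not rust-peg's generated code; `mark_failure` and `literal` are names assumed for the example.

```rust
use std::collections::HashSet;

/// Simplified error-tracking state (no lifetime parameter, no cache fields).
pub struct ParseState {
    pub max_err_pos: usize,
    pub expected: HashSet<&'static str>,
}

impl ParseState {
    pub fn new() -> Self {
        ParseState { max_err_pos: 0, expected: HashSet::new() }
    }

    /// Records that the leaf named `expected` failed to match at `pos`.
    pub fn mark_failure(&mut self, pos: usize, expected: &'static str) {
        if pos > self.max_err_pos {
            // A failure further into the input supersedes everything recorded so far.
            self.max_err_pos = pos;
            self.expected.clear();
        }
        if pos == self.max_err_pos {
            self.expected.insert(expected);
        }
        // Failures before `max_err_pos` are ignored.
    }
}

/// Hypothetical leaf parser: matches the literal `lit` at byte position `pos`.
/// Returns the position after the match, or records a failure and returns None.
pub fn literal(input: &str, state: &mut ParseState, pos: usize, lit: &'static str) -> Option<usize> {
    let matches = input.get(pos..).map_or(false, |rest| rest.starts_with(lit));
    if matches {
        Some(pos + lit.len())
    } else {
        state.mark_failure(pos, lit);
        None
    }
}
```

After a failed parse, `max_err_pos` and `expected` together give a message of the form "at position N, expected one of ...", whichever alternatives the parser backtracked through.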

ptal commented 9 years ago

In combine:

///Enum holding error information
///As there are implementations of `From` for `T: Positioner`, `String` and `&'static str`, the
///constructor need not be used directly as calling `msg.into()` should turn a message into the
///correct `Info` variant
#[derive(Clone, Debug)]
pub enum Info<T, R> {
    Token(T),
    Range(R),
    Owned(String),
    Borrowed(&'static str)
}

///Enum used to store information about an error that has occurred
#[derive(Debug)]
pub enum Error<T, R> {
    ///Error indicating an unexpected token has been encountered in the stream
    Unexpected(Info<T, R>),
    ///Error indicating that the parser expected something else
    Expected(Info<T, R>),
    ///Generic message
    Message(Info<T, R>),
    ///Variant for containing other types of errors
    Other(Box<StdError+Send>)
}

///Enum used to indicate if a parser consumed any items of the stream it was given as an input
#[derive(Clone, PartialEq, Debug, Copy)]
pub enum Consumed<T> {
    ///Constructor indicating that the parser has consumed elements
    Consumed(T),
    ///Constructor indicating that the parser did not consume any elements
    Empty(T)
}

///Struct which holds information about an error that occurred at a specific position.
///Can hold multiple instances of `Error` if more than one error occurred at the position.
pub struct ParseError<P: Stream> {
    ///The position where the error occurred
    pub position: <P::Item as Positioner>::Position,
    ///A vector containing specific information on what errors occurred at `position`
    pub errors: Vec<Error<P::Item, P::Range>>
}

///The `State<I>` struct keeps track of the current position in the stream `I`
#[derive(Clone, PartialEq)]
pub struct State<I>
    where I: Stream {
    pub position: <I::Item as Positioner>::Position,
    pub input: I
}

///A stream is a sequence of items that can be extracted one by one
pub trait Stream : Clone {
    type Item: Positioner + Clone;
    type Range: Positioner + Clone;
    ///Takes a stream and removes its first item, yielding the item and the rest of the elements
    ///Returns `Err` when no more elements could be retrieved
    fn uncons(self) -> Result<(Self::Item, Self), Error<Self::Item, Self::Range>>;
}

impl <'a> Stream for &'a str {
    type Item = char;
    type Range = &'a str;
    fn uncons(self) -> Result<(char, &'a str), Error<char, &'a str>> {
        match self.chars().next() {
            Some(c) => Ok((c, &self[c.len_utf8()..])),
            None => Err(Error::end_of_input())
        }
    }
}

///A type alias over the specific `Result` type used by parsers to indicate whether they were
///successful or not.
///`O` is the type that is output on success
///`I` is the specific stream type used in the parser
///`T` is the item type of `I`, this parameter will be removed once type declarations are allowed
///to have trait bounds
pub type ParseResult<O, I> = Result<(O, Consumed<State<I>>), Consumed<ParseError<I>>>;
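
To illustrate how `uncons` and `Consumed` fit together, here is a small self-contained sketch. It re-declares simplified versions of the quoted items rather than using combine itself: the unit error type is an assumption (combine returns a richer `Error` value), and `skip_spaces` is a helper invented for the example.

```rust
/// Simplified stand-in for the quoted `Consumed` enum.
#[derive(Clone, Copy, Debug, PartialEq)]
pub enum Consumed<T> {
    /// The parser consumed at least one item.
    Consumed(T),
    /// The parser consumed nothing.
    Empty(T),
}

/// Simplified `uncons` for `&str`: splits off the first character.
/// Returns `Err(())` at end of input.
pub fn uncons(input: &str) -> Result<(char, &str), ()> {
    match input.chars().next() {
        // `len_utf8` keeps the slice on a character boundary for non-ASCII input.
        Some(c) => Ok((c, &input[c.len_utf8()..])),
        None => Err(()),
    }
}

/// Hypothetical helper: skips leading spaces and reports whether any input was consumed.
pub fn skip_spaces(mut input: &str) -> Consumed<&str> {
    let mut consumed = false;
    while let Ok((' ', rest)) = uncons(input) {
        input = rest;
        consumed = true;
    }
    if consumed {
        Consumed::Consumed(input)
    } else {
        Consumed::Empty(input)
    }
}
```

Knowing whether a failing parser consumed input lets a choice combinator decide whether trying the next alternative is still allowed.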
Geal commented 9 years ago

Hi! If you need more insights into nom's design, please ask me :)

The Incomplete part of IResult now contains this:

pub enum Needed {
  Unknown,
  Size(usize)
}

Some constructions around nom use these for streaming parsers.
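
A minimal sketch of how a streaming-style parser might report this, using local types rather than nom itself. `Step`, `take` and `until_newline` are hypothetical names, and treating `Size` as the number of missing bytes is an assumption of this example, not a statement about nom's exact semantics.

```rust
/// Local copy of the `Needed` enum shown above.
#[derive(Debug, PartialEq)]
pub enum Needed {
    Unknown,
    Size(usize),
}

/// Hypothetical two-state result: either a match or a request for more input.
#[derive(Debug, PartialEq)]
pub enum Step<'a> {
    /// (remaining input, matched bytes)
    Done(&'a [u8], &'a [u8]),
    Incomplete(Needed),
}

/// Takes exactly `n` bytes, or reports how many bytes are missing.
pub fn take(input: &[u8], n: usize) -> Step<'_> {
    if input.len() < n {
        // Assumption: `Size` carries the number of additional bytes required.
        Step::Incomplete(Needed::Size(n - input.len()))
    } else {
        let (taken, rest) = input.split_at(n);
        Step::Done(rest, taken)
    }
}

/// Takes everything up to (not including) the first newline.
/// Without a newline there is no way to know how much more input is needed.
pub fn until_newline(input: &[u8]) -> Step<'_> {
    match input.iter().position(|&b| b == b'\n') {
        Some(i) => Step::Done(&input[i + 1..], &input[..i]),
        None => Step::Incomplete(Needed::Unknown),
    }
}
```

A caller reading from a socket or file can use the `Size` hint to decide how much to read before retrying, and fall back to reading an arbitrary chunk on `Unknown`.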

ptal commented 9 years ago

Hi! Thank you for your message, I will ask you for information on IRC. BTW, I was not aware that a link could send a notification to the repository's owner, I am sorry about that!

ptal commented 9 years ago

The interface has been changed to a stream. Convenience methods and documentation on ParseState have been added to ease the interaction between generated code and user code.