Closed: ptal closed this issue 9 years ago.
In nom (explanations taken from the nom README):
A parser combinator in Rust is basically a function which, for an input type I and an output type O, will have the following signature:
```rust
fn parser(input: I) -> IResult<I, O>;
```
IResult is an enumeration that can represent:
- Done(I,O), with the first element being the rest of the input (not parsed yet) and the second being the output value
- Error(Err), with Err carrying an integer error code (optionally with an input position and a nested error, as the Err enum below shows)
- Incomplete(u32), indicating that more input is necessary (for now the value is ignored, but it should indicate how much is needed)

```rust
pub enum IResult<I,O> {
    Done(I,O),
    Error(Err),
    Incomplete(u32)
}

pub enum Err<'a> {
    Code(u32),
    Node(u32, Box<Err<'a>>),
    Position(u32, &'a [u8]),
    NodePosition(u32, &'a [u8], Box<Err<'a>>),
}
```
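A minimal sketch of how a parser with this signature can report the three outcomes. It assumes simplified, hypothetical stand-ins for nom's types: Error carries a bare u32 code instead of the Err enum, and tag, hello and ERR_TAG are illustrative names, not nom's API.

```rust
// Hypothetical stand-in for nom's IResult; Error carries a bare u32 code
// instead of nom's Err enum to keep the sketch small.
#[derive(Debug, PartialEq)]
enum IResult<I, O> {
    /// Remaining input, then the parsed value.
    Done(I, O),
    /// Parsing failed with this error code.
    Error(u32),
    /// More input is needed; the value is how many more bytes.
    Incomplete(u32),
}

/// Hypothetical error code for "input does not start with the expected tag".
const ERR_TAG: u32 = 1;

/// Recognizes `expected` at the start of `input`.
fn tag<'a>(input: &'a [u8], expected: &[u8]) -> IResult<&'a [u8], &'a [u8]> {
    if input.starts_with(expected) {
        // Split the matched prefix from the unparsed rest.
        let (matched, rest) = input.split_at(expected.len());
        IResult::Done(rest, matched)
    } else if expected.starts_with(input) {
        // `input` is a strict prefix of `expected`: wait for more bytes.
        IResult::Incomplete((expected.len() - input.len()) as u32)
    } else {
        IResult::Error(ERR_TAG)
    }
}

/// A parser with the `fn parser(input: I) -> IResult<I, O>` shape.
fn hello(input: &[u8]) -> IResult<&[u8], &[u8]> {
    tag(input, b"hello")
}

fn main() {
    println!("{:?}", hello(b"hello world"));
    println!("{:?}", hello(b"hel"));
    println!("{:?}", hello(b"help"));
}
```

The Incomplete case is what lets a caller feed more data and retry instead of treating a short buffer as a hard error.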
In rust-peg:
```rust
struct ParseState<'input> {
    max_err_pos: usize,
    expected: ::std::collections::HashSet<&'static str>,
    _phantom: ::std::marker::PhantomData<&'input ()>,
    $cache_fields
}
```
$cache_fields contains a std::collections::HashMap<usize, RuleResult<{}>> for each cached rule, with RuleResult being:
```rust
enum RuleResult<T> {
    Matched(usize, T),
    Failed,
}
```
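Assuming the usize key is the input position at which a rule was tried (the usual packrat-parsing arrangement), the per-rule cache can be sketched as follows. ParseState, digits_cache, evaluations and parse_digits are hypothetical names for a single cached rule, not code generated by rust-peg.

```rust
use std::collections::HashMap;

#[derive(Clone, Debug, PartialEq)]
enum RuleResult<T> {
    Matched(usize, T),
    Failed,
}

/// Hypothetical state holding the cache for one rule, `digits`.
struct ParseState {
    /// Maps a start position to the result of `digits` at that position.
    digits_cache: HashMap<usize, RuleResult<u64>>,
    /// Counts uncached evaluations, only to make the caching visible.
    evaluations: usize,
}

impl ParseState {
    fn new() -> Self {
        ParseState { digits_cache: HashMap::new(), evaluations: 0 }
    }
}

/// Hypothetical rule `digits`: one or more ASCII digits, parsed as a u64.
fn parse_digits(input: &str, state: &mut ParseState, pos: usize) -> RuleResult<u64> {
    // Cache hit: reuse the earlier result instead of re-parsing.
    if let Some(cached) = state.digits_cache.get(&pos) {
        return cached.clone();
    }
    state.evaluations += 1;
    let rest = &input[pos..];
    let len = rest.bytes().take_while(|b| b.is_ascii_digit()).count();
    let result = match rest[..len].parse::<u64>() {
        Ok(n) => RuleResult::Matched(pos + len, n),
        // `parse` fails on an empty string (no digits) and on overflow.
        Err(_) => RuleResult::Failed,
    };
    state.digits_cache.insert(pos, result.clone());
    result
}

fn main() {
    let mut state = ParseState::new();
    println!("{:?}", parse_digits("42+x", &mut state, 0));
    println!("{:?}", parse_digits("42+x", &mut state, 0)); // served from the cache
    println!("evaluations: {}", state.evaluations);
}
```

Caching both successes and failures per position is what keeps backtracking alternatives from re-parsing the same input repeatedly.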
Parsing functions have this interface:
```rust
fn $name<'input>(input: &'input str, state: &mut ParseState<'input>, pos: usize) -> RuleResult<$ret>
```
Basically, errors are reported using the farthest-failure technique: the error is reported at the farthest position where a match failed, and leaf expressions name what was expected there. For example: expected <character>, expected [a-Z-] or expected "a_keyword".
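The farthest-failure bookkeeping can be sketched with the max_err_pos and expected fields of ParseState. This is a simplified, hypothetical version: mark_failure, literal, let_stmt and report are illustrative names, and the code rust-peg actually generates differs in detail.

```rust
use std::collections::HashSet;

#[derive(Debug, PartialEq)]
enum RuleResult<T> {
    Matched(usize, T),
    Failed,
}

/// Simplified, hypothetical version of rust-peg's ParseState.
struct ParseState {
    max_err_pos: usize,
    expected: HashSet<&'static str>,
}

impl ParseState {
    fn new() -> Self {
        ParseState { max_err_pos: 0, expected: HashSet::new() }
    }

    /// Records that `what` was expected at `pos`, keeping only the farthest failures.
    fn mark_failure(&mut self, pos: usize, what: &'static str) {
        if pos > self.max_err_pos {
            // A farther failure makes the earlier expectations irrelevant.
            self.max_err_pos = pos;
            self.expected.clear();
        }
        if pos == self.max_err_pos {
            self.expected.insert(what);
        }
    }

    /// Builds a message such as "at 4: expected one of x, y".
    fn report(&self) -> String {
        let mut items: Vec<&str> = self.expected.iter().copied().collect();
        items.sort();
        format!("at {}: expected one of {}", self.max_err_pos, items.join(", "))
    }
}

/// Leaf expression: a string literal. Leaves are where expectations get named.
fn literal(input: &str, state: &mut ParseState, pos: usize, lit: &'static str) -> RuleResult<()> {
    if input[pos..].starts_with(lit) {
        RuleResult::Matched(pos + lit.len(), ())
    } else {
        state.mark_failure(pos, lit);
        RuleResult::Failed
    }
}

/// Hypothetical rule: "let " followed by "x" or "y".
fn let_stmt(input: &str, state: &mut ParseState, pos: usize) -> RuleResult<()> {
    match literal(input, state, pos, "let ") {
        RuleResult::Matched(p, ()) => match literal(input, state, p, "x") {
            RuleResult::Failed => literal(input, state, p, "y"),
            matched => matched,
        },
        RuleResult::Failed => RuleResult::Failed,
    }
}

fn main() {
    let mut state = ParseState::new();
    if let RuleResult::Failed = let_stmt("let z", &mut state, 0) {
        println!("{}", state.report());
    }
}
```

Because both alternatives fail at the same farthest position, both names end up in the message, while failures at earlier positions are discarded.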
In combine:
```rust
/// Enum holding error information.
/// As there are implementations of `From` for `T: Positioner`, `String` and `&'static str`, the
/// constructor need not be used directly, as calling `msg.into()` should turn a message into the
/// correct `Info` variant.
#[derive(Clone, Debug)]
pub enum Info<T, R> {
    Token(T),
    Range(R),
    Owned(String),
    Borrowed(&'static str)
}
```
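A sketch of why the From implementations make msg.into() convenient. It copies Info in simplified form and implements only the String and &'static str conversions (the token conversion is left out); the helper expected is hypothetical.

```rust
/// Simplified copy of the `Info` enum (hypothetical sketch).
#[derive(Clone, Debug, PartialEq)]
enum Info<T, R> {
    Token(T),
    Range(R),
    Owned(String),
    Borrowed(&'static str),
}

// An owned message becomes `Info::Owned`.
impl<T, R> From<String> for Info<T, R> {
    fn from(s: String) -> Self {
        Info::Owned(s)
    }
}

// A static message becomes `Info::Borrowed`, with no allocation.
impl<T, R> From<&'static str> for Info<T, R> {
    fn from(s: &'static str) -> Self {
        Info::Borrowed(s)
    }
}

/// Hypothetical helper: accepts anything convertible into `Info`.
fn expected<T, R, M: Into<Info<T, R>>>(msg: M) -> Info<T, R> {
    msg.into()
}

fn main() {
    let a: Info<char, &str> = expected("digit");
    let b: Info<char, &str> = expected(format!("digit after {:?}", '-'));
    println!("{:?} {:?}", a, b);
}
```

Callers pass a plain string literal or a formatted String, and the conversion picks the variant.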
```rust
/// Enum used to store information about an error that has occurred.
#[derive(Debug)]
pub enum Error<T, R> {
    /// Error indicating an unexpected token has been encountered in the stream
    Unexpected(Info<T, R>),
    /// Error indicating that the parser expected something else
    Expected(Info<T, R>),
    /// Generic message
    Message(Info<T, R>),
    /// Variant for containing other types of errors
    Other(Box<StdError+Send>)
}
```
```rust
/// Enum used to indicate if a parser consumed any items of the stream it was given as an input
#[derive(Clone, PartialEq, Debug, Copy)]
pub enum Consumed<T> {
    /// Constructor indicating that the parser has consumed elements
    Consumed(T),
    /// Constructor indicating that the parser did not consume any elements
    Empty(T)
}
```
```rust
/// Struct which holds information about an error that occurred at a specific position.
/// Can hold multiple instances of `Error` if more than one error occurred at the position.
pub struct ParseError<P: Stream> {
    /// The position where the error occurred
    pub position: <P::Item as Positioner>::Position,
    /// A vector containing specific information on what errors occurred at `position`
    pub errors: Vec<Error<P::Item, P::Range>>
}
```
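Keeping a Vec of errors per position lets errors from alternatives that fail at the same place be combined. The sketch below assumes a usize position, String messages and a merge rule (keep the farther error, combine messages on a tie) modeled on the farthest-failure idea; it is not necessarily combine's exact behavior.

```rust
use std::cmp::Ordering;

/// Simplified, hypothetical ParseError: a byte offset plus messages.
#[derive(Debug, PartialEq)]
struct ParseError {
    position: usize,
    errors: Vec<String>,
}

impl ParseError {
    fn new(position: usize, msg: &str) -> Self {
        ParseError { position, errors: vec![msg.to_string()] }
    }

    /// Combines two errors: the farther one wins; at the same position,
    /// the messages are combined without duplicates.
    fn merge(mut self, other: ParseError) -> ParseError {
        match self.position.cmp(&other.position) {
            Ordering::Less => other,
            Ordering::Greater => self,
            Ordering::Equal => {
                for e in other.errors {
                    if !self.errors.contains(&e) {
                        self.errors.push(e);
                    }
                }
                self
            }
        }
    }
}

fn main() {
    let merged = ParseError::new(3, "expected digit").merge(ParseError::new(3, "expected letter"));
    println!("{:?}", merged);
    println!("{:?}", merged.merge(ParseError::new(5, "expected ')'")));
}
```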
```rust
/// The `State<I>` struct keeps track of the current position in the stream `I`
#[derive(Clone, PartialEq)]
pub struct State<I>
where
    I: Stream,
{
    pub position: <I::Item as Positioner>::Position,
    pub input: I
}
```
```rust
/// A stream is a sequence of items that can be extracted one by one
pub trait Stream: Clone {
    type Item: Positioner + Clone;
    type Range: Positioner + Clone;
    /// Takes a stream and removes its first item, yielding the item and the rest of the elements.
    /// Returns `Err` when no more elements could be retrieved.
    fn uncons(self) -> Result<(Self::Item, Self), Error<Self::Item, Self::Range>>;
}
```
```rust
impl<'a> Stream for &'a str {
    type Item = char;
    type Range = &'a str;
    fn uncons(self) -> Result<(char, &'a str), Error<char, &'a str>> {
        match self.chars().next() {
            Some(c) => Ok((c, &self[c.len_utf8()..])),
            None => Err(Error::end_of_input())
        }
    }
}
```
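To show how uncons is meant to be driven, here is a sketch with the trait simplified: the Positioner bounds and the Range type are dropped, and end of input is signaled with Err(()). collect_items is a hypothetical helper.

```rust
/// Simplified, hypothetical version of the `Stream` trait:
/// no `Positioner` bounds, no `Range`, and `Err(())` marks end of input.
trait Stream: Clone {
    type Item;
    fn uncons(self) -> Result<(Self::Item, Self), ()>;
}

impl<'a> Stream for &'a str {
    type Item = char;

    fn uncons(self) -> Result<(char, &'a str), ()> {
        match self.chars().next() {
            // Skip the char's full UTF-8 width so the rest stays a valid `str`.
            Some(c) => Ok((c, &self[c.len_utf8()..])),
            None => Err(()),
        }
    }
}

/// Hypothetical helper: drains a stream with repeated `uncons` calls.
fn collect_items<S: Stream>(mut stream: S) -> Vec<S::Item> {
    let mut items = Vec::new();
    loop {
        match stream.uncons() {
            Ok((item, rest)) => {
                items.push(item);
                stream = rest;
            }
            Err(()) => break,
        }
    }
    items
}

fn main() {
    println!("{:?}", collect_items("héllo"));
}
```

Since uncons takes self by value, the loop moves the stream into each call and rebinds it to the returned rest.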
```rust
/// A type alias over the specific `Result` type used by parsers to indicate whether they were
/// successful or not.
/// `O` is the type that is output on success
/// `I` is the specific stream type used in the parser
/// `T` is the item type of `I`, this parameter will be removed once type declarations are allowed
/// to have trait bounds
pub type ParseResult<O, I> = Result<(O, Consumed<State<I>>), Consumed<ParseError<I>>>;
```
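The Consumed wrapper supports Parsec-style choice: an alternative is tried only if the previous one failed without consuming input. The sketch below assumes a simplified ParseResult whose state is just the remaining &str and whose error is a String; string and or are hypothetical helpers, not combine's API.

```rust
/// Same shape as combine's `Consumed`.
#[derive(Clone, Copy, Debug, PartialEq)]
enum Consumed<T> {
    Consumed(T),
    Empty(T),
}

/// Hypothetical, simplified ParseResult: the state is the remaining input and
/// the error is a plain message.
type ParseResult<'a, O> = Result<(O, Consumed<&'a str>), Consumed<String>>;

/// Hypothetical parser for the literal `lit`. A mismatch on the first char is
/// an `Empty` failure; a mismatch later on is a `Consumed` failure.
fn string<'a>(input: &'a str, lit: &str) -> ParseResult<'a, &'a str> {
    let mut rest = input;
    for (i, expected) in lit.chars().enumerate() {
        match rest.chars().next() {
            Some(c) if c == expected => rest = &rest[c.len_utf8()..],
            _ => {
                let msg = format!("expected {:?}", lit);
                return Err(if i == 0 { Consumed::Empty(msg) } else { Consumed::Consumed(msg) });
            }
        }
    }
    let matched = &input[..input.len() - rest.len()];
    let state = if lit.is_empty() { Consumed::Empty(rest) } else { Consumed::Consumed(rest) };
    Ok((matched, state))
}

/// Parsec-style choice: fall back to `second` only if `first` failed
/// without consuming any input.
fn or<'a>(input: &'a str, first: &str, second: &str) -> ParseResult<'a, &'a str> {
    match string(input, first) {
        Err(Consumed::Empty(_)) => string(input, second),
        other => other,
    }
}

fn main() {
    println!("{:?}", or("let x", "if", "let"));
    println!("{:?}", or("iff", "in", "iff"));
}
```

In the second call, "in" matches the leading 'i' before failing, so the error is Consumed and "iff" is never tried; committing like this keeps error positions precise and avoids unbounded backtracking.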
Hi! If you need more insights into nom's design, please ask me :)
The Incomplete part of IResult now contains this:
```rust
pub enum Needed {
    Unknown,
    Size(usize)
}
```
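A sketch of how a streaming parser might use Needed, under the assumption that Size(n) means n more bytes are required (the exact convention may differ between nom versions). record and line are hypothetical parsers.

```rust
/// Same shape as the `Needed` enum.
#[derive(Debug, PartialEq)]
enum Needed {
    Unknown,
    Size(usize),
}

/// Hypothetical IResult whose Incomplete variant carries `Needed`.
#[derive(Debug, PartialEq)]
enum IResult<I, O> {
    Done(I, O),
    #[allow(dead_code)] // not produced by the parsers below
    Error(u32),
    Incomplete(Needed),
}

/// Hypothetical record: one length byte followed by that many payload bytes.
/// The amount of missing input is known exactly, so it is reported as `Size`.
fn record(input: &[u8]) -> IResult<&[u8], &[u8]> {
    match input.split_first() {
        // Not even the length byte is available yet.
        None => IResult::Incomplete(Needed::Size(1)),
        Some((&len, rest)) => {
            let len = len as usize;
            if rest.len() < len {
                IResult::Incomplete(Needed::Size(len - rest.len()))
            } else {
                IResult::Done(&rest[len..], &rest[..len])
            }
        }
    }
}

/// Hypothetical line parser: without a '\n' there is no way to know how
/// much more input is needed, so it reports `Unknown`.
fn line(input: &[u8]) -> IResult<&[u8], &[u8]> {
    match input.iter().position(|&b| b == b'\n') {
        Some(i) => IResult::Done(&input[i + 1..], &input[..i]),
        None => IResult::Incomplete(Needed::Unknown),
    }
}

fn main() {
    println!("{:?}", record(&[3, b'a', b'b']));
    println!("{:?}", line(b"no newline yet"));
}
```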
Some constructs built around nom use these for streaming parsers.
Hi! Thank you for your message, I will ask you for more information on IRC. BTW, I was not aware that a link could send a notification to the repository's owner; sorry about that!
The interface has been changed to a stream, and convenience methods and documentation on ParseState have been added to ease interaction between generated code and user code.
Instead of:
We should propose a version without the 0 offset parameter: