Open shnewto opened 5 years ago
I would also be very interested in this for Askama!
For anyone interested, I worked my way through upgrading bnf, the changes are here. Here's what that logic above ended up looking like:
pub fn prod_lhs<'a>(input: &'a str) -> IResult<&'a str, Term, VerboseError<&'a str>> {
let (input, nt) = delimited(
complete::char('<'),
take_until(">"),
terminated(complete::char('>'), complete::multispace0),
)(input)?;
let (input, _) = preceded(
complete::multispace0,
terminated(tag("::="), complete::multispace0),
)(input)?;
Ok((input, Term::Nonterminal(nt.to_string())))
}
As far as that ws!
macro, my solution ended up pretty verbose. What was previously ws!(tag!("::="))
ended up as:
preceded(
complete::multispace0,
terminated(tag("::="), complete::multispace0)
(Watch out for the difference between multispace0
and multispace1
, zero or more whitespace characters vs 1 or more whitespace characters.)
One thing I bumped into was that it looks like there's no function equivalent to the eof!
macro so I ended up implementing my own as eoi
(end of input).
pub fn eoi<I: Copy + InputLength, E: ParseError<I>>(input: I) -> IResult<I, I, E> {
if input.input_len() == 0 {
Ok((input, input))
} else {
Err(nom::Err::Error(E::from_error_kind(input, ErrorKind::Eof)))
}
}
Certainly interested in any feedback on what I've presented here if anything seems off.
The JSON and S-expression examples both explicitly add whitespace parsers everywhere. :-/
I ended up writing this thing, I'm not sure why it's not included in nom:
fn ws<F, I, O, E>(inner: F) -> impl Fn(I) -> IResult<I, O, E>
where
F: Fn(I) -> IResult<I, O, E>,
I: InputTake + Clone + PartialEq + for<'a> Compare<&'a [u8; 1]>,
E: ParseError<I>,
{
move |i: I| {
let i = alt::<_, _, (), _>((tag(b" "), tag(b"\t")))(i.clone())
.map(|(i, _)| i)
.unwrap_or(i);
let (i, res) = inner(i)?;
let i = alt::<_, _, (), _>((tag(b" "), tag(b"\t")))(i.clone())
.map(|(i, _)| i)
.unwrap_or(i);
Ok((i, res))
}
}
It can probably be simplified/improved somehow...
Don't you need a repeat
in there somewhere? Probably you can make it simpler like this:
delimited(multispace0, inner, multispace0)
Throw some punctuation soup over that.
Also does it lead to problems allowing whitespace before and after every parser? I mean if you have foo bar
it now has an ambiguous parse:
("foo"), ((" ", " "), "bar")
("foo", " "), (" ", "bar")
("foo", (" ", " ")), ("bar")
Maybe that doesn't matter. For now I've ended up using tuples everywhere....
map(
tuple(
tag("struct"),
ws,
identifier,
ws,
tag("{")
ws,
...
),
|_, _, name, _, _, ....| {
Ast::Struct {
name,
...
}
)
It's not particularly pretty but I don't think it is too bad either. I also have a rule that all items allow trailing whitespace, but not preceding, and then I have a single preceding ws
for the entire file. That voids duplicate ws
parsers.
My plan to handle comments is to put that logic all in ws
, so it consumes all whitespace and comments.
Here is my whitespace parser that supports //
-style comments.
fn ws<'a>(input: &'a str) -> IResult<&'a str, &'a str, VerboseError<&'a str>> {
let mut in_comment = false;
let mut chars = input.char_indices();
while let Some((i, c)) = chars.next() {
if in_comment {
if c == '\n' {
in_comment = false;
}
} else {
match c {
' ' | '\t' | '\r' | '\n' => {},
'/' => {
// Might be a comment - check the next character.
match chars.next() {
Some((_, '/')) => {
in_comment = true;
},
_ => {
return Ok((&input[i..], &input[..i]))
}
}
},
_ => {
return Ok((&input[i..], &input[..i]))
}
}
}
}
Ok((&input[0..0], &input))
}
fn ws1<'a>(i: &'a str) -> IResult<&'a str, &'a str, VerboseError<&'a str>> {
match ws(i) {
Ok((_, whitespace)) if whitespace.is_empty() => {
Err(Err::Error(VerboseError::from_char(i, ' ')))
},
err => err,
}
}
And here's how I use it to parse a struct
(in a custom language):
fn parse_struct<'a>(i: &'a str) -> IResult<&'a str, Item<'a>, VerboseError<&'a str>> {
map(
context(
"parsing struct",
preceded(
tag("struct"),
cut(
tuple((
ws1,
identifier,
ws,
tag("{"),
many0(
preceded(
ws,
field,
),
),
ws,
tag("}"),
)),
),
),
),
|(_, name, _, _, fields, _, _)| {
Item::Struct {
name,
fields,
}
},
)(i)
}
By the way, &input[0..0]
seems a bit silly to me but I'm not enough of a Rust pro to know if there is something better (and technically it should be &input[input.len()..]
or something like that).
Also note that it doesn't differentiate between whitespace and comments. If you actually want to use the comments (e.g. for syntax highlighting, or code generation) then you'll need something like this:
preceded(
multispace0,
many0(
terminated(
comment,
multispace0,
)
)
)
Where comment
is
preceded(
tag("//"),
take_while(|c| c != '\n'),
)
That may be simpler anyway. I haven't tested it though.
I'm a rust noob but the below seems to be working for me. I'm sharing it since this thread amongst other places helped me to get to it and it might be useful for others. The terminated(many0(), whitespace)
function call discards trailing whitespace for the whole input (ie spaces at the end of the line). The preceeded(whitespace, alt())
call discards leading whitespace before each individual token.
let tokenizer =
all_consuming(
complete(
terminated(
many0(
preceded(
whitespace,
alt((
gstring,
integer,
float,
operator,
keyword,
))
)
),
whitespace,
)
)
);
In my application, the input is all u8
bytes and the only whitespace is the space character so this is my whitespace function:
fn whitespace(input: &[u8]) -> IResult<&[u8], &[u8]> {
take_while(|b| b == ' ' as u8)(input)
}
FWIW, in Askama I ended up with this thing:
fn ws<F, I, O, E>(inner: F) -> impl Fn(I) -> IResult<I, O, E>
where
F: Fn(I) -> IResult<I, O, E>,
I: InputTake + Clone + PartialEq + for<'a> Compare<&'a [u8; 1]>,
E: ParseError<I>,
{
move |i: I| {
let ws = many0(alt::<_, _, (), _>((
tag(b" "),
tag(b"\t"),
tag(b"\r"),
tag(b"\n"),
)));
let i = ws(i.clone()).map(|(i, _)| i).unwrap_or(i);
let (i, res) = inner(i)?;
let i = ws(i.clone()).map(|(i, _)| i).unwrap_or(i);
Ok((i, res))
}
}
(It could perhaps alternatively take a slice for the whitespace characters.)
👋 Hello!
Raising an issue to mostly ask for guidance. I've got a crate that's using nom 3.2 functionality for the most part and I'd like to update it to properly use nom 5 (it's just technically compatible with nom 5 atm). I'm particularly motivated by a lot of warnings about the deprecation of the
ws!
macro. But I'm having trouble locating documentation specific to converting existing / outdated macros to nom 5 functions and what I can use to replace thews!
macro. Any pointers to docs on that sort of thing would be really appreciated.For context, the crate is called bnf. Here's an example of the kind of thing I'm looking to update:
Thanks!