Open · epage opened this issue 2 years ago
`combine` has a single trait with two distinct parse methods. This doesn't work well with automatically implementing the parse trait for functions because you don't know which will be involved.

What if we had
- `Parser`, that used an `IResult` with a `CompletedErr`
- `StreamParser`, that used an `IResult` with a `StreamErr`

This would make explicit which `Err` is used. We'd need to have `to_completed` and `to_streamed` conversion functions on what is currently the `Finish` trait to allow mixing parsers.
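A minimal sketch of what this two-trait split could look like. Everything here is hypothetical: the error shapes, the trait signatures, and putting `to_completed` directly on the error type are my own stand-ins, not nom's actual API.

```rust
// Hypothetical error types for the two parsing modes.
#[derive(Debug, PartialEq)]
pub enum CompletedErr<E> {
    Error(E),
    Failure(E),
}

#[derive(Debug, PartialEq)]
pub enum StreamErr<E> {
    Incomplete(usize), // how many more bytes might be needed
    Error(E),
    Failure(E),
}

// Two traits, each pinned to one error type, so a function parser
// can unambiguously implement one or the other.
pub trait Parser<I, O, E> {
    fn parse(&mut self, input: I) -> Result<(I, O), CompletedErr<E>>;
}

pub trait StreamParser<I, O, E> {
    fn parse_stream(&mut self, input: I) -> Result<(I, O), StreamErr<E>>;
}

// Conversion for mixing parsers, in the spirit of today's `Finish` trait.
impl<E> StreamErr<E> {
    pub fn to_completed(self) -> CompletedErr<E> {
        match self {
            // In a complete-input context, running out of input is a hard
            // failure rather than "wait for more data".
            StreamErr::Incomplete(_) => panic!("incomplete input in complete context"),
            StreamErr::Error(e) => CompletedErr::Error(e),
            StreamErr::Failure(e) => CompletedErr::Failure(e),
        }
    }
}
```

The panic on `Incomplete` mirrors what `Finish::finish` does today.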
The following is an idea I had while working on my own parser. It may not be possible in nom, or it may require a lot of changes, but I don't see why not to share ideas.

The idea was to not have separate streaming and complete parsers like nom does. I wanted to remove nom's duplicate-code problem, but I also wanted a way to build "streaming" parsers like nom's. Currently nom has a major problem: you can't "resume" where you stopped. I saw two solutions:
- make it a property of the `Input`. That's what I went for in my current experimentation; in nom, something like the trait `InputIter` would allow doing mostly what I did. For example:
```rust
use std::cell::RefCell;
use std::ops::{Range, RangeFrom};
use std::rc::Rc;

struct Buf<Reader, const N: usize> {
    buf: Vec<u8>,
    reader: Reader,
}

#[derive(Clone)]
pub(crate) enum Position {
    RangeFrom(RangeFrom<usize>),
    Range(Range<usize>),
}

#[derive(Clone)]
pub struct ReaderStream<Reader, const N: usize> {
    buf: Rc<RefCell<Buf<Reader, N>>>,
    position: Position,
}
```
A `ReaderStream` shares the buffer through an `Rc` (there could be an `Arc` version, I guess), so a `ReaderStream` is easily clonable, and it has a position that represents a range into the buffer. `Buf` is responsible for growing when needed (an implementation detail).
This design is quite MUTABLE, but that's hidden from the user, and it has a lot of advantages: you don't resume when you need more data, you just read more from the `Reader`. The `Reader` itself could be implemented to allow "sort of" async code; for example, with tokio you could spawn a blocking thread for the parsing that waits for the data read by an async task. This is not perfect, but compared to restarting the parse from the beginning it's quite nice. It also removes the need for duplicate code and removes one branch from the outcome of a parser.
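A reduced sketch of the "read more instead of resuming" idea. The `fill` and `take` method names are my own, and `Position` is simplified to a single start offset; the point is only that a parser blocks on the `Reader` for more bytes rather than returning `Incomplete`.

```rust
use std::cell::RefCell;
use std::io::Read;
use std::rc::Rc;

struct Buf<Reader> {
    buf: Vec<u8>,
    reader: Reader,
}

#[derive(Clone)]
struct ReaderStream<Reader> {
    buf: Rc<RefCell<Buf<Reader>>>,
    // Start of this handle's unread region within `buf`.
    start: usize,
}

impl<Reader: Read> ReaderStream<Reader> {
    // Ensure at least `n` unread bytes are buffered, pulling from the
    // Reader as needed. A parser calls this instead of returning
    // `Incomplete` and being restarted from scratch.
    fn fill(&self, n: usize) -> std::io::Result<()> {
        let mut inner = self.buf.borrow_mut();
        while inner.buf.len() - self.start < n {
            let mut chunk = [0u8; 64];
            let read = inner.reader.read(&mut chunk)?;
            if read == 0 {
                break; // EOF: let the caller decide whether that is an error
            }
            inner.buf.extend_from_slice(&chunk[..read]);
        }
        Ok(())
    }

    // Consume and return the next `n` bytes.
    fn take(&mut self, n: usize) -> std::io::Result<Vec<u8>> {
        self.fill(n)?;
        let inner = self.buf.borrow();
        if inner.buf.len() - self.start < n {
            return Err(std::io::Error::new(
                std::io::ErrorKind::UnexpectedEof,
                "not enough input",
            ));
        }
        let out = inner.buf[self.start..self.start + n].to_vec();
        drop(inner);
        self.start += n;
        Ok(out)
    }
}
```

With an in-memory `Cursor` as the `Reader`, `take` transparently grows the shared buffer on demand.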
There is another problem, present in nom too: this buffer could become very large. That's why I'm also thinking about a trait. While the implementation of this trait must be safe, it could allow bugs at runtime if wrongly used. The purpose is simple: call the method when you have "finished" with the previous data. I still haven't worked much on it:
```rust
pub trait Consume {
    fn consume(self) -> Self;
}
```
For now it's basic; I haven't tried this yet. The point would be to flush the buffer, making any use of previously consumed data an error. That said, I don't think a lot of parsers could make use of such a trait.
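One way `Consume` could behave, as a sketch (the comment above says this was never tried, so the `Window` type and its absolute-offset bookkeeping are entirely my own illustration): flushing drops the used prefix, and any position below the new base refers to flushed data.

```rust
pub trait Consume {
    fn consume(self) -> Self;
}

struct Window {
    buf: Vec<u8>,
    // Absolute offset of buf[0] in the overall stream; positions below
    // `base` refer to flushed data and would be runtime errors if used.
    base: usize,
    // Current absolute parse position.
    pos: usize,
}

impl Consume for Window {
    fn consume(mut self) -> Self {
        // Drop everything before the current position.
        self.buf.drain(..self.pos - self.base);
        self.base = self.pos;
        self
    }
}
```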
I've been playing with the `Parser` / `StreamParser` idea.

Currently, almost all of the parsers are either `FnMut(I) -> IResult<I, O, E>` or they return an `impl` of that, rather than types implementing `Parser<I, O, E>`. This ties them to one specific parser trait. For this to work, we'll need to switch the parsers provided by nom to return a type that impls `Parser`, like `Parser::map` does. The parsers will need to have no knowledge of the `Parser` trait; instead, the returned type will impl the `Parser` trait.
To put this in more concrete terms, we need to turn
```rust
pub fn tag<I, O, C, E: ParseError<(I, usize)>>(
    pattern: O,
    count: C,
) -> impl Fn((I, usize)) -> IResult<(I, usize), O, E>
where
    I: Slice<RangeFrom<usize>> + InputIter<Item = u8> + InputLength + Clone,
    C: ToUsize,
    O: From<u8> + AddAssign + Shl<usize, Output = O> + Shr<usize, Output = O> + PartialEq,
{
    let count = count.to_usize();
    move |input: (I, usize)| {
        let inp = input.clone();
        take(count)(input).and_then(|(i, o)| {
            if pattern == o {
                Ok((i, o))
            } else {
                Err(Err::Error(error_position!(inp, ErrorKind::TagBits)))
            }
        })
    }
}
```
into
```rust
pub fn tag<O, C>(pattern: O, count: C) -> Tag<O, C> {
    Tag { pattern, count }
}

/// Implementation of [`tag`]
pub struct Tag<O, C> {
    pattern: O,
    count: C,
}

impl<I, O, C, E> Parser<(I, usize), O, E> for Tag<O, C>
where
    I: Slice<RangeFrom<usize>> + InputIter<Item = u8> + InputLength + Clone,
    C: ToUsize,
    O: From<u8> + AddAssign + Shl<usize, Output = O> + Shr<usize, Output = O> + PartialEq,
    E: ParseError<(I, usize)>,
{
    fn parse(&mut self, input: (I, usize)) -> IResult<(I, usize), O, E> {
        let count = self.count.to_usize();
        let inp = input.clone();
        take(count).parse(input).and_then(|(i, o)| {
            if self.pattern == o {
                Ok((i, o))
            } else {
                Err(Err::Error(error_position!(inp, ErrorKind::TagBits)))
            }
        })
    }
}
```
With this change, `Tag` will be free to impl `StreamParser` as well, and you have a single `tag` that works for both complete and streaming parsing.

The compiler is not always happy with the above conversion. Take `bits`:
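The shape of this change, reduced to a self-contained sketch with a toy `Parser` trait standing in for nom's (the trait bounds and error handling are stripped down; only the constructor-returns-a-named-type pattern is the point):

```rust
// Toy stand-ins for nom's types, just to show the shape of the change.
type IResult<I, O, E = ()> = Result<(I, O), E>;

pub trait Parser<I, O, E> {
    fn parse(&mut self, input: I) -> IResult<I, O, E>;
}

// The constructor knows nothing about any parser trait; it only builds
// the type. Trait impls live on `Tag` itself.
pub fn tag(pattern: &'static str) -> Tag {
    Tag { pattern }
}

/// Implementation of [`tag`]
pub struct Tag {
    pattern: &'static str,
}

// Because the impl is on the returned type, `Tag` would also be free to
// impl a future `StreamParser` without touching the constructor.
impl<'i> Parser<&'i str, &'i str, ()> for Tag {
    fn parse(&mut self, input: &'i str) -> IResult<&'i str, &'i str, ()> {
        match input.strip_prefix(self.pattern) {
            Some(rest) => Ok((rest, &input[..self.pattern.len()])),
            None => Err(()),
        }
    }
}
```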
```rust
pub fn bits<I, O, E1, E2, P>(mut parser: P) -> impl FnMut(I) -> IResult<I, O, E2>
where
    E1: ParseError<(I, usize)> + ErrorConvert<E2>,
    E2: ParseError<I>,
    I: Slice<RangeFrom<usize>>,
    P: FnMut((I, usize)) -> IResult<(I, usize), O, E1>,
{
    move |input: I| match parser((input, 0)) {
        Ok(((rest, offset), result)) => {
            // If the next byte has been partially read, it will be sliced away as well.
            // The parser functions might already slice away all fully read bytes.
            // That's why `offset / 8` isn't necessarily needed at all times.
            let remaining_bytes_index = offset / 8 + if offset % 8 == 0 { 0 } else { 1 };
            Ok((rest.slice(remaining_bytes_index..), result))
        }
        Err(Err::Incomplete(n)) => Err(Err::Incomplete(n.map(|u| u.get() / 8 + 1))),
        Err(Err::Error(e)) => Err(Err::Error(e.convert())),
        Err(Err::Failure(e)) => Err(Err::Failure(e.convert())),
    }
}
```
When I transform this as described above, you get:
```rust
pub fn bits<P>(parser: P) -> Bits<P> {
    Bits { parser }
}

/// Implementation of [`bits`]
pub struct Bits<P> {
    parser: P,
}

impl<I, O, E1, E2, P> Parser<I, O, E2> for Bits<P>
where
    E1: ParseError<(I, usize)> + ErrorConvert<E2>,
    E2: ParseError<I>,
    I: Slice<RangeFrom<usize>>,
    P: Parser<(I, usize), O, E1>,
{
    fn parse(&mut self, input: I) -> IResult<I, O, E2> {
        match self.parser.parse((input, 0)) {
            Ok(((rest, offset), result)) => {
                // If the next byte has been partially read, it will be sliced away as well.
                // The parser functions might already slice away all fully read bytes.
                // That's why `offset / 8` isn't necessarily needed at all times.
                let remaining_bytes_index = offset / 8 + if offset % 8 == 0 { 0 } else { 1 };
                Ok((rest.slice(remaining_bytes_index..), result))
            }
            Err(Err::Incomplete(n)) => Err(Err::Incomplete(n.map(|u| u.get() / 8 + 1))),
            Err(Err::Error(e)) => Err(Err::Error(e.convert())),
            Err(Err::Failure(e)) => Err(Err::Failure(e.convert())),
        }
    }
}
```
but this fails because of

```
$ cargo check
    Checking nom v7.1.1 (/home/epage/src/personal/nom-experimental)
error[E0207]: the type parameter `E1` is not constrained by the impl trait, self type, or predicates
  --> src/bits/mod.rs:50:12
   |
50 | impl<I, O, E1, E2, P> Parser<I, O, E2> for Bits<P>
   |            ^^ unconstrained type parameter

For more information about this error, try `rustc --explain E0207`.
error: could not compile `nom` due to previous error
```
We cannot constrain `E1` through the `P` constraint. Normally, this would be handled by adding `PhantomData` to `Bits`, but the only way I see to do that is by making `Bits` aware of the `Parser` trait, which defeats our goal of being independent of the trait so that multiple `*Parser` traits can be implemented.
Running `rustc --explain E0207`, we get this:

```
Any type parameter of an `impl` must meet at least one of
the following criteria:

 - it appears in the _implementing type_ of the impl, e.g. `impl<T> Foo<T>`
 - for a trait impl, it appears in the _implemented trait_, e.g.
   `impl<T> SomeTrait<T> for Foo`
 - it is bound as an associated type, e.g. `impl<T, U> SomeTrait for T
   where T: AnotherTrait<AssocType=U>`
```
While `Parser<_, E1>` doesn't seem to be sufficient, it sounds like `Parser<I = _, O = _, E = E1>` might be. So I guess the next step is to test the feasibility of switching the generic parameters of `Parser` to associated types. I believe `I` at least needs to stay generic.
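A reduced sketch of why the associated-type direction can help with E0207. These are toy traits, not nom's (the inner trait, here called `Parse`, carries `Output`/`Error` as associated types): binding `Error = E1` in the where-clause satisfies the third E0207 criterion, so `E1` is no longer "unconstrained".

```rust
pub trait ErrorConvert<E2> {
    fn convert(self) -> E2;
}

// Inner parser trait with associated types, so an impl can *bind* them.
pub trait Parse<I> {
    type Output;
    type Error;
    fn parse(&mut self, input: I) -> Result<(I, Self::Output), Self::Error>;
}

// Outer trait left generic, as in the failing example.
pub trait Parser<I, O, E> {
    fn parse(&mut self, input: I) -> Result<(I, O), E>;
}

pub struct Bits<P> {
    parser: P,
}

// `E1` is bound via `Error = E1` (criterion 3); `I`, `O`, `E2` appear in
// the implemented trait (criterion 2); `P` in the implementing type.
impl<I, O, E1, E2, P> Parser<I, O, E2> for Bits<P>
where
    P: Parse<(I, usize), Output = O, Error = E1>,
    E1: ErrorConvert<E2>,
{
    fn parse(&mut self, input: I) -> Result<(I, O), E2> {
        match self.parser.parse((input, 0)) {
            Ok(((rest, _offset), output)) => Ok((rest, output)),
            Err(e) => Err(e.convert()),
        }
    }
}

// Demo inner parser: yields the first char of the input.
pub struct AnyChar;

impl<'i> Parse<(&'i str, usize)> for AnyChar {
    type Output = char;
    type Error = &'static str;
    fn parse(
        &mut self,
        (i, _offset): (&'i str, usize),
    ) -> Result<((&'i str, usize), char), &'static str> {
        let mut chars = i.chars();
        match chars.next() {
            Some(c) => Ok(((chars.as_str(), 0), c)),
            None => Err("empty input"),
        }
    }
}

impl ErrorConvert<String> for &'static str {
    fn convert(self) -> String {
        self.to_string()
    }
}
```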
Switching to associated types didn't cause any problems: epage/nom-experimental#5

Switching `tag` (the working case) didn't cause too many problems: epage/nom-experimental#6

Still fighting `bits`, the problematic case for which I switched to associated types.

Thinking back through the latest problems I'm having and the first, I think I'm going to capture `O` and `E` as phantom data, and I don't think I'll need associated types anymore.

Got it past the constraint issue: https://github.com/epage/nom-experimental/pull/6
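The `PhantomData` shape that gets past E0207 looks roughly like this, reduced to toy traits (the real change is in the linked PR; `AnyChar` is a made-up demo parser): capturing `O` and `E1` in the struct means they appear in the implementing type, satisfying E0207's first criterion, without `Bits` itself knowing about the `Parser` trait.

```rust
use std::marker::PhantomData;

pub trait ErrorConvert<E2> {
    fn convert(self) -> E2;
}

pub trait Parser<I, O, E> {
    fn parse(&mut self, input: I) -> Result<(I, O), E>;
}

/// Implementation of a toy `bits`; `O` and `E1` are captured as phantom
/// data so the trait impl can constrain them.
pub struct Bits<P, O, E1> {
    parser: P,
    _marker: PhantomData<(O, E1)>,
}

pub fn bits<P, O, E1>(parser: P) -> Bits<P, O, E1> {
    Bits { parser, _marker: PhantomData }
}

// `E1` now appears in the implementing type `Bits<P, O, E1>`, so it is
// constrained, while `Bits` stays independent of any one parser trait.
impl<I, O, E1, E2, P> Parser<I, O, E2> for Bits<P, O, E1>
where
    P: Parser<(I, usize), O, E1>,
    E1: ErrorConvert<E2>,
{
    fn parse(&mut self, input: I) -> Result<(I, O), E2> {
        match self.parser.parse((input, 0)) {
            Ok(((rest, _offset), output)) => Ok((rest, output)),
            Err(e) => Err(e.convert()),
        }
    }
}

// Demo inner parser: yields the first char of the input.
pub struct AnyChar;

impl<'i> Parser<(&'i str, usize), char, ()> for AnyChar {
    fn parse(
        &mut self,
        (i, _offset): (&'i str, usize),
    ) -> Result<((&'i str, usize), char), ()> {
        let mut chars = i.chars();
        chars.next().map(|c| ((chars.as_str(), 0), c)).ok_or(())
    }
}

impl ErrorConvert<String> for () {
    fn convert(self) -> String {
        "parse error".to_owned()
    }
}
```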
So I've gotten further with epage/nom-experimental#6 and am looking at going in a different direction.
As a recap, my goals are:
- `tag()`-style parsers without having to hard code them to only complete or streaming (#1408)

With epage/nom-experimental#6, I'm trying to solve that by forking the `Parser` trait into `Parser` and `StreamParser`. All of the existing "parsers" are switching from directly implementing `Parser`, or returning an opaque type that does, to returning a hand-implemented type that will implement both `Parser` and `StreamParser`. Which trait you call into decides which approach is taken, simplifying the hierarchy, catching mix-ups at compile time, and allowing `&str` to implement both styles of parser.
Problems I've run into include:
- `Parser::map` and `StreamParser::map`, etc., will cause problems if both are present
- switching from `impl` returns to hand-written types with phantom data for several generic parameters
- implementing both `Parser` and `StreamParser`: you can't delegate, or you could call the wrong trait deeply nested down

✅ The goals are being met.
Some things I think I like (not fully decided on all of them).

Overall, I think I should step back and evaluate this compared to what I think makes `nom` great:
- you don't need to know the `Parser` trait, you can just write a function with understandable types (with `IResult` doing a lot of the heavy lifting)
- a `Parser` works directly with an input rather than just describing the parser via combinators / existing parsers

In considering alternative designs, something I keep coming back to is that complete vs stream is an attribute of the `Input`. For example, if we could annotate the input with an `enum Stream<Input> { Complete(Input), Incomplete(Input) }`, then we could react accordingly.
Problems with this:
- wrapping `&str` would add ceremony
- since a parser could receive either `Complete` or `Incomplete`, this doesn't help make things type safe

As an alternative, what if we had:
```rust
pub trait Stream {
    fn is_complete() -> bool;
}

pub struct Complete<Input>(Input);

// TODO: Also implement all relevant input traits
impl<Input> Stream for Complete<Input> {
    #[inline]
    fn is_complete() -> bool {
        true
    }
}

pub struct Incomplete<Input>(Input);

// TODO: Also implement all relevant input traits
impl<Input> Stream for Incomplete<Input> {
    #[inline]
    fn is_complete() -> bool {
        false
    }
}

// TODO: Also implement all relevant input traits
impl<Token> Stream for &[Token] {
    #[inline]
    fn is_complete() -> bool {
        false
    }
}
```
Now, each of the parsers that can be either complete or streaming will add an additional constraint (`Input: Stream`) and switch on `Input::is_complete()`. I am assuming that `is_complete` will be inlined and the compiler will optimize this so it is as if the code were only complete or streaming, with no runtime overhead.
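A sketch of how a dual-mode parser could branch on the marker. To keep the example self-contained, a minimal `Stream` trait is restated with an `input()` accessor standing in for nom's input traits, and `TakeErr` is a made-up error type; the branch on `I::is_complete()` should fold away after monomorphization.

```rust
pub trait Stream {
    fn is_complete() -> bool;
    // Stand-in for nom's input traits: expose the underlying text.
    fn input(&self) -> &str;
}

pub struct Complete<'i>(pub &'i str);

impl Stream for Complete<'_> {
    #[inline]
    fn is_complete() -> bool {
        true
    }
    fn input(&self) -> &str {
        self.0
    }
}

pub struct Incomplete<'i>(pub &'i str);

impl Stream for Incomplete<'_> {
    #[inline]
    fn is_complete() -> bool {
        false
    }
    fn input(&self) -> &str {
        self.0
    }
}

#[derive(Debug, PartialEq)]
pub enum TakeErr {
    Needed(usize), // streaming: more input might make this succeed
    Eof,           // complete: the input is simply too short
}

// One implementation serves both modes; the mode decides what a short
// input means (ASCII-only slicing, for brevity).
pub fn take<I: Stream>(input: &I, count: usize) -> Result<(&str, &str), TakeErr> {
    let s = input.input();
    if s.len() >= count {
        Ok((&s[count..], &s[..count]))
    } else if I::is_complete() {
        Err(TakeErr::Eof)
    } else {
        Err(TakeErr::Needed(count - s.len()))
    }
}
```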
Evaluating this compared to what makes `nom` great:

Compared to problems with epage/nom-experimental#6:
- only `Stream` will care about it, while all of the other parts can be agnostic to it

Compared to my goals:
- `tag()`-style parsers without having to hard code them to only complete or streaming (#1408)
- if input is marked `Incomplete`, it will help catch errors at compile time, but you could still make mistakes with `Err<E>`, though extension traits like `Finish` to automate the conversion could help
- ideally `Stream` would define whether to use a `CompleteErr<E>` or `IncompleteErr<E>`, but I can't think of a way to make that work while keeping things low-ceremony without specialization

I think this strikes a balance worth exploring, and I'll take a crack at this next.
epage/nom-experimental#28 seems to be working well! I think it can be used to resolve this issue and unblock other work (like implementing `Parser` for types like `&str`).
btw, since I did my write-up, I ended up going in the direction of making the `InputIsStreaming` trait have a `const: bool` parameter. This allows some extra error checking (e.g. `finish` only working with complete parsers) and has potential for some more interesting cases.

The most obvious downside is that it requires an extra trait parameter.

When implementing `Parser` for `char`, etc., I also could only support complete parsing. I think the non-const approach would allow `char`s to be treated as both complete and streaming parsers.
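A reduced sketch of the const-parameter direction described above. The trait shape and the `Streaming` wrapper are simplified stand-ins modeled on the description, not the actual experimental code: the mode travels as a const generic parameter, so an API like `finish` can demand complete input at compile time.

```rust
// Marker trait: the streaming mode is a const generic parameter.
pub trait InputIsStreaming<const STREAMING: bool> {}

// Wrapper that opts an input into streaming mode.
pub struct Streaming<I>(pub I);

impl<I> InputIsStreaming<true> for Streaming<I> {}

// A bare `&str` is treated as complete input.
impl<'i> InputIsStreaming<false> for &'i str {}

// `finish` only accepts complete input: passing a result whose remaining
// input is `Streaming<_>` simply fails to type-check, catching the
// mix-up at compile time instead of panicking at runtime.
pub fn finish<I, O, E>(result: Result<(I, O), E>) -> Result<O, E>
where
    I: InputIsStreaming<false>,
{
    result.map(|(_rest, output)| output)
}
```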
Right now, `nom` has distinct types for complete and streaming parsers. Problems with this:
- `Parser` can't be implemented for types like `&str`, because one of the two behaviors has to be chosen (#1408)
- `Finish::finish` will panic on `Incomplete`

What if a type could have both types of parsing on it?