Open m4rw3r opened 8 years ago
Notes:
Removing the `Copy` requirement requires either a `Clone` bound or having `peek` return a reference. The latter prevents us from using `&str` as an input, since the character iterator provides us with `char` by value, which we cannot return as a reference since we have nothing to borrow from. The former is not a fantastic idea either, since it would cause `clone` to be called for anything using `peek` (which includes `satisfy`, `satisfy_with`, `token`, `not_token`, `peek`, `peek_next`, `eof`), plus of course the `Clone` requirement itself.
Predicates in the `ascii` module, and any predicates operating on actual tokens (very common when working with small token types like `u8` and `char`), do not work as drop-in predicates; they must operate on references, or the value has to be dereferenced before calling the predicate:

```rust
take_while(i, |&c| is_whitespace(c))
satisfy(i, |c| *c == b'a')
// vs
take_while(i, is_whitespace)
satisfy(i, |c| c == b'a')
```
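A minimal, self-contained sketch of the friction above (hypothetical `take_while` over a byte slice, not chomp's actual API): when the predicate receives a reference, reusing a by-value predicate like `is_whitespace` forces a destructuring `|&c| ...` or an explicit dereference at every call site.

```rust
// Sketch (not chomp's API): a take_while whose predicate receives a
// reference, as it would if Token were no longer Copy.
fn take_while<F>(input: &[u8], mut pred: F) -> (&[u8], &[u8])
where
    F: FnMut(&u8) -> bool,
{
    // Find the first byte that fails the predicate; everything before it
    // is the matched prefix, everything after is the remainder.
    let n = input.iter().position(|c| !pred(c)).unwrap_or(input.len());
    input.split_at(n)
}

// A by-value predicate, as commonly written for small token types.
fn is_whitespace(c: u8) -> bool {
    c == b' ' || c == b'\t'
}

fn main() {
    // The predicate must destructure (|&c| ...) to adapt the by-value
    // is_whitespace to the by-reference calling convention:
    let (ws, rest) = take_while(b"  abc", |&c| is_whitespace(c));
    assert_eq!(ws, &b"  "[..]);
    assert_eq!(rest, &b"abc"[..]);
    println!("matched {} whitespace bytes", ws.len());
}
```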
If we do not remove the `Copy` requirement, multi-stage parsers have to use the bound `B: Copy + Buffer` to work properly, which imposes restrictions on the underlying `Input`. Also possible is to perform the conversion of a buffer into the actual token type in the lexer, but this might be a bit too early in the parsing chain for some formats.
Getting some feedback on such a fundamental change as this would be great.
More:
Lifetimes are an issue here too, since borrowing a token prevents the following parsers from parsing the input until the reference has been dropped. E.g. if `Token` isn't `Copy`, things like this would be forbidden:

```rust
peek_next(i).then(|i, c| if c == Foo { any(i) } else { i.ret(Bar) })
```

This is a common pattern internally (even if it is just inlined), which means that any peek needs to return by value, necessitating at least `Clone`.
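A minimal illustration of the borrow conflict (the `Input` type here is hypothetical, not chomp's): if `peek` returns `&Token`, the input stays borrowed for as long as that reference lives, so the token must be cloned (or copied) before the input can be advanced, which is exactly the `Clone` lower bound argued for above.

```rust
// Sketch (not chomp's API): why a borrowing peek forces a clone before
// the input can be used again.
#[derive(Debug, Clone, PartialEq)]
enum Token { Foo, Bar }

struct Input { tokens: Vec<Token>, pos: usize }

impl Input {
    // Borrowing peek: keeps `self` borrowed while the reference is alive.
    fn peek(&self) -> Option<&Token> {
        self.tokens.get(self.pos)
    }
    // Consumes one token by value.
    fn any(&mut self) -> Option<Token> {
        let t = self.tokens.get(self.pos).cloned();
        if t.is_some() { self.pos += 1; }
        t
    }
}

fn main() {
    let mut i = Input { tokens: vec![Token::Foo, Token::Bar], pos: 0 };
    // Holding `i.peek()`'s reference across `i.any()` would not compile;
    // cloning the peeked token ends the borrow and unblocks the input:
    let c = i.peek().cloned();
    let result = if c == Some(Token::Foo) { i.any() } else { Some(Token::Bar) };
    assert_eq!(result, Some(Token::Foo));
}
```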
A `Copy` bound on a returned buffer prevents pretty much all container types from being used as tokens in a multi-stage parser. Referring to slices of the input is feasible, and slices do satisfy `Buffer + Copy`, but that restricts the resulting parser to only work on slices.
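To make the trade-off concrete, here is a sketch (hypothetical `Tok` type) of the slice-referring approach: tokens that borrow `&str` slices of the source are `Copy`, but swapping in any owned container (say a `String` produced by unescaping) would lose `Copy`, which is the restriction discussed above.

```rust
// Sketch: a two-stage parse where lexer tokens borrow slices of the
// source. &str-based tokens are Copy; an owned String field would not be.
#[derive(Debug, Clone, Copy, PartialEq)]
enum Tok<'a> {
    Word(&'a str),
    Num(&'a str),
}

// Stage one: turn the source into a token stream that borrows from it.
fn lex(src: &str) -> Vec<Tok<'_>> {
    src.split_whitespace()
        .map(|w| {
            if w.chars().all(|c| c.is_ascii_digit()) {
                Tok::Num(w)
            } else {
                Tok::Word(w)
            }
        })
        .collect()
}

fn main() {
    let src = "foo 42 bar";
    // Stage two would parse `toks`; because Tok is Copy, the current
    // Token: Copy bound is satisfied, but only for slice-backed input.
    let toks = lex(src);
    assert_eq!(toks, vec![Tok::Word("foo"), Tok::Num("42"), Tok::Word("bar")]);
}
```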
As the title says, is there any way we can remove the `Copy` requirement on the `Token` associated type? The main drawback of removing it is that most filter functions will need to operate on borrows (`satisfy`, `take_while` and so on), which will introduce some extra typing (one or two extra `&` or `*` per filter).

The benefit of removing `Copy` would be that we can easily do multi-stage parsing (commonly lexer + parser in separate parts) solely in Chomp, relying on the previous parser to construct a new token stream, which could then refer to the initially parsed data all the way through (depending of course on the lifetime requirements of the source itself).

This would be a breaking change, probably `1.0.0` and/or `0.4.0`.