Closed Jakobeha closed 5 months ago
Hello @Jakobeha, thank you for this very comprehensive analysis!
But, if I understand well, then cloning Lexer
becomes virtually useless?
Cloning can be used as a kind of lookahead: the cloned lexer will return the exact same tokens that would be returned by the original lexer without advancing the original (provided the lexing functions are pure). One can also clone_from
to cloned lexer on the original to "commit" the peeked tokens.
I'm not sure if it's the most efficient way to implement a lookahead (I don't know the performance vs storing an array of spans and tokens and then making next
into a conditional), and source
being cloned in this method is redundant, but I'm sure in most cases it's fast enough to not matter.
The same thing (creating a "lookahead" lexer) can also be done using Lexer::with_extras(original.remainder(), original.extras)
, or to keep the same span for lexed tokens,
let mut clone = Lexer::with_extras(original.source(), original.extras);
clone.bump(original.span().end);
Though it's less convenient, especially if the lexer is wrapped in another data structure, because then Clone
on the outer structure has to be manually implemented.
Related, I also thought about Copy
on Lexer
, and personally I don't think it's a good idea, because it makes it easy to unintentionally copy the lexer instead of mutably borrowing it. Clone
doesn't have this issue because it's explicit.
This is a pretty good catch!
FWIW whenever the mythical rewrite happens that turns all the internal gotos from functions to match
branches over an enum in a loop
, the token
field can be replaced by just a variable on stack inside next
.
Lexer::clone
shouldn't clone the innerManuallyDrop
, because doing so clones the inner value, which is moved out inLexer::next
.This causes use-after-free if the lexer is cloned after the last-returned token is dropped, especially if the token contains an overridden implementation of
Clone
(such asRc
) that tries to read the dropped data.It causes a memory leak if the token contains a heap-allocated value, because cloning makes a new allocation. This allocation is in the
ManuallyDrop
and it's guaranteed to be overridden before the call toManuallyDrop::take
, so it's never freed.Another thing: https://github.com/maciejhirsz/logos/issues/263 (make
Lexer
implementCopy
) probably should be added (referencing here because it looks like the issue has been forgotten).