osa1 / lexgen

A fully-featured lexer generator, implemented as a proc macro
MIT License

Return multiple tokens #47

Open MiSawa opened 2 years ago

MiSawa commented 2 years ago

Sometimes I want a lexer rule to be able to return multiple tokens, e.g. to emit a dummy token that the parser can use as an end-marker for some syntax. Maybe I should just have the lexer yield Vec<MyToken> and flatten it later, though it'd be great if this were supported on the library side.
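A minimal sketch of the flattening workaround mentioned above. It does not use lexgen's generated API. It assumes only that the lexer can be treated as an iterator of `Result<Vec<Token>, E>`. The `Token` enum and the `flatten_tokens` helper are hypothetical names introduced for illustration.

```rust
// Hypothetical token type; `BlockEnd` plays the role of the dummy end-marker.
#[derive(Debug, Clone, PartialEq)]
enum Token {
    Ident(String),
    Newline,
    BlockEnd,
}

// Turns an iterator that yields a batch of tokens per rule into an iterator
// that yields one token at a time. Errors are passed through unchanged.
fn flatten_tokens<I, E>(lexer: I) -> impl Iterator<Item = Result<Token, E>>
where
    I: Iterator<Item = Result<Vec<Token>, E>>,
{
    lexer.flat_map(|item| -> Vec<Result<Token, E>> {
        match item {
            // A rule may have produced zero, one, or several tokens.
            Ok(tokens) => tokens.into_iter().map(Ok).collect(),
            // Keep the error in the stream so the parser still sees it.
            Err(e) => vec![Err(e)],
        }
    })
}
```

The parser then consumes `flatten_tokens(lexer)` as if each rule had returned a single token. Note that this allocates a `Vec` per rule, which is the cost discussed in the reply below.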

osa1 commented 2 years ago

I needed this once, but I don't remember for what and how I worked around not having it.

We probably don't want every semantic action to return a Vec, as that would impose a runtime cost on lexers that don't need it. We could use SmallVec<[Token; 1]> to avoid allocation in the majority of cases, but even then the lexer's main loop (the Iterator implementation) would have to store the vector returned by a semantic action, yield its elements while it is non-empty, and continue lexing once it is empty. This means next() would be slower whether or not you need to return multiple tokens.
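To make the cost concrete, here is a rough sketch of what such a main loop might look like. It is not lexgen's implementation. `Buffered` is a hypothetical wrapper, the inner iterator stands in for "run the next semantic action", and a standard-library `VecDeque` replaces `SmallVec` so the example has no external dependencies.

```rust
use std::collections::VecDeque;

// Hypothetical wrapper: `actions` yields the batch of tokens returned by each
// semantic action; `pending` holds tokens that have not been handed out yet.
struct Buffered<I, T> {
    actions: I,
    pending: VecDeque<T>,
}

impl<I, T> Buffered<I, T>
where
    I: Iterator<Item = Vec<T>>,
{
    fn new(actions: I) -> Self {
        Buffered { actions, pending: VecDeque::new() }
    }
}

impl<I, T> Iterator for Buffered<I, T>
where
    I: Iterator<Item = Vec<T>>,
{
    type Item = T;

    fn next(&mut self) -> Option<T> {
        loop {
            // Extra check on every call, even if no action ever returns
            // more than one token.
            if let Some(tok) = self.pending.pop_front() {
                return Some(tok);
            }
            // Buffer is empty: run the next action, or stop at end of input.
            let batch = self.actions.next()?;
            // An empty batch (e.g. skipped whitespace) just loops again.
            self.pending.extend(batch);
        }
    }
}
```

Every call to `next()` goes through the buffer check and the push/pop pair, which is the overhead described above for lexers that only ever return one token per action.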

Alternatively, we could provide a compile-time switch for this feature and generate the buffering code only in lexers that need it.