Open Plecra opened 3 years ago
Right now, the lexer is very inefficient, as it doesn't use a state machine. I think we should refactor it; this should be a viable optimization.
As for pulling a crate like https://docs.rs/memchr, Passerine's core depends only on Rust's standard library and has no external dependencies. For this reason, if we were to do vectorization, I suggest we extract the core concept of the library to vectorize as needed.
Thanks for pointing this out! much appreciated :D
@Plecra I recently completely redid the lexer in big-refactor
. Would you mind taking a look at it and letting me know what style/performance improvements can be made? I suspect there's something with the way we're using peekable iterators that could be improved:
It's awfully late, but I just saw this in my email, and I like it 😆. The token representation and parsing logic all flows very nicely. If parsing ever becomes a bottleneck, passerine could return to the once/chain/peekable
pattern, since rustc will chug on it a bit.
https://github.com/vrtbl/passerine/blob/3796465ae108bba8b3a66f5c3ab71709a76b3fc8/src/compiler/lex.rs#L152-L158
UTF-8's ability to represent ASCII in-place allows lexers for ASCII characters to be implemented extra-efficiently by skipping decoding of utf-8 bytes. It's fine to simply scan directly through the bytes (and a crate like https://docs.rs/memchr can vectorize this) for characters like
' '
and'\t'