maciejhirsz / logos

Create ridiculously fast Lexers
https://logos.maciej.codes
Apache License 2.0

Add a function-like procedural macro front-end processing a DSL #208

Open Krantz-XRF opened 3 years ago

Krantz-XRF commented 3 years ago

The current logos-derive uses derive procedural macros and attribute annotations on an enum as its front-end. I know this design guarantees that the enum declaration is presented as-is, but I personally find regular expressions written as string literals hard to read and reason about.

I would prefer a DSL processed by a function-like procedural macro, like the following (the syntax could be revised, of course):

logos! {
  enum Lexeme {
    pub SomeKeyword = "keyword";
    pub Decimal = ('1' .. '9') $Digit*;
    IdStart = $Letter | '_' | '\'';
    IdCont = $Letter | $Digit | '_' | '\'';
    pub Id = IdStart IdCont*;
  }
}

Here, $Digit and $Letter should resolve to the corresponding Unicode character properties or general categories. Entries without pub serve as sub-patterns, and entries with pub are exposed as enum variants of the token type.
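
To make the pub / non-pub distinction concrete, here is a minimal sketch of how such a front-end might expand the example above into plain regex strings. Everything in it is an assumption made for illustration: the `Pat` and `Rule` types, the `expand` and `public_tokens` functions, and the mapping of $Digit to `\p{Nd}` and $Letter to `\p{L}` are hypothetical and not part of logos. It also assumes rule references are acyclic.

```rust
/// Hypothetical pattern AST for the proposed DSL (not part of logos).
enum Pat {
    Lit(&'static str),   // "keyword", '_'
    Range(char, char),   // '1' .. '9'
    Class(&'static str), // $Digit, $Letter
    Ref(&'static str),   // reference to another rule, e.g. IdStart
    Alt(Vec<Pat>),       // a | b
    Seq(Vec<Pat>),       // a b
    Star(Box<Pat>),      // a*
}

struct Rule {
    name: &'static str,
    public: bool, // `pub` entries become enum variants
    pat: Pat,
}

/// Backslash-escape regex metacharacters in a literal.
fn escape(s: &str) -> String {
    let mut out = String::new();
    for c in s.chars() {
        if r"\.+*?()|[]{}^$".contains(c) {
            out.push('\\');
        }
        out.push(c);
    }
    out
}

/// Expand a pattern into a regex string, inlining referenced rules.
/// Assumes references are acyclic; a cycle would recurse forever.
fn expand(pat: &Pat, rules: &[Rule]) -> String {
    match pat {
        Pat::Lit(s) => escape(s),
        Pat::Range(lo, hi) => format!("[{}-{}]", lo, hi),
        // Assumed mapping to Unicode general categories.
        Pat::Class(name) => match *name {
            "Digit" => r"\p{Nd}".to_string(),
            "Letter" => r"\p{L}".to_string(),
            other => panic!("unknown class ${}", other),
        },
        Pat::Ref(name) => {
            let rule = rules
                .iter()
                .find(|r| r.name == *name)
                .expect("reference to undefined rule");
            expand(&rule.pat, rules)
        }
        Pat::Alt(ps) => {
            let parts: Vec<String> = ps.iter().map(|p| expand(p, rules)).collect();
            format!("(?:{})", parts.join("|"))
        }
        Pat::Seq(ps) => ps.iter().map(|p| expand(p, rules)).collect(),
        Pat::Star(p) => format!("(?:{})*", expand(p, rules)),
    }
}

/// Only `pub` rules are exposed as tokens; the rest exist to be inlined.
fn public_tokens(rules: &[Rule]) -> Vec<(&'static str, String)> {
    rules
        .iter()
        .filter(|r| r.public)
        .map(|r| (r.name, expand(&r.pat, rules)))
        .collect()
}

/// The `Lexeme` example from this issue, written against the sketch AST.
fn example_rules() -> Vec<Rule> {
    use Pat::*;
    vec![
        Rule { name: "SomeKeyword", public: true, pat: Lit("keyword") },
        Rule {
            name: "Decimal",
            public: true,
            pat: Seq(vec![Range('1', '9'), Star(Box::new(Class("Digit")))]),
        },
        Rule {
            name: "IdStart",
            public: false,
            pat: Alt(vec![Class("Letter"), Lit("_"), Lit("'")]),
        },
        Rule {
            name: "IdCont",
            public: false,
            pat: Alt(vec![Class("Letter"), Class("Digit"), Lit("_"), Lit("'")]),
        },
        Rule {
            name: "Id",
            public: true,
            pat: Seq(vec![Ref("IdStart"), Star(Box::new(Ref("IdCont")))]),
        },
    ]
}
```

Calling `public_tokens(&example_rules())` yields three entries (SomeKeyword, Decimal, Id); IdStart and IdCont appear only inlined inside the expansion of Id. A real front-end would lower to Mir rather than to strings, so this only illustrates the name-resolution step.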

I am currently working on this, with the new front-end targeting Mir. However, I don't know whether you would want to merge it. If you decide not to, would you consider extracting the back-end code-generation logic in logos-derive into a non-proc-macro crate, so that it becomes a publicly available API for reuse?

glyh commented 7 months ago

This is pretty useful for language reuse. I have something like this:

~[a-z]('([^'\\]|\\['\\])*'|(/([^/\\]|\\[/\\])*)?/([^/\\]|\\[/\\])*/)

It tokenizes a limited subset of Elixir's sigils. And it keeps growing 💀
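
For illustration, the pattern above decomposes into three named pieces: a `~[a-z]` prefix, a single-quoted form, and a slash-delimited form with an optional extra leading segment. The sketch below rebuilds the exact same string from those pieces. The helper names `body` and `sigil_regex` are hypothetical, and the string is assembled at runtime, so this shows only the structure that named sub-patterns would expose, not something that can be passed to a `#[regex(...)]` attribute as-is.

```rust
/// Body of a delimited literal: any character other than the delimiter or a
/// backslash, or a backslash followed by the delimiter or a backslash.
fn body(delim: char) -> String {
    format!(r"([^{0}\\]|\\[{0}\\])*", delim)
}

/// Reassemble the sigil pattern from named fragments.
fn sigil_regex() -> String {
    let prefix = "~[a-z]";
    // '...' form
    let quoted = format!("'{}'", body('\''));
    // /.../ form, with an optional leading /... segment
    let slashed = format!("(/{0})?/{0}/", body('/'));
    format!("{}({}|{})", prefix, quoted, slashed)
}
```

Adding another delimiter then means adding one more `body(..)` call and one more alternative, instead of editing the middle of a long literal.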

glyh commented 7 months ago

Okay, this can potentially be worked around with pomsky, but that is blocked by https://github.com/rust-lang/rust/issues/52393