skvadrik / re2c

Lexer generator for C, C++, D, Go, Haskell, Java, JS, OCaml, Python, Rust, V and Zig.
https://re2c.org

Support different "comment" conventions for different languages. #474

Closed pmetzger closed 3 days ago

pmetzger commented 7 months ago

re2c is in the midst of acquiring a really flexible method for adding new language support thanks to its amazing maintainers. (Thank you, Skvadrik!)

Currently, even though languages differ widely in their commenting conventions, input files for any given language seem to follow a C/C++-like comment convention for stashing the re2c directives. I haven't delved deeply into the code, so I might be mistaken, but presuming I'm not: it would be nice for the new language support configuration to allow a native comment convention to be used. I note, for example, that some languages actually use C-like comment delimiters as operators (say, Python's floor division operator, //).
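To make the `//` point concrete, here is a minimal sketch of what goes wrong when a tool assumes C++-style line comments in a Python source line. The helper `strip_line_comment` is hypothetical and written only for illustration; it is not part of re2c and does not describe how re2c scans its input.

```python
def strip_line_comment(line, comment_start="//"):
    """Naively drop everything from the first comment_start onward.

    Hypothetical helper for illustration only; it ignores string
    literals and knows nothing about the host language.
    """
    return line.split(comment_start, 1)[0].rstrip()


python_line = "half = total // 2  # floor division"

# Treating // as a comment start cuts the expression in half.
as_c_style = strip_line_comment(python_line, "//")

# Using Python's own comment start keeps the operator intact.
as_python_style = strip_line_comment(python_line, "#")

print(repr(as_c_style))
print(repr(as_python_style))
```

With the C-style assumption the right-hand side of the assignment is lost, while the native `#` convention leaves the `//` operator alone.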

Alternatively, perhaps The Right Thing is indeed to use a convention distinct from the language's native comments (after all, these are not comments but are in fact directives to re2c to generate code!), but in that case, maybe it would be better to use another, prettier convention less likely to cause accidents?

skvadrik commented 7 months ago

This may be a good idea. My main concern is the correctness of the re2c parser with respect to different languages. E.g. it has to parse string literals and character literals (as they may contain a comment start sequence), but these literals have slightly different syntax in different languages. See also the related issue https://github.com/skvadrik/re2c/issues/429, which has a wider scope.
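The concern about literals can be sketched as follows. This is a simplified illustration under stated assumptions, not re2c's actual parser: the function name `find_directive_starts` is hypothetical, the marker defaults to the C-style `/*!re2c` block start, and literals are assumed to follow C rules (double-quoted strings, single-quoted characters, backslash escapes).

```python
def find_directive_starts(source, marker="/*!re2c"):
    """Return offsets of marker that lie outside string/char literals.

    Assumes C-like literal syntax; other languages would need
    different rules here.
    """
    starts = []
    i, n = 0, len(source)
    while i < n:
        ch = source[i]
        if ch in ('"', "'"):
            # Skip the whole literal, honoring backslash escapes.
            quote = ch
            i += 1
            while i < n and source[i] != quote:
                if source[i] == "\\":
                    i += 1  # skip the escaped character
                i += 1
            i += 1  # step past the closing quote
        elif source.startswith(marker, i):
            starts.append(i)
            i += len(marker)
        else:
            i += 1
    return starts


sample = 'const char *s = "/*!re2c";\n/*!re2c */\n'
naive_hits = sample.count("/*!re2c")      # counts the one inside the string too
real_hits = find_directive_starts(sample)  # only the one outside the string
print(naive_hits, real_hits)
```

The literal-skipping branch is exactly the language-specific part: in Rust, for instance, a single quote can also introduce a lifetime such as `'a`, so the C rule for character literals used above would misread that input.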

skvadrik commented 3 days ago

This has been resolved by extending language-insensitive block start/end markers %{... and %} to cover all block types.