jeff-hykin / better-cpp-syntax

💾 The source of VS Code's C++ syntax highlighting
GNU General Public License v3.0
154 stars 29 forks source link

`/**/` comment criticism #564

Closed msftrncs closed 1 year ago

msftrncs commented 3 years ago

Checklist

The Criticism

The concern here is: (only speaking for the resulting tmLanguage file)

  1. block_comment and inline_comment are effectively identical and thus redundant, and block_comment could serve for all of the /**/ comment needs.
  2. inline_comment otherwise has room for performance improvement.
    • Currently: (\/\*)((?:[^\*]|(?:\*)++[^\/])*+((?:\*)++\/))
      • Utilizes 3 captures
      • Capture 3 requires a further pattern.
    • Improved: (\/\*)(?:[^*]++|\*+(?!\/))*+(\*\/)
      • Utilizes 2 captures plus the overall match
      • Requires no further pattern breakdown
      • On the test pattern /****** this is a ****** comment ******/ there is a 300% step reduction just on the match.
  3. Inter construct inline comment capturing is very inefficient, and is consuming a lot of lines in the tmLanguage file, as it double and triple scopes the captures, including using inline_comment.
    • This uses the exact same regex at its heart.
    • In all the cases I have seen, the entire match is further patterned by inline_comment, yet three original captures are further scoped in the construct's captures, literally identically to inline_comment, a total of 18 lines in the tmLanguage file per instance, and there appears to be approximately 377 instances, or nearly 6800 total lines of a 20,000 line document.
    • Because some constructs have repeating quantifiers wrapping the inline comment expression, the captures cannot be correctly sub-scoped anyway.
jeff-hykin commented 3 years ago

@msftrncs Glad to see someone who knows their regex!

I've been working on a cleaned up version of this repo over here if you want to refer to it. Eventually I'll merge it into this one.

  1. Sadly block_comment and inline_comment are both needed and are different. Just try removing one and you'll probably be able to see the pattern. Inline comment is should be faster than block, but it only works if its one line. Block is the only one capable of multi-line but because of that it can't be embedded into other single line patterns, like the std_space pattern.

  2. I'll have to spend a bit more time looking at the improved version. Only numbered capture groups are allowed in Textmate, so the library has to know about capture groups in order to correctly line up them up with the correct textmate scope when they're embedded into other pattterns. This means it might be hard to get that kind of low level optimization, and the $0'th capture group might not be present when the pattern is embedded in another pattern.

  3. I'm not exactly sure what you mean by "Inter construct inline comment capturing", or what you're recommending. If those inline comments are not injected 377 times, the grammar can't parse correctly as far as I understand. Textmate is limited and aggressive copy/pasting patterns everywhere is often the only what to get correct parsing. I created the ruby library to specifically do that kind of agressive copy-pasting in a maintainable way.

jeff-hykin commented 1 year ago

I changed the inline comment regex to your version 👍