Closed jeff-hykin closed 5 years ago
Sorry I'm not more active. I'm working on finishing up some other responsibilities before the next semester starts, so it might be another week.
Saturday will prob be the next time I get to review branches and look at the other issues.
Given Doxygen and peaceshi's recent request for more pre-defined Microsoft macros, I think we should come up with an official list of the standards and C++ versions we support, their priority if they ever conflict, and URLs to their specs. (C++20, GNU, Microsoft, Doxygen, Emacs banners, etc.)
So I'm thinking something like `<=>` (e.g. due to `std::set<Foo, Foo::operator<=>>`). I'll try to find example issues/references for each in a bit.
I'll try to catch up on #368 and #339 tomorrow (edit: make that #359 as well). I probably should've focused on #339 instead of the LaTeX branch.
For the gem, do you @matter123 have any suggestions for how to incorporate the TokenHelper functionality?
I was thinking that the keywords and token names could simply be embedded in the pattern, but there is a challenge when a separate pattern needs to match anything except tokens with specific qualities. We can add the TokenHelper as-is to the gem/lib, but I wanted to see if you had any ideas for a better solution. I'm also going to change the TokenHelper so it doesn't pollute the global namespace (it currently changes the negation operator on Symbol).
TokenHelper is certainly a good idea, but I'm not sure it's actually related to grammar generation. I'm thinking it should be a separate gem.
Well, I do think it would be good to have one gem as an all-in-one toolbox for making a textmate grammar. I'm pretty sure most grammars have keywords that are going to need to be negatively searched for in multiple places. The regex helpers like pattern.or(pattern) technically aren't really part of the grammar object either, and I've thought about making them into a separate regex library (unrelated to textmate), but I think that should stay bundled as well. We could make a modular system: a grammar gem, a pattern/regex gem, a token-helper gem, and a testing gem, with a textmate_toolbox bundle combining all four.
What would you think if the tokens were actually more integrated by saving them on the grammar object?
Also, I thought I'd let you know I've been working on some tutorial blog posts to publish with the release of the gem, to make it easier for people to contribute. It's prompted a decent number of design decisions/problems, so I created a better-c-syntax repo as a WIP place for how an ideal repo would be set up using the library. Right now it is very barebones.
I've been playing with git submodules, and I think they might be the best way to handle multi-language patterns. For example, better-c-syntax has an imports folder where it has this (cpp-textmate-grammar) repo as a submodule. That way it can import functions from a specific version of the C++ repo, and changes that happen on the C++ repo don't have to affect C. Whenever desired, though, the C repo can easily pull the latest changes from C++ if there are bugfixes. The imports don't have to be languages either; we could create a shared pattern repo that both C++ and C pull from.
I'm not actually sure how well the grammar can be separated from the patterns. The grammar and its plugins rely a lot on functions of the patterns. By testing gem, do you mean the integration or unit tests? The integration tests kind of have to be JavaScript, as, as far as I know, the only open source textmate grammar parsers are in JavaScript (vscode-textmate, first-mate). Making the integration tester a separate npm package sounds reasonable; there are only a few grammar testers and IIRC none are automated.
The grammar storing tokens sounds like a good idea.
I'll check out better-c-syntax.
Sounds good, I'll work on refining/simplifying the token helper and moving it into the grammar. It might be a good idea for the tokens to be created near their relevant code and imported by other pieces that need them, so I'll consider that.
By testing I did mean integration, and yes, I agree it will have to be JavaScript. However, there are some options. The ruby gem could prompt the user to install the JavaScript package through npm, but even simpler than that, there are a decent number of ways to package JavaScript into a cross-platform executable that could be shipped with the ruby gem.
I realize I should maybe clarify: the pattern-matching of better-c-syntax is what is barebones, but the folder structure has almost all of the pieces (although I'm still very uncertain that it is the best format). I feel that whatever structure is decided upon might last for a long time, so I hope to make it future-proof and extendable. If languages are going to be importing other languages, it will be quite useful to have a standard folder structure.
Sorry I've been gone for a while @matter123. I'll probably be back some this next week for merging and catching up on everything.
I came up with a way to integrate the TokenHelper last night. (Current code is broken but I'll push the local fix later today).
Instead of tokens being separate objects, I figured we could just add the functionality to Pattern objects. Patterns can have adjectives that allow them to be collected in groups:
grammar[:addition_operator] = Pattern.new(
match: /\+/,
adjectives: [ :operator, :arithmeticOperator ],
)
To retrieve/collect "tokens" I created an abbreviated syntax. I'm a little unsure that this is the best way, but I couldn't find a different concise way to do it without hacking into the existing built-in classes.
all_arithmetic_operators = grammar[/arithmeticOperator/]
all_other_operators = grammar[/operator && !arithmeticOperator/]
It uses placeholders so that the tokens are resolved at the very end, and because of that (and because it's global) it only searches through patterns that are directly named on the grammar repository. I'm not sure if that will be unintuitive though, so that behavior might need to be changed.
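To make those selector semantics concrete, here's a minimal hypothetical sketch of adjective-based collection. A plain hash of name => adjective-list stands in for the grammar repository, and `select_by_adjectives` is an illustrative name, not the gem's real API:

```ruby
# Hypothetical sketch of adjective-based pattern selection; the real
# repository holds Pattern objects, a hash of adjective lists stands in here.
repository = {
  addition_operator:    [:operator, :arithmeticOperator],
  subtraction_operator: [:operator, :arithmeticOperator],
  and_operator:         [:operator, :logicalOperator],
}

# collect names whose adjectives include all of `required` and none of `excluded`
def select_by_adjectives(repository, required, excluded = [])
  repository.select do |_name, adjectives|
    required.all? { |adj| adjectives.include?(adj) } &&
      excluded.none? { |adj| adjectives.include?(adj) }
  end.keys
end

# roughly grammar[/arithmeticOperator/]
p select_by_adjectives(repository, [:arithmeticOperator])
# => [:addition_operator, :subtraction_operator]

# roughly grammar[/operator && !arithmeticOperator/]
p select_by_adjectives(repository, [:operator], [:arithmeticOperator])
# => [:and_operator]
```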
I'll probably post something later today going into more detail.
@matter123 I could use some help on the oniguruma engine.
The replacement system for tokens is almost done. I was able to use regex conditionals to implement an awesome feature, dont_match:, that will replace a lot of the negative lookbehinds. The regex conditionals work in VS Code itself, but they're failing with npm run gen.
I updated oniguruma and vscode-textmate but it still fails. I'm a bit confused since I thought that was the same engine VS Code itself was using.
So VS Code just (a few weeks ago) switched from oniguruma to vscode-oniguruma (a WASM build) for their regex engine, which updated the version of the engine being used. I'm pushing the branch Misc/update/vscode-textmate with an initial update for that.
I'm going to dinner but I can work some more in about an hour.
Cool! I'm glad you knew what was up. Take your time with dinner 👍
@jeff-hykin I just pushed the minimal amount of work needed to make npm run gen work.
Of note, npm run perf is no longer functional, as the vscode-oniguruma module does not export OnigRegExp, which is needed by test/source/report/onig_scanner.js. The solution to this is to pin the version of https://github.com/NeekSandhu/onigasm to be the same as what vscode-oniguruma uses internally.
Thanks! That did the trick for Misc/tokens; it all generates fine now. I'm glad you merged it too, because I had been attempting unsuccessfully to get it working based off that repo (https://github.com/NeekSandhu/onigasm).
Thanks for the note about perf too.
Quickly looked at the Multi/tokens branch; looks good. I left a few comments, but I'll give a more thorough code review when the PR is made.
The most recent push to Misc/update/vscode-textmate restores the ability to run npm run perf. I decided against using onigasm, as its embedded version of oniguruma is 6.8.2 while vscode-oniguruma's embedded version is 6.9.5. Instead I utilized the fact that an OnigScanner with a single pattern acts very similarly to an OnigRegExp.
The new version is, however, noticeably slower (~18.3s previously, ~23s now); I am unsure what the cause is.
Quickly looked at the Multi/tokens branch; looks good. I left a few comments, but I'll give a more thorough code review when the PR is made.
Sounds good, the whole branch is definitely a big WIP. You might want to look at the latest cpp/generate.rb to see the usage of the new selector system and whether there are any design concerns. I ran into some small design problems, but overall I think it's really helping clean up the code. Outside of that, I'll probably put --Jeff in the comments on anything that is notable.
As a side note: when there are grammar-usage bugs, some of them can be really hard to debug; they fail far away from their source with an unrelated error. So I might work on adding more checks, logging, and descriptive error messages before release. Examples:

Adding dont_match: to one_scope_resolution causes a "can't use numbered backref" error in the linter that crashes the program. Commenting the check out then causes another error with placeholders not being replaced, no idea why haha.

In better-c-syntax, there's a weird infinite recursion bug, related to ExportedGrammars and their non-exported patterns I think. It happens when ./patterns/comments.rb has:
grammar.exports = [
    :comment, # NOTE: no "s" AKA not :comments
    :inline_comment,
]
and generate.rb has:
grammar.import("./patterns/comments.rb")
# NOTE: there is an "s"
grammar[:comments] # returns a placeholder even though it shouldn't
grammar[:comments] = grammar[:comments]
should the grammar complain if an imported grammar's exports end up not being used?
as far as infinite recursion in placeholders, do you think it is worthwhile to implement some sort of cycle detection algorithm?
I'll see if I can figure out what's going on with dont_match.
I'll see if I can figure out what's going on with dont_match.
I did make partial progress on this. /(?<blah_group>)()\g<1>/ throws a "numbered backref/call is not allowed. (use name)" error, and dont_match uses named capture groups. So in the start_match_empty.rb linter, when it goes from a string to a regex via Regexp.new(pattern.start_pattern.evaluate.gsub("\\G", '\uFFFF')), it throws the error.
Idk if Oniguruma will also throw an error with numbered backrefs and named capture groups, but it probably will since Ruby's engine is Oniguruma-based. Idk why that wouldn't be allowed in Ruby either.
The most recent push to Misc/update/vscode-textmate restores the ability to run npm run perf. I decided against using onigasm, as its embedded version of oniguruma is 6.8.2 while vscode-oniguruma's embedded version is 6.9.5. Instead I utilized the fact that an OnigScanner with a single pattern acts very similarly to an OnigRegExp.
Thanks 👍 I'll probably test it out in the next few days to see exactly what you mean with the versions.
as far as infinite recursion in placeholders, do you think it is worthwhile to implement some sort of cycle detection algorithm?
Yeah, we can make a placeholder set {} that is empty at the top level. Then when a placeholder is being resolved, it adds the PlaceholderPattern object to the set, and if it's already in the set, then it's an infinite recursion.
Slightly simpler might be to borrow from DFS and mark each placeholder node as "not resolved", "resolved", or "resolving"; attempting to resolve a "resolving" placeholder is a loop. Otherwise I believe you would have to clear the set between resolves to prevent false positives on already-resolved patterns.
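A minimal sketch of that three-state idea, assuming a placeholder knows the name it points at and the repository it lives in (class and method names here are illustrative, not the gem's actual API):

```ruby
# Hypothetical three-state ("not resolved" / "resolving" / "resolved")
# cycle check for placeholder resolution.
class PlaceholderPattern
  def initialize(target_name, repository)
    @target_name = target_name
    @repository = repository
    @state = :unresolved
  end

  def resolve!
    # hitting a node that is already mid-resolution means we walked a cycle
    raise "infinite recursion while resolving :#{@target_name}" if @state == :resolving
    return if @state == :resolved

    @state = :resolving
    target = @repository[@target_name]
    target.resolve! if target.is_a?(PlaceholderPattern)
    @state = :resolved
  end
end

repo = {}
repo[:a] = PlaceholderPattern.new(:b, repo)
repo[:b] = PlaceholderPattern.new(:a, repo) # :a -> :b -> :a is a cycle
begin
  repo[:a].resolve!
rescue RuntimeError => e
  puts e.message # reports the infinite recursion instead of overflowing the stack
end
```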
as far as infinite recursion in placeholders, do you think it is worthwhile to implement some sort of cycle detection algorithm?
There might also be a way to do it before any loop/recursion by changing the resolve strategy. Every pattern in the repository, like grammar[:thing], has to point to a real pattern by the end. If grammar.repository[:thing_being_resolved] isn't a real pattern by the time the grammar is being compiled, an error could be thrown then.
Something like
# Grammar#compile_to_json
grammar.repository.each do |name, value|
    raise "unresolved placeholder :#{name}" if value.is_a?(PlaceholderPattern)
end
So, looking at the new resolve_placeholder transformation, I am not sure how it is recursive. If repository[arguments[:placeholder]] is itself a placeholder, wouldn't it still be a placeholder by the end?
Imagine the following grammar:
g[:a] = Pattern.new(/a/).then(g[:b])
g[:b] = Pattern.new(/b/).then(g[:c])
g[:c] = Pattern.new(/c/)
Prior to calling save, the grammar has the following:
{
a: Pattern.new(/a/).then(PlaceholderPattern(:b))
b: Pattern.new(/b/).then(PlaceholderPattern(:c))
c: Pattern.new(/c/)
}
save_to is then called, resulting in the resolve_placeholder plugin running.
It encounters :a and rewrites :a to Pattern.new(/a/).then(Pattern.new(/b/).then(PlaceholderPattern(:c))). When it encounters :b, the placeholder for :c is resolved, but that isn't propagated back up to :a.
Edit: does #map! end up traveling through the new :match entries and resolving them?
if repository[arguments[:placeholder]] is itself a placeholder, wouldn't it be a placeholder by the end?
I'm not 100% sure:
repository[arguments[:placeholder]] = PlaceholderPattern.new(*args)
unless repository[arguments[:placeholder]].is_a? PatternBase
    raise ":#{arguments[:placeholder]} is not a pattern and cannot be substituted"
end

#
# infinite __deep_clone__ recursion I think
#
each_pattern_like.arguments[:match] = repository[arguments[:placeholder]].__deep_clone__
Edit: does #map! end up traveling through the new :match and resolve them?
This I'm not sure of either, which is partly why I worked on creating .recursive_pattern_chain, so that a list could be generated (and basically frozen) first, before starting to resolve/transform things.
I don't think it can be frozen, several transformations alter the chain fairly significantly.
I don't think it can be frozen, several transformations alter the chain fairly significantly.
Sorry, I meant just top-level: basically no elements are added/removed/reordered on the list, but each can be mutated. Is that what you meant, or are you saying things do get added dynamically?
I pushed the recursive bug to master on the experimental better-c-syntax repo if you want to see the stack overflow bug. The grammar only has 1 pattern in it at the moment. I'm still not sure how it's being caused by repository[arguments[:placeholder]].__deep_clone__, so I imagine it might have more to do with #map!.
Freezing at the top level should cover any existing plugins, but ideally bailout would be a part of the grammar, and that is going to add more names to the grammar. There could be two stages: in the first stage, the pattern objects are immutable but the repository is mutable, and in the second stage it's the opposite.
Freezing at the top level should cover any existing plugins, but ideally bailout would be a part of the grammar, and that is going to add more names to the grammar. There could be two stages: in the first stage, the pattern objects are immutable but the repository is mutable, and in the second stage it's the opposite.
Yeah that might work since the bailout should be independent of any placeholders or other resolutions
I pushed the recursive bug to master on the experimental better-c-syntax repo if you want to see the stack overflow bug. The grammar only has 1 pattern in it at the moment. I'm still not sure how it's being caused by repository[arguments[:placeholder]].__deep_clone__, so I imagine it might have more to do with #map!
Is grammar[:comments] = grammar[:comments] a simplification of a more complex recursion issue, or is it semantically meaningful?
Well, a few things happened. (I pushed 1 more change (actually a revert).)
comment = grammar[:comments], followed later by grammar[:comments] = comment
grammar[:comments] = Grammar.import("./patterns/comments.rb")[:comments]
grammar[:comments] = Grammar.import("./comments.rb")[:comment]
The misspelling is important because it tried to access a non-exported pattern.
I pushed a change that eliminates the issue with the recursion in deep clone. With the change, this is the new error for better-c-syntax:
Traceback (most recent call last):
13: from main/generator.rb:75:in `<main>'
12: from /Users/fosdick/cpp-textmate-grammar/gem/lib/textmate_grammar/grammar.rb:495:in `save_to'
11: from /Users/fosdick/cpp-textmate-grammar/gem/lib/textmate_grammar/grammar.rb:329:in `generate'
10: from /Users/fosdick/cpp-textmate-grammar/gem/lib/textmate_grammar/grammar.rb:260:in `run_pre_transform_stage'
9: from /Users/fosdick/cpp-textmate-grammar/gem/lib/textmate_grammar/grammar.rb:260:in `each'
8: from /Users/fosdick/cpp-textmate-grammar/gem/lib/textmate_grammar/grammar.rb:261:in `block in run_pre_transform_stage'
7: from /Users/fosdick/cpp-textmate-grammar/gem/lib/textmate_grammar/grammar.rb:261:in `transform_values'
6: from /Users/fosdick/cpp-textmate-grammar/gem/lib/textmate_grammar/grammar.rb:278:in `block (2 levels) in run_pre_transform_stage'
5: from /Users/fosdick/cpp-textmate-grammar/gem/lib/textmate_grammar/transforms/resolve_placeholders.rb:10:in `pre_transform'
4: from /Users/fosdick/cpp-textmate-grammar/gem/lib/textmate_grammar/pattern_variations/base_pattern.rb:162:in `map!'
3: from /Users/fosdick/cpp-textmate-grammar/gem/lib/textmate_grammar/transforms/resolve_placeholders.rb:21:in `block in pre_transform'
2: from /Users/fosdick/cpp-textmate-grammar/gem/lib/textmate_grammar/pattern_variations/base_pattern.rb:645:in `=='
1: from /Users/fosdick/cpp-textmate-grammar/gem/lib/textmate_grammar/pattern_variations/base_pattern.rb:640:in `eql?'
/Users/fosdick/cpp-textmate-grammar/gem/lib/textmate_grammar/pattern_extensions/placeholder.rb:41:in `to_tag': Attempting to create a tag from an unresolved placeholder `:comments' (RuntimeError)
Based on that doc you sent earlier, should we be using \k for backreferences instead of \g?
Backreferences are \N where N is a number. \g is used for subexpressions.
Does the cpp code have subexpressions?
I thought \N was problematic for values > 9, so \g<N> had to be used instead.
Just the template angle brackets, but that's planned to be removed?
I thought \N was problematic for values > 9 so \g had to be used instead
The docs say n is valid when (n >= 1).
Oh yeah, I forgot I used it there; maybe that's where I'm seeing the \g from.
As a side note, a lot later on we could probably use subexpressions to compress and maybe optimize the regex for single patterns, based on how they reuse patterns inside themselves.
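As a rough illustration (Ruby's Onigmo engine is a close relative of Oniguruma, so treat this as a sketch of the idea rather than a guarantee about the textmate engine): a `\g<n>` subexpression call re-executes a group's pattern instead of duplicating it.

```ruby
# \g<1> re-runs group 1's *pattern* (a subexpression call), so the \d{1,3}
# subpattern is written once and reused three more times.
ip_like = /\A(\d{1,3})\.\g<1>\.\g<1>\.\g<1>\z/
puts ip_like.match?("192.168.0.1") # => true
puts ip_like.match?("192.168.0")   # => false

# Contrast with \k<1>, a backreference: it must match the same *text*
# that group 1 captured, not merely the same pattern.
same_text = /\A(\d{1,3})\.\k<1>\z/
puts same_text.match?("10.10") # => true
puts same_text.match?("10.20") # => false
```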
I thought \N was problematic for values > 9 so \g had to be used instead
the docs say n is valid when (n >= 1)
Hmm, I wonder how you're supposed to match a backreference followed by a digit literal. Non-capture groups?
\11
vs
\1(?:1)
That could create some edge case bugs, depending on how we replace the backreferences: $backref(1)1 converted to \11 instead of \1(?:1).
Backreferences actually become (?:\1)1
I guess just for future reference:
\N, \k<N> is a backreference, with N >= 1. It will consume all digits. It cannot have leading 0's: \011 is the char point 9 (tab), not capture group 11.
https://github.com/kkos/oniguruma/blob/d484916f3ca2b6bbc81accba63054625dfe26b6b/doc/RE#L411
\N is octal, but only 1-3 digits, each of which must be [0-7], and it must be at least 1 higher than the number of capture groups available. It does not consume any literal digits after it.
https://github.com/kkos/oniguruma/blob/d484916f3ca2b6bbc81accba63054625dfe26b6b/doc/RE#L24
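To sanity-check the digit-after-backreference concern in Ruby's Onigmo (again, a sketch, not an Oniguruma spec reading): the `\k<n>` form delimits the group number, so a literal digit can safely follow, and wrapping a plain `\1` in a non-capture group achieves the same.

```ruby
# \k<1> is unambiguous: the <> ends the group number, so the trailing "1"
# is a literal digit.
pattern = /(\d)\k<1>1/
puts pattern.match?("441") # => true ("4", backref "4", literal "1")
puts pattern.match?("451") # => false

# Wrapping the plain backreference keeps the digits separate too.
wrapped = /(\d)(?:\1)1/
puts wrapped.match?("441") # => true
```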
Has there been any development in the performance loss of applying a pattern inside a capture item in the C highlighter? https://github.com/microsoft/vscode-textmate/issues/167
\k<N> is a backreference
@RedCMD Yeah I probably should've known. I went ahead and updated the grammar builder library with that fix.
Has there been any development in the performance loss of applying a pattern inside a capture item in the C highlighter?
Sadly no
This issue is just a place for chat that doesn't necessarily relate to an issue. I'm going to close it immediately so that it doesn't add to the normal issue count, but it can still be used for general conversation.