Closed jeff-hykin closed 5 years ago
I'll be working on the grammar today. @matter123 did you see the new tree sitter extension? https://github.com/microsoft/vscode/issues/50140#issuecomment-493786052
I haven't had a chance to actually use it yet, but it looks pretty nice. I did notice that the syntax highlighting is rough in a few areas, for example parameter highlighting. Do you know if that's a tree sitter issue, an issue with the extension, or a bad theme choice?
@matter123 I'm pretty sure that's just a minor issue with the extension---it's only 3 days old! We need a C++ expert to come tweak the function that traverses the tree-sitter tree and applies colors:
@matter123 So I looked into the Atom one a bit ago, because they're missing a lot of scopes. From what I understand the Tree sitter parses the whole thing and basically has its own tree-scopes, then there is an additional file that maps those scopes to the traditional TextMate names.
So for example, in Atom the lambda is correctly parsed by the tree sitter, but Atom incorrectly marks the `->` in it as member-access.
I'm not sure if the C++ tree sitter implementation is currently marking the difference between things like parameter-variables and just normal variable usage.
@matter123 tomorrow I'll look at lots of example files and log all of the scope changes before publishing. I plan on taking a look at the trigraph/digraph support and creating the `@std_space`. Then I'll start work on the `dont_backtrack?` feature for the textmate tools and look at the add-tag issue.
@jeff-hykin unless you're planning on doing some more big changes today, I'm going to work on fixing the tests.
No big changes, I was just going to completely regenerate the tests though since so much has changed.
Yeah, it's taking me about 7 minutes per fixture, so at 267 fixtures that comes out to about 31 hours. Regeneration is probably needed.
I added you to the experimental tree sitter repo @matter123 just so you'd have access if you wanted it. https://github.com/jeff-hykin/experimental-tree-sitter/invitations
@matter123 is there a way to automate tagging releases? `npm version` creates a tag with the new version, but that is an un-annotated tag (without a list of fixes and additions).
Edit: GitHub's topic on release automation: https://github.com/topics/release-automation
`git config push.followTags true` will make the tags produced by npm end up in the releases page. From there they should be able to be promoted to an annotated release.
Not sure what happened but your tagged version was from 184 commits ago
sigh haha. How do you make the tags? I just used the "new draft" button on the tags page. Although I did it after running `npm version` and `git config push.followTags true`, so that might have something to do with it.
Pressing new draft is all I do. Did the branch you were on happen to be behind?
Oh I know what happened. You released 1.9 with the new enum pattern then reverted.
@matter123 I generated the spec files for Objective C/C++ but for some reason all of them are missing the `source` tag. Any ideas why that is?
`source` is always the bottom of every single token, so the spec generator/tester just assumes that it's there and the spec files don't need it. Is something breaking?
The tests just fail because of the `source` scope.
huh, let me take a look.
So the issue is that there are more scopes popped off in `scopeEnd` than are pushed on in `scopeBegin`. Lines like:
attempted to pop meta.interface-or-protocol off scope stack, top of stack is storage.type
attempted to pop storage.type off scope stack, top of stack is source
attempted to pop meta.interface-or-protocol off scope stack, top of stack is undefined
show the problem.
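For anyone reading along later, here is a minimal sketch of the kind of check that would print messages like the ones above. It is not the actual spec tester: the function name, the event format, and the "pop anyway on a mismatch" behavior are all assumptions inferred from the log lines.

```python
def check_scope_stack(events):
    """Replay ("begin" | "end", scope) events against a scope stack.

    Assumption: `source` is implicitly at the bottom of the stack, and a
    mismatched pop is reported but still removes the top entry.
    """
    stack, errors = ["source"], []
    for kind, scope in events:
        if kind == "begin":
            stack.append(scope)
            continue
        # "undefined" mirrors what the log prints once the stack is empty
        top = stack[-1] if stack else "undefined"
        if top != scope:
            errors.append(
                f"attempted to pop {scope} off scope stack, top of stack is {top}"
            )
        if stack:
            stack.pop()
    return errors


# One scope pushed, three popped: the same shape as the failure above.
errors = check_scope_stack([
    ("begin", "storage.type"),
    ("end", "meta.interface-or-protocol"),
    ("end", "storage.type"),
    ("end", "meta.interface-or-protocol"),
])
for line in errors:
    print(line)
```

With one push and three pops, the second pop already hits `source` and the third finds an empty stack, which is the pattern in the log.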
`source` is added in by the textmate parser (https://github.com/microsoft/vscode-textmate/blob/master/src/grammar.ts#L442), so this shouldn't be needed and produces a duplicate `source.cpp`.
The reason I added a manual one is so that it exists even inside of other languages, like markdown. (I'll explain my thinking more by the end of the day)
Since you're going to investigate the performance regression, I pushed a very WIP perf inspector to Add/perf-inspect. Run `npm run perf -- /path/to/file`.
@jeff-hykin What is the purpose of the hidden portion of scope resolution? And why does it rematch the visible portion of the scope resolution?
Somehow they are slightly different and to be honest I've forgotten how exactly. I believe it's because the first one includes the generated one and the generated one includes the non-generated one. I think the tags don't get applied repeatedly when there is a "oneOrMore()" so I think the re-matching colors them. There's probably a better way of doing it.
I tried to combine them when I fixed the dot-access operator, but it failed. Some patterns include the generated one and some include the non-generated one.
Here's a quick summary of the current overview I'm thinking about (for reference to myself later)
c_string
){}
(other than initializer-lists)
func( int a = 1 + 20 + 40)
) for parameters
if/else/while/for/do/try/catch
`{}`'s that don't relate to anything
The goal is to have a defined difference of things like:
Once the contexts are pretty strict, I think we will actually be able to fix things like the `vector<vector<>>` problem and the issue of not tagging types/variables, and I think the number of random bugs will decrease considerably.
Per [basic.link]/1, the root context consists of one or more declarations.
A declaration can be (per [dcl.dcl]/1):
A namespace context should be identical to that of the root context.
Besides normal variable declarations, block declarations are special purpose declarations (see [dcl.dcl]/1).
(all of this ignores modules)
We are actually in the process of implementing colorization based on IntelliSense, which we hope to include in the next release of the C/C++ extension.
Do you know much about the "colorization based on IntelliSense" they mentioned here?
I'm interested in whether it's going to be a full-on replacement of the textmate syntax, and if so, when the next release is planned for.
Basically, color the same way the tree-sitter extensions do (by using text decorations), but the scopes come from their own internal language server.
Hey just wanted to let you know I'll work on this tomorrow, get the Assembly merged in (which is awesome btw 👍), work on #235, and then see if I can help out with the C style casting. After that I'll be doing the scope cleanup.
Well I didn't actually get to fix up the context scopes much. I spent most of today on the optimizations to the groups and the changing of the backtracking/quantifier code. Your tests are a real lifesaver, there is no way I could've changed the library like that without them.
`std_space` should work really well now; it took me a while to get it working smoothly. There was an edge case that needed it to work between non-word and non-word, e.g. the space between `>` and `;`. I think this is the general behavior that is desired.
I created a tool on the Token class for helping avoid keywords. Hopefully bugs like #238 never show up again.
I couldn't find a very good assembly syntax, eventually I'll probably write a basic one then we can package it up with C++ and have a decent separate one.
The standard intentionally omits any syntax information about the format of the string literal, as that can change from target to target. To replicate that, the PR just includes several grammars and hopes the one the user has installed is the one they want to use.
It doesn't make sense to bundle a nice x86 grammar when one might be using arm or atmel assembly.
That's true, I just meant general patterns, like a generic assembly syntax: `$identifier`, `%identifier`, `.identifier`, `identifier:`, `[]`'s, `()`'s. All of the ones I installed couldn't even highlight the basics.
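To make the "generic assembly" idea a bit more concrete, here is a rough sketch of the patterns listed above as a single Python regex. The group names (`label`, `dollar`, and so on) are made up for illustration and are not real TextMate scope names, and the identifier character sets are guesses rather than anything taken from an assembler manual.

```python
import re

# Hypothetical category names; each alternative mirrors one item in the list above.
GENERIC_ASM = re.compile(
    r"""
      (?P<label>[A-Za-z_.][\w.]*:)       # identifier:
    | (?P<dollar>\$[\w.]+)               # $identifier
    | (?P<percent>%[\w.]+)               # %identifier
    | (?P<directive>\.[A-Za-z_][\w.]*)   # .identifier
    | (?P<bracket>[\[\]])                # [ and ]
    | (?P<paren>[()])                    # ( and )
    """,
    re.VERBOSE,
)


def classify(line):
    """Return (category, text) pairs for every generic token found in a line."""
    return [(m.lastgroup, m.group()) for m in GENERIC_ASM.finditer(line)]


print(classify("loop: movl $1, %eax"))
print(classify(".globl main"))
```

Anything target-specific (mnemonics, register names) is deliberately left unmatched, which is the point of keeping it generic.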
Pretty sure my screenshot was using https://marketplace.visualstudio.com/items?itemName=basdp.language-gas-x86
But assuming it was last there is no reason against providing some generic assembly support
I'll probably work on this tomorrow night and/or the day after. I spent tonight fixing Polacode (finally got it working just now) so that I can add some more screenshots and the themes to the readme.
I've also made a `shell` branch that adds an improved shell syntax. I'm not sure what I'm going to do with the branch. I don't actually want to fully support the language myself, I just wanted the basics fixed for when I read/write shell code. I'll figure something out by Wednesday though.
I'll spend tomorrow/Wed mostly on C++ bugfixes though.
The scopes are finally in an organized order, if this project was fixing a broken bone I'd consider it at the part where the bone is put back into the correct place. The next steps will be to fix the template definition syntax, update the operator overloads/constructor/destructor, and then make patterns for the control flow statements.
Today I redid the parameters. Now it uses a range to match the `= *stuff* ,` section, meaning there's no need for hacky fixes anymore when it comes to matching parameter types/variables. 🎉 I plan to do the same thing for the template definition syntax.
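As an illustration of the range idea (not the grammar's actual begin/end patterns, which are regex-based), here is a small Python sketch that finds each `= stuff` span in a parameter list. The function name is hypothetical, and it only tracks `()`, `[]`, and `{}` nesting; angle brackets are ignored on purpose because `<` and `>` are ambiguous inside expressions.

```python
def default_value_ranges(params):
    """Return (start, end) index pairs covering each `= stuff` default value.

    A range begins at a top-level `=` and ends just before the next
    top-level `,` (or at the end of the string).
    """
    ranges, depth, start = [], 0, None
    for i, ch in enumerate(params):
        if ch in "([{":
            depth += 1
        elif ch in ")]}":
            depth -= 1
        elif ch == "=" and depth == 0 and start is None:
            start = i  # range begin
        elif ch == "," and depth == 0 and start is not None:
            ranges.append((start, i))  # range end, comma excluded
            start = None
    if start is not None:
        ranges.append((start, len(params)))
    return ranges


params = "int a = 1 + 20 + 40, int b = f(1, 2), int c"
spans = [params[s:e] for s, e in default_value_ranges(params)]
print(spans)
```

Everything outside those spans is then just the type and the variable name, which is why the hacky fixes stop being necessary.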
I didn't do anything with the shell-lang syntax yet. My plan though is to create a script that will iterate over each language and create/publish a separate extension for each one (all under the same repo). I'm not sure yet if it's a bad idea or not.
Here's the updated todo list, feel free to work on any of them (or work on whatever else you want)
I think the control flow patterns and the `new` statement are the only things that need to get done before merging your C style cast in.
`new` statements
I think I'm going to start adding more languages to the repo. I have to write some stuff in Perl and the Perl syntax is just pitiful, the shell syntax needs changes, the Python syntax doesn't tag `;`'s, the Go syntax is very minimal, Ruby has problems, and basically all of them break inside of markdown. I'll probably only maintain C++ and C; for the other ones I'm probably just going to tack things on and not worry about the language specifications. Once textmate_tools is finalized and there's a systematic way of sharing patterns, I might try to convert the JavaScript syntax into Ruby and see if it can be maintained here. The JavaScript syntax is really organized, it's basically up to spec, so it would just be a matter of translation.
I think it might be a good idea to include `std_space` at the front of most every pattern (moving forward) and intentionally never match `std_space` at the end of any pattern.
Some patterns need it in order to do an effective lookbehind, but in that process they gain a higher priority. If all of the patterns do it though, there won't be random priority problems. And the `std_space` at the end shouldn't be used because it will mess up the lookbehind of other patterns.
@matter123 do you think this rule of thumb would cause any problems?
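A rough sketch of why a trailing `std_space` can break another pattern's lookbehind. Big caveats: this uses Python's `re` rather than Oniguruma, `STD_SPACE` is a simplified stand-in (plain optional whitespace, not the real definition), the pattern names are invented, and the first-pattern-wins loop is a simplification of how a TextMate engine actually chooses matches.

```python
import re

# Simplified, hypothetical stand-in for std_space: optional whitespace only.
STD_SPACE = r"\s*"


def tokenize(text, patterns):
    """Try each (name, regex) in order at the current position."""
    compiled = [(name, re.compile(rx)) for name, rx in patterns]
    pos, tokens = 0, []
    while pos < len(text):
        for name, rx in compiled:
            # match(text, pos) lets lookbehinds see the characters before pos
            m = rx.match(text, pos)
            if m and m.end() > pos:
                tokens.append((name, m.group()))
                pos = m.end()
                break
        else:
            tokens.append(("unmatched", text[pos]))
            pos += 1
    return tokens


# The terminator relies on a lookbehind for ">" and leads with std_space.
terminator = ("terminator", r"(?<=>)" + STD_SPACE + r";")

leading_only = [("close_angle", STD_SPACE + r">"), terminator]
trailing_too = [("close_angle", STD_SPACE + r">" + STD_SPACE), terminator]

good = tokenize("> ;", leading_only)
bad = tokenize("> ;", trailing_too)
print(good)
print(bad)
```

In the second setup the `>` pattern swallows the space, so the terminator's lookbehind sees whitespace instead of `>` and the `;` is left unmatched.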
It shouldn't be an issue. Do note, however, that that won't stop slow non-matching patterns from still being tested. https://github.com/atom/node-oniguruma/blob/master/src/onig-searcher.cc does not bail out early when the matchPos is 0.
Yeah performance is still going to be (or be more of) a problem. We might have to introduce intentional fast-failure into some patterns
Hey just FYI @matter123 , if you're cleaning branches, the only branch of mine I plan on keeping is the "original" branch. Feel free to delete all the others
@matter123 with the recent Perl issue, I wanted to let you know whenever I get back I plan to organize things. I'll stop committing to master and do everything through merge requests. We can come up with an official branch naming scheme, and an issue template. I think the branch naming you've been following is good, probably just add a language (e.g. `cpp/fix/#209`). I'll work on localizing all of the language data (syntax.json, tags.txt, tests, fixtures, etc) to the language folders themselves, and rename this repo to something more generic like "better-syntax", and then coordinate those changes with Alexr00.
I plan to make this repo more centered around the library. Hopefully the tests can be formalized and wrapped up into an NPM package. Once the libraries are published I think each language can get its own repo.
It's probably going to be next week before I get around to the file structure refactor.
Here's my idea for branch management: all lowercase, for language-specific things:
cpp/fix/#issue-number
cpp/add/feature-name
cpp/misc
All the other branches can start with a "special" char that isn't a language extension so that we never have naming conflicts. @matter123 I'm not sure what character to use. I was thinking `-`, but that has problems with bash/shells, so I'm thinking `=`
// changes to grammar
=tools/add/feature
=tools/fix/#issue
=tools/misc
// changes that significantly involve multiple languages
=multi/add/feature
=multi/fix/#issue
I think I'm going to generate the original syntax highlighting for each language under a different name, so that if people want to temporarily switch to the old syntax they can do it by selecting to highlight the file with a different language. This could help cover the edge cases for whenever there's a serious performance issue or serious bug. Once this is done we can delete the `original` branch.
`=` is also not a good choice due to shell issues:
git checkout -b =tools/add/feature
zsh: tools/add/feature not found
Instead of a special character, all other branches could start with a capital.
// changes to grammar
Tools/add/feature
Tools/fix/#issue
Tools/misc
// changes that significantly involve multiple languages
Multi/add/feature
Multi/fix/#issue
Okay, that's a good idea, let's go with that. So moving forward, I'll make sure all my new branches use that scheme, and whenever we get around to it we can remove branches using the old scheme and add the naming scheme to the contributing guide.
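If the scheme ends up in the contributing guide, a small checker could go with it. This is only a sketch of one reading of the scheme agreed on above (lowercase scope = a language, capitalized scope = something shared like `Tools` or `Multi`); the function name, the allowed characters in feature names, and the requirement that issue numbers be digits are all assumptions.

```python
import re

# Assumed shape: <scope>/add/<feature>, <scope>/fix/#<issue>, or <scope>/misc
BRANCH = re.compile(
    r"""
    (?P<scope>[a-z][a-z0-9_+-]*|[A-Z][a-z]+)/
    (?: add/(?P<feature>[\w.-]+)
      | fix/\#(?P<issue>\d+)
      | misc )
    """,
    re.VERBOSE,
)


def parse_branch(name):
    """Return the parts of a branch name, or None if it doesn't fit the scheme."""
    m = BRANCH.fullmatch(name)
    if m is None:
        return None
    kind = "language" if m["scope"][0].islower() else "shared"
    return {
        "scope": m["scope"],
        "kind": kind,
        "issue": m["issue"],
        "feature": m["feature"],
    }


print(parse_branch("cpp/fix/#209"))
print(parse_branch("Tools/add/feature"))
print(parse_branch("=tools/add/feature"))
```

Names using the old `=` prefix are rejected, so the same check could be used to list the branches that still need renaming.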
This issue is just a place for chat that doesn't necessarily relate to an issue. I'm going to close it immediately so that it doesn't add to the normal issue count, but it can still be used for general conversation.