jeff-hykin / better-cpp-syntax

💾 The source of VS Code's C++ syntax highlighting
GNU General Public License v3.0

Misc Conversation #185

Closed jeff-hykin closed 5 years ago

jeff-hykin commented 5 years ago

This issue is just a place for chat that doesn't necessarily relate to an issue. I'm going to close it immediately so that it doesn't add to the normal issue count, but it can still be used for general conversation.

jeff-hykin commented 5 years ago

I'll be working on the grammar today. @matter123 did you see the new tree sitter extension? https://github.com/microsoft/vscode/issues/50140#issuecomment-493786052

matter123 commented 5 years ago

I haven't had a chance to actually use it yet, but it looks pretty nice. I did notice that the syntax highlighting is rough in a few areas, for example parameter highlighting. Do you know if that's a tree sitter issue, an issue with the extension, or a bad theme choice?

georgewfraser commented 5 years ago

@matter123 I'm pretty sure that's just a minor issue with the extension---it's only 3 days old! We need a C++ expert to come tweak the function that traverses the tree-sitter tree and applies colors:

https://github.com/georgewfraser/vscode-tree-sitter/blob/3b2d4cd2e5433af8620fe32cb35bab4479e15a60/src/extension.ts#L82

jeff-hykin commented 5 years ago

@matter123 So I looked into the Atom one a bit ago, because they're missing a lot of scopes. From what I understand the Tree sitter parses the whole thing and basically has its own tree-scopes, then there is an additional file that maps those scopes to the traditional TextMate names.

So for example, in Atom the lambda is correctly parsed by the tree sitter, but Atom incorrectly marks the -> in it as member-access.
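As a rough illustration of that mapping layer (a sketch only: the node-type names and scope names below are made up for the example, not taken from Atom or the tree-sitter C++ grammar), the same -> token can be given a different TextMate-style scope depending on the node it sits in:

```python
# Hypothetical mapping from (parent node type, token text) to a TextMate-style scope.
# All names here are illustrative assumptions.
SCOPE_MAP = {
    ("field_expression", "->"): "punctuation.accessor.pointer",
    ("lambda_return", "->"): "punctuation.definition.lambda.return-type",
}


def scope_for(parent_type, token, default="source.cpp"):
    """Look up the scope for a token using the type of its parent node."""
    return SCOPE_MAP.get((parent_type, token), default)
```

With a table like this, a wrong scope on the lambda arrow would be a mapping bug rather than a parsing bug, which matches what shows up in Atom.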

I'm not sure if the C++ tree sitter implementation is currently marking the difference between things like parameter-variables and just normal variable usage.

jeff-hykin commented 5 years ago

@matter123 tomorrow I'll look at lots of example files and log all of the scope changes before publishing. I plan on taking a look at the trigraph/digraph support and creating the @std_space. Then I'll start work on the dont_backtrack? feature for the textmate tools and look at the add-tag issue.

matter123 commented 5 years ago

@jeff-hykin unless you're planning on doing some more big changes today, I'm going to work on fixing the tests.

jeff-hykin commented 5 years ago

No big changes, I was just going to completely regenerate the tests though since so much has changed.

matter123 commented 5 years ago

Yeah, it's taking me about 7 minutes per fixture, so at 267 fixtures that comes out to about 31 hours. Regeneration is probably needed.

jeff-hykin commented 5 years ago

I added you to the experimental tree sitter repo @matter123 just so you'd have access if you wanted it. https://github.com/jeff-hykin/experimental-tree-sitter/invitations

jeff-hykin commented 5 years ago

@matter123 is there a way to automate tagging releases?

matter123 commented 5 years ago

npm version creates a tag with the new version, but that is an un-annotated tag (without a list of fixes and additions)

Edit: GitHub's topic on release automation: https://github.com/topics/release-automation

matter123 commented 5 years ago

git config push.followTags true will make the tags produced by npm end up in the releases page. From there they should be able to be promoted to an annotated release.

matter123 commented 5 years ago

Not sure what happened but your tagged version was from 184 commits ago

jeff-hykin commented 5 years ago

sigh haha. How do you make the tags? I just used the "new draft" button on the tags page

jeff-hykin commented 5 years ago

Although I did it after running npm version and git config push.followTags true so that might have something to do with it

matter123 commented 5 years ago

Pressing new draft is all I do. Did the branch you were on happen to be behind?

matter123 commented 5 years ago

Oh I know what happened. You released 1.9 with the new enum pattern, then reverted.

jeff-hykin commented 5 years ago

@matter123 I generated the spec files for Objective C/C++ but for some reason all of them are missing the source tag. Any ideas why that is?

matter123 commented 5 years ago

source is always the bottom of every single token, so the spec generator/tester just assumes that it's there and the spec files don't need it. Is something breaking?
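A minimal sketch of that convention, assuming a token's scopes are stored as a list with the root scope first (the function names and the list layout are hypothetical, not the real spec generator):

```python
def to_spec_scopes(scopes, root="source.cpp"):
    """Drop the implicit root scope before writing a token to a spec file."""
    if scopes and scopes[0] == root:
        return list(scopes[1:])
    return list(scopes)


def from_spec_scopes(spec_scopes, root="source.cpp"):
    """Re-add the root scope that the spec file leaves out."""
    return [root, *spec_scopes]
```

Under this assumption, a grammar that pushes its own extra source scope would produce a stack the spec file does not expect.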

jeff-hykin commented 5 years ago

The tests just fail because of the source scope

[Screenshot: Screen Shot 2019-05-28 at 6 50 08 PM]

matter123 commented 5 years ago

huh, let me take a look.

matter123 commented 5 years ago

So the issue is that there are more scopes popped off in scopeEnd than are pushed on in scopeBegin lines like:

show the problem.
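The kind of imbalance described here can be checked mechanically. This is only a sketch with an assumed data layout (each line as a pair of lists, scopes begun and scopes ended); the real spec format may differ:

```python
def find_unbalanced(lines):
    """Return indices of lines that pop more scopes than are currently open.

    Each line is a (scope_begin, scope_end) pair of lists of scope names.
    """
    depth = 0
    bad = []
    for index, (begin, end) in enumerate(lines):
        depth += len(begin)
        if len(end) > depth:
            bad.append(index)
            depth = 0  # reset so later lines are still checked
        else:
            depth -= len(end)
    return bad
```

A line that ends both a pattern scope and a manually added source scope, while only the pattern scope was ever begun, would be reported by this check.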

jeff-hykin commented 5 years ago

> source is added in by the textmate parser https://github.com/microsoft/vscode-textmate/blob/master/src/grammar.ts#L442 so this shouldn't be needed and produces a duplicate source.cpp

The reason I added a manual one is so that it exists even inside of other languages, like markdown. (I'll explain my thinking more by the end of the day)

matter123 commented 5 years ago

Since you're going to investigate the performance regression, I pushed a very WIP perf inspector to Add/perf-inspect. Run npm run perf -- /path/to/file

[Screenshot from 2019-05-29 21-49-14]

matter123 commented 5 years ago

@jeff-hykin What is the purpose of the hidden portion of scope resolution? And why does it rematch the visible portion of the scope resolution?

jeff-hykin commented 5 years ago

Somehow they are slightly different and to be honest I've forgotten how exactly. I tried to combine them when I fixed the dot-access operator tag

jeff-hykin commented 5 years ago

Somehow they are slightly different and to be honest I've forgotten how exactly. I believe it's because the first one includes the generated one and the generated one includes the non-generated one. I think the tags don't get applied repeatedly when there is a "oneOrMore()" so I think the re-matching colors them. There's probably a better way of doing it.

I tried to combine them when I fixed the dot-access operator, but it failed. Some patterns include the generated one and some include the non generated one.
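One plausible reason the re-matching is needed (shown here with Python's re module as a stand-in, on the assumption that the TextMate engine behaves similarly for repeated groups) is that a capture group inside a quantifier only keeps its last repetition:

```python
import re

text = "std::chrono::"

# One match with a repeated group: only the last repetition is kept in group 1.
single = re.match(r"(?:(\w+)::)+", text)
last_only = single.group(1)

# Re-matching the inner pattern over the same text visits every repetition.
every_part = re.findall(r"(\w+)::", text)
```

Here last_only is "chrono" while every_part is ["std", "chrono"], so only the second approach gives each namespace its own capture to tag.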

jeff-hykin commented 5 years ago

Here's a quick summary of the current overview I'm thinking about (for reference to myself later)

The goal is to have a defined difference of things like:

Once the contexts are pretty strict, I think we will actually be able to fix things like the vector<vector<>> problem and the issue of not tagging types/variables, and I think the number of random bugs will decrease considerably.

matter123 commented 5 years ago

Per [basic.link]/1, the root context consists of one or more declarations.

A declaration can be (per [dcl.dcl]/1):

A namespace context should be identical to that of the root context.

Besides normal variable declarations, block declarations are special purpose declarations (see [dcl.dcl]/1)

(all of this ignores modules)

jeff-hykin commented 5 years ago

> We are actually in the process of implementing colorization based on IntelliSense, which we hope to include in the next release of the C/C++ extension.

Do you know much about the "colorization based on IntelliSense" they mentioned here?

I'm interested in whether it's going to be a full-on replacement of the textmate syntax, and if so, when the next release is planned for.

matter123 commented 5 years ago

Basically, color the same way the tree-sitter extensions do (by using text decorations), but the scopes come from their own internal language server.

jeff-hykin commented 5 years ago

Hey just wanted to let you know I'll work on this tomorrow, get the Assembly merged in (which is awesome btw 👍), work on #235, and then see if I can help out with the C style casting. After that I'll be doing the scope cleanup.

jeff-hykin commented 5 years ago

Well I didn't actually get to fix up the context scopes much. I spent most of today on the optimizations to the groups and the changing of the backtracking/quantifier code. Your tests are a real lifesaver, there is no way I could've changed the library like that without them.

std_space should work really well now; it took me a while to get it working smoothly. There was an edge case that needed it to work between non-word and non-word, e.g. the space between >;. I think this is the general behavior that is desired.
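To make the edge case concrete, here is a simplified stand-in for std_space (an assumption for illustration, not the project's real pattern): it accepts real whitespace, or a zero-width position that touches a non-word character.

```python
import re

# Simplified stand-in for std_space (not the project's actual definition):
# real whitespace, or a zero-width spot next to a non-word character.
STD_SPACE = r"(?:\s+|(?<=\W)|(?=\W))"


def joined(left, right):
    """Build a regex that matches left and right separated by STD_SPACE."""
    return re.compile(re.escape(left) + STD_SPACE + re.escape(right))
```

With this definition, joined(">", ";") matches both ">;" and "> ;", while joined("int", "x") matches "int x" but not "intx", since two word characters still need real whitespace between them.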

I created a tool on the Token class for helping avoid keywords. Hopefully bugs like #238 never show up again.
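The usual way to avoid keywords in a regex-based grammar is a negative lookahead in front of the identifier pattern. The helper below is a hypothetical sketch of that idea in Python, not the actual Token class API:

```python
import re


def identifier_except(keywords):
    """Regex for a C-like identifier that is not exactly one of the keywords."""
    blocked = "|".join(re.escape(k) for k in keywords)
    # The lookahead rejects a keyword only when it is the whole word.
    return re.compile(rf"\b(?!(?:{blocked})\b)[A-Za-z_]\w*")


NAME = identifier_except(["if", "while", "return"])
```

NAME rejects "while" but still accepts "while_loop" and "returns", because the lookahead requires a word boundary right after the keyword.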

I couldn't find a very good assembly syntax. Eventually I'll probably write a basic one; then we can package it up with C++ and have a decent separate one.

matter123 commented 5 years ago

The standard intentionally omits any syntax information about the format of the string literal, as that can change from target to target. To replicate that, the PR just includes several grammars and hopes the one the user has installed is the one they want to use.

It doesn't make sense to bundle a nice x86 grammar when one might be using arm or atmel assembly.

jeff-hykin commented 5 years ago

> It doesn't make sense to bundle a nice x86 grammar when one might be using arm or atmel assembly.

That's true, I just meant general patterns, like a generic assembly syntax: $identifier, %identifier, .identifier, identifier:, []'s, ()'s. All of the ones I installed couldn't even highlight the basics.
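A rough sketch of what such generic patterns could look like, written as a single Python regex with neutral group names (the names and the exact character classes are assumptions for illustration, not an existing grammar):

```python
import re

GENERIC_ASM = re.compile(r"""
    (?P<dollar_name>\$\w+)      # $identifier
  | (?P<percent_name>%\w+)      # %identifier
  | (?P<dot_name>\.\w+)         # .identifier
  | (?P<label>\b\w+:)           # identifier:
  | (?P<bracket>[\[\]()])       # []'s and ()'s
""", re.VERBOSE)


def classify(line):
    """Return (kind, text) pairs for every generic token found in the line."""
    return [(m.lastgroup, m.group()) for m in GENERIC_ASM.finditer(line)]
```

For example, classify("loop: movl $1, %eax") picks out the label, the $ operand and the % operand, and leaves the mnemonic untouched since nothing generic identifies it.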

matter123 commented 5 years ago

Pretty sure my screenshot was using https://marketplace.visualstudio.com/items?itemName=basdp.language-gas-x86

matter123 commented 5 years ago

But assuming it was last, there is no reason against providing some generic assembly support.

jeff-hykin commented 5 years ago

I'll probably work on this tomorrow night and/or the day after. I spent tonight fixing Polacode (finally got it working just now) so that I can add some more screenshots and the themes to the readme.

I've also made a shell branch that adds an improved shell syntax. I'm not sure what I'm going to do with the branch; I don't actually want to fully support the language myself, I just wanted the basics fixed for when I read/write shell code. I'll figure something out by Wednesday though.

I'll spend tomorrow/Wed mostly on C++ bugfixes though.

jeff-hykin commented 5 years ago

The scopes are finally in an organized order. If this project was fixing a broken bone, I'd consider it at the part where the bone is put back into the correct place. The next steps will be to fix the template definition syntax, update the operator overloads/constructor/destructor, and then make patterns for the control flow statements.

Today I redid the parameters. Now it uses a range to match the = *stuff* , section meaning there's no need for hacky fixes anymore when it comes to matching parameter types/variables. 🎉 I plan to do the same thing for the template definition syntax.

I didn't do anything with the shell-lang syntax yet. My plan though is to create a script that will iterate over each language and create/publish a separate extension for each one (all under the same repo). I'm not sure yet if it's a bad idea or not.

jeff-hykin commented 5 years ago

Here's the updated todo list, feel free to work on any of them (or work on whatever else you want). I think the control flow patterns and the new statement are the only things that need to get done before merging your C style cast in.

I think I'm going to start adding more languages to the repo. I have to write some stuff in Perl and the Perl syntax is just pitiful, the shell syntax needs changes, the Python syntax doesn't tag ;'s, the Go syntax is very minimal, Ruby has problems, and basically all of them break inside of markdown. I'll probably only maintain C++ and C; for the other ones I'm probably just going to tack things on and not worry about the language specifications. Once textmate_tools is finalized and there's a systematic way of sharing patterns, I might try to convert the JavaScript syntax into Ruby and see if it can be maintained here. The JavaScript syntax is really organized and basically up to spec, so it would just be a matter of translation.

jeff-hykin commented 5 years ago

I think it might be a good idea to include std_space at the front of most every pattern (moving forward) and intentionally never match std_space at the end of any pattern.

Some patterns need it in order to do an effective lookbehind, but in that process they gain a higher priority. If all of the patterns do it though there won't be random priority problems. And the std_space at the end shouldn't be used because it will mess up the lookbehind of other patterns.
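The priority effect comes from the matching rule that the pattern whose match starts earliest wins, with ties going to the pattern listed first. A small Python sketch of that rule (the function name and the sample patterns are made up for the example):

```python
import re


def first_winner(patterns, text):
    """Pick the pattern whose match starts earliest; ties go to the earlier pattern."""
    best = None
    for name, regex in patterns:
        match = re.search(regex, text)
        if match and (best is None or match.start() < best[1].start()):
            best = (name, match)
    return best[0] if best else None
```

Given the text "  foo", a pattern written as \s*foo starts matching at position 0 and beats a plain foo that starts at position 2, even if the plain one is listed first. If every pattern begins with the same leading space, they all start at the same position and the listed order decides again.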

@matter123 do you think this rule of thumb would cause any problems?

matter123 commented 5 years ago

It shouldn't be an issue. Do note, however, that that won't stop slow non-matching patterns from still being tested. https://github.com/atom/node-oniguruma/blob/master/src/onig-searcher.cc does not bail out early when the matchPos is 0.

jeff-hykin commented 5 years ago

Yeah, performance is still going to be (or be more of) a problem. We might have to introduce intentional fast-failure into some patterns.

jeff-hykin commented 5 years ago

Hey, just FYI @matter123, if you're cleaning branches, the only branch of mine I plan on keeping is the "original" branch. Feel free to delete all the others.

jeff-hykin commented 5 years ago

@matter123 with the recent Perl issue, I wanted to let you know that when I get back I plan to organize things. I'll stop committing to master and do everything through merge requests. We can come up with an official branch naming scheme and an issue template. I think the branch naming you've been following is good, probably just add a language (e.g. cpp/fix/#209). I'll work on localizing all of the language data (syntax.json, tags.txt, tests, fixtures, etc) to the language folders themselves, rename this repo to something more generic like "better-syntax", and then coordinate those changes with Alexr00.

I plan to make this repo more centered around the library. Hopefully the tests can be formalized and wrapped up into an NPM package. Once the libraries are published I think each language can get its own repo.

jeff-hykin commented 5 years ago

It's probably going to be next week before I get around to the file structure refactor.

jeff-hykin commented 5 years ago

Here's my idea for branch management: all lowercase, for language-specific things:

cpp/fix/#issue-number
cpp/add/feature-name
cpp/misc

All the other branches can start with a "special" char that isn't a language extension so that we never have naming conflicts. @matter123 I'm not sure what character to use. I was thinking - but that has problems with bash/shells, so I'm thinking = instead.

// changes to grammar
=tools/add/feature
=tools/fix/#issue
=tools/misc
// changes that significantly involve multiple languages
=multi/add/feature
=multi/fix/#issue

I think I'm going to generate the original syntax highlighting for each language under a different name, so that if people want to temporarily switch to the old syntax they can do it by selecting to highlight the file with a different language. This could help cover the edge cases for whenever there's a serious performance issue or serious bug. Once this is done we can delete the original branch.

matter123 commented 5 years ago

= is also not a good choice due to shell issues

git checkout -b =tools/add/feature 
zsh: tools/add/feature not found

matter123 commented 5 years ago

Instead of a special character, all other branches could start with a capital.

// changes to grammar
Tools/add/feature
Tools/fix/#issue
Tools/misc
// changes that significantly involve multiple languages
Multi/add/feature
Multi/fix/#issue

jeff-hykin commented 5 years ago

Okay, that's a good idea, let's go with that. So moving forward, I'll make sure all my new branches use that scheme, and whenever we get around to it we can remove branches using the old scheme and add the naming scheme to the contributing guide.
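If the scheme ends up in the contributing guide, it could also be checked automatically. The pattern below is one possible reading of the scheme discussed in this thread (lowercase language prefix, or a capitalized Tools/Multi prefix); the exact character sets and the helper name are assumptions:

```python
import re

BRANCH_NAME = re.compile(
    r"^(?:[a-z][a-z0-9]*|Tools|Multi)"       # language prefix, or a capitalized shared prefix
    r"/(?:fix/#\d+|add/[a-z0-9-]+|misc)$"    # fix/#issue-number, add/feature-name, or misc
)


def is_valid_branch(name):
    """Check a branch name against the assumed naming scheme."""
    return BRANCH_NAME.match(name) is not None
```

For instance, cpp/fix/#209 and Tools/add/feature pass, while =tools/add/feature does not.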