Open savetheclocktower opened 10 months ago
@Tamaranch has identified some changes in syntax highlighting colors in the new grammar in #872. I'll be assessing these to see which ones were inadvertent and will report back to this thread. Edit: See below.
In general, if you're using a built-in syntax theme, we intend for the new grammars to make your code look similar to how it looked before.
If you're using a community syntax theme, we can't make any such guarantees, but we'll tell you exactly how to apply the syntax highlighting you expect via overrides in your user stylesheet.
Using this file as a reference and testing with the built-in One Dark theme, I can attest that most of the syntax highlighting differences I saw were not intentional choices, and the ones that were intentional were not strongly held.
entity.other.attribute-name.c
on declaration to try to harmonize them with JavaScript object keys (which bear a similar scope when they're declared) but variable.other.member.c
is just as arguable, and involves no disruption.punctuation
declaration for their leading #
, so that got added. In many themes that won't have an effect, but in One Dark it'll make them look less like the old Tree-sitter grammar and more like the TextMate grammar.meta.block.c
scopes in the TextMate grammar. Most meta
scopes from TextMate grammars are not very useful, but I've found meta.block
scopes useful in the JavaScript grammar, so I've added some to the C/C++ grammars as well.The changes have been made to both C and C++ and are described in this commit that is now part of #859.
FWIW things appear more broken with the built-in Base16 Tomorrow Dark theme (for example user-defined types are no longer highlighted).
Is there an easy way to test #859 without rebuilding everything (which I'm afraid I can't do in a reasonable amount of time on my machine)?
EDIT: It seems that I can find my happiness in the artifact ubuntu-20.04 Binaries
.
Is there an easy way to test #859 without rebuilding everything (which I'm afraid I can't do in a reasonable amount of time on my machine)?
You can set ATOM_DEV_RESOURCE_PATH
as described here (just use pulsar
instead of atom
in your command) and launch in dev mode. But I don't expect everyone to do that, so I hope we'll be able to land #859 soon just to get a rolling release out the door.
EDIT: It seems that I can find my happiness in the artifact ubuntu-20.04 Binaries.
Aha! I always forget about those. Thank god some people understand CI better than I do.
Is #859 supposed to fix the other themes too, or are you only taking care of One Dark at first?
Also, with One Dark, is highlighting variables when declaring or assigning them expected?
It's supposed to fix the other built-in themes. I admit I only fixed One Dark and One Light earlier, but I'll make a note to revisit the others.
Also, with One Dark, is highlighting variables when declaring or assigning them expected?
Yes. Let me know if that isn't the case.
Long version: In C-style languages without a sigil (like $
or @
) to indicate variables, there is no obvious standard for what a variable
is. If you go maximal and say that any arbitrary identifier is a variable, then everything looks the same color. If it's all going to be the same color, then syntax highlighting doesn't add any value over not adding scopes to identifiers at all.
My way of resolving this was to say that variables are scoped as variable
when they're declared, assigned, or reassigned. That is how it should be with all C-style languages. There might be other contexts where I allowed a pre-existing variable
scope in for the sake of continuity, but I'm trying my best to avoid these.
So — not just for @Tamaranch, but also for other readers of this issue:
variable
when being declared or assigned, then that is a bug.If you find a place where a variable is not scoped as variable when being declared or assigned, then that is a bug.
Testing the latest #859 for the various built-in themes, I realize that now only One Dark doesn't highlight function parameters (in function declaration and definition). Perhaps it would be a good thing to do so here too, if we think of it as a kind of variable declaration?
I realize that now only One Dark doesn't highlight function parameters (in function declaration and definition). Perhaps it would be a good thing to do so here too, if we think of it as a kind of variable declaration?
I'm not sure why they made that decision. variable
is colored red; variable.parameter
is explicitly colored like ordinary text.
All the syntax themes are due for revisions, but that's a large project. I'm not sure that One Dark should be the default, but if I were to change it now, experience tells me I'd need to offer a concession for folks who liked the way it already was. My instinct is that we'd evolve those themes but offer the originals as optional packages.
This could be a bug: structure members are not highlighted in this definition (whatever the theme):
static const struct wl_registry_listener registry_listener =
{
.global = registry_global,
.global_remove = registry_global_remove,
};
This could be a bug: structure members are not highlighted in this definition (whatever the theme):
Fixed in this commit. Thanks again for such great feedback!
After some agonizing and vacillating, I just made a change that affects C/C++: rather than draw a confusing distinction between built-in value types like int
and bool
(which TextMate would have us scope as storage.type
) and library-provided types (which TextMate wants to classify as support.type
), I've decided to adopt the compromise that legacy Tree-sitter seemed to make: all value types get scoped as support.storage.type
instead. This allows them to live in the support
namespace while also triggering storage
styles in most syntax themes. User-defined types will also be scoped under support
, albeit as support.other.storage.type
.
storage.type
is now reserved for language constructs — function
, class
, struct
, plus variable declaration tokens like let
and const
in JavaScript. The idea that storage.type
originally was used to mark both (e.g.) typedef
and int
makes it obvious that a C developer came up with the system.
Since this new convention is exactly how legacy Tree-sitter did things in C/C++, this should only reduce the highlighting differences between legacy modern Tree-sitter grammars. But let me know if you notice any unusual side-effects.
Not sure if it's only since the latest changes, but pointer of pointer declarations (such as char **strv;
and beyond) are not highlighted (both variables and structure members, and it seems theme-independent).
And you're on a recent build? That's odd — I added support for that a few days ago.
For reference, if you're unsure if something is theme-dependent or not, run the Editor: Log Cursor Scope command with your cursor inside the token in question. In this example…
int main (char **strv;) {
}
…strv
should be variable.parameter.c
. If source.js
is the only scope listed, that's a strong sign that something got missed.
Yes, I'm using the appimage from the latest build (Pulsar-1.113.2024012204.AppImage). As a function parameter, highlighting works, but not as a variable or structure member. After checking, this isn't new, it's also the case with 1.113 and at least with the penultimate build I have locally (Pulsar-1.113.2024012105.AppImage).
If you find a place where a variable is not scoped as variable when being declared or assigned, then that is a bug.
Not sure it should count, but I note in passing that array item assignments are not highlighted:
strv[n] = "str";
Yes, I'm using the appimage from the latest build (Pulsar-1.113.2024012204.AppImage). As a function parameter, highlighting works, but not as a variable or structure member. After checking, this isn't new, it's also the case with 1.113 and at least with the penultimate build I have locally (Pulsar-1.113.2024012105.AppImage).
OK, that goes on the TODO list. Can you give me a code example for a scenario where highlighting isn't present?
If you find a place where a variable is not scoped as variable when being declared or assigned, then that is a bug.
Not sure it should count, but I note in passing that array item assignments are not highlighted:
strv[n] = "str";
Yeah, this one isn't a regression, but you can certainly make the argument. Not a major priority right now, but it's worth remembering.
OK, that goes on the TODO list. Can you give me a code example for a scenario where highlighting isn't present?
struct {
char **strv;
};
void f (void) {
char **strv = NULL;
char **strv1;
}
Pointer-of-pointer variable declarations are addressed in this commit.
It looks like the new system is assuming that if the last letter is upper case, then many things are macros, and if the last letter is not upper case, they are not macros. This includes functions and variables as well as actual macros. It doesn't seem to affect the case when a function is being called, only when it is being declared or defined. See the screenshot below with a minimal example of the behavior for different combinations of types.
Previously, I believe it only ever highlighted items as macros if the entire word was upper case. Interestingly, both systems don't identify an all-caps macro definition as a macro if it doesn't have a specific value after the name. Also, ideally it would be nice if macros were determined by checking if there was a #define before the name, which would clearly distinguish them from variable definitions and function definitions, and then if it used that to determine which items are macros further down in the file too (not just checking for all caps), but I'm sure that's not in the scope of this issue, since even the old system didn't do that.
New behavior (tested on 1.113.0 and rolling release 1.113.2024020202) :
Legacy tree-sitter:
@SirTomofAto Yep, that is some weird behavior there. I'll look into it. Thanks for the report!
@SirTomofAto Addressed in this commit in #906. The ALL_CAPS
heuristic is here to stay when used outside of a preprocessor block, but we can certainly scope the foo
in #define foo
as a constant. Read the commit message for more details.
Thanks again for reporting this!
Great! Thanks for the quick response. And yeah, I completely get the need for ALL_CAPS
to be the heuristic here.
Addressed in this commit in https://github.com/pulsar-edit/pulsar/pull/906.
It seems that this commit also fixes a bug I was about to report, concerning the highlighting of function parameters whose name ends in _[0-9]*
, e.g.
void f (int p_, int p_1, int p_12);
It's reproducible with rolling release Pulsar-1.113.2024020202.AppImage
, but not with Pulsar-1.113.2024020407.AppImage
from #906. As the link isn't obvious to me from the commit message, I'd prefer to mention it, in case it's an unwanted side-effect (even though it fixes this particular bug).
Ha! It is an unwanted side effect. My pattern for detecting ALL_CAPS_CONSTANTS
was missing a ^
anchor, so anything that ended with an underscore was being incorrectly flag as a potential constant.
This stopped happening for parameters because the recent changes stopped applying the constant
scope to anything that had already been assigned at least one other scope. But the faulty pattern would've manifested in other scenarios, like marking foo_
here as a constant:
int table[foo_];
I'll push a fix for that, too. So glad you mentioned it!
EDIT: Here's the commit.
It looks like the new system is assuming that if the last letter is upper case, then many things are macros, and if the last letter is not upper case, they are not macros.
Looks like it's still an issue in C++ (pulsar 1.114.0), see e.g. https://gitlab.xfce.org/panel-plugins/xfce4-docklike-plugin/-/blob/fcbe8325871adea32cede57212ff50977a3be156/src/Wnck.cpp#L98
Oof! I try pretty hard to keep things in sync between the two grammars. I'll take a look.
Goto labels are no longer highlighted, either when "calling" them via goto
or where they are located:
void f (void) {
/* some code */
goto cleanup;
/* some code */
cleanup:
/* cleanup */
}
I've encountered a strange crash that only happens when using the new tree-sitter implementation, and only occurs for me when on Linux, not Windows (I typically use Pulsar within an Ubuntu VM development environment).
int main(void) {
// Ether of these immediately crash Pulsar 1.114 in Ubuntu, but not in Windows
function(2 * (var) * (var2));
function(2 * (MACRO) * (MACRO));
// None of these crash
function(2 * (3) * (4));
function(2 * var * var2);
function((2) * (var) * (var2));
function(2 * var * (var2));
function(2 * (var) * var2);
function(MACRO * (var) * (var2));
function(var * (var2) * (var3));
}
// Also crashes immediately in Ubuntu, but not in Windows
function(2 * (var) * (var2)) {}
The crash happens consistently when there is a number and then two non-numbers, with the non-numbers encapsulated in parenthesis.
I can reproduce this. The symptoms are identical to those I got when I tried to build a .wasm
file from the latest tree-sitter-bash
recently. I'll have to dig into this.
Thanks for the report!
OK, I think I've managed to solve this by upgrading our custom version of web-tree-sitter
to one based on 0.20.9. Once I did that, your code example parsed just fine without any exceptions in the console.
This fix should go out in nine days or so with the 1.115 release, but if you want to make sure it works, you can grab a nightly build from the artifacts of this CI job (once it's done in a half-hour or so).
Thanks again, @SirTomofAto!
Thanks for the fix, it did indeed solve the crash for me! It does seem like in that scenario, the syntax highlighting is off though. It might be thinking the middle variable is a type instead?
Speaking of which, I also noticed that when a macro definition has a c-style comment after it, the highlighting gets removed:
Thanks for the fix, it did indeed solve the crash for me! It does seem like in that scenario, the syntax highlighting is off though. It might be thinking the middle variable is a type instead?
Yeah, looks like it. tree-sitter-tools will help to discern whether these are problems with Pulsar or with the underlying Tree-sitter parser.
Speaking of which, I also noticed that when a macro definition has a c-style comment after it, the highlighting gets removed:
Hey, that's useful. It reveals a bug in our injection handling.
Highlighting of macro values is a crapshoot for the reasons explained here. Everything after MACRO
until the end of the line is, in theory, part of the macro value, and even though it may not be well-formed C, we can try to highlight it just in case.
On the second line the //
comment is considered to be part of the preproc_arg
node, and therefore part of the injection.
On the third line, the preproc_def
has a preproc_arg
child node, and then a comment
node after it. So our assumption is wrong! Apparently a /*
comment that appears after a macro value is parsed by tree-sitter-c
, whereas a //
comment after a macro value is not.
I suspect that this has something to do with the arcane historical differences between comment types in C, but I'm unwilling to spend any time in my finite lifespan on finding out. The takeaway here is that we need to be injecting into preproc_arg
by name instead of just assuming it'll always be the last child of preproc_def
. Thanks for the report!
Thanks for the explanation, interesting to learn about.
No problem. The fix is now present in #941, by the way.
Structure members are not highlighted when there is no pointer dereference or assignment: in
if (s.m) {}
m
is not highlighted, whereas in
if (s->m) {}
it is highlighted.
Structure members are not highlighted when there is no pointer dereference or assignment
C isn't my strong suit, so I'll ask: can you think of any scenarios in which s.m
should be treated differently from s->m
? If not, I'm happy to treat the m
identically and apply the same scope name in both cases.
Sounds good to me. By the way, there are a few cases that were left out above, in case it wasn't intentional:
Sounds good to me. By the way, there are a few cases that were left out above, in case it wasn't intentional
OK, gonna handle both of these. First one was just an oversight. As for the second one: the legacy tree-sitter grammar scoped goto labels — in both definition and usage — as constant.variable
, which is a fun philosophical exercise that otherwise makes no damn sense.
Mirroring the definition/mention distinction that exists in functions, a label definition will be entity.name.label.c(pp)
, and a label reference in a goto
statement will be support.other.label.c(pp)
. This means that they won't be the same color by default, but people who care about that can target their common label
segment in their user stylesheet.
OK, just added to #914. We're running a bit late on the monthly release, but I'm hoping we can get it out this coming week.
In the latest rolling release (1.119.2024080301), when I dedent a line after wrapping the line to the paranthese on the previous line, the indentation is getting messed up.
ex:
printf("Some text %s"
some_text);
I don't know if I should create a separate issue for it...
EDIT:
Here is a screenshot.
(I hit tab two times after jumping to the line 70, the indentation was aligned with some_text
on line 70)
In the latest rolling release (1.119.2024080301), when I dedent a line after wrapping the line to the paranthese on the previous line, the indentation is getting messed up.
Is this new behavior? I'm not sure I follow — can you do a screencast or otherwise explain exactly what's happening?
No it's not new, it was already the case in Atom, but it's annoying.
Here is a video of it:
https://github.com/user-attachments/assets/3a5929ce-d645-4a65-a4f9-eb8bc115570b
Yeah, that's a tough one. I think there's a way to describe that style of indentation with the new query system such that it always does the right thing, but in general that's a blind spot for any editor that follows from the TextMate school of thinking. (Which generally preserves indentation from one line to the next, except for certain patterns that increase or decrease indentation by a single increment.)
Feel free to file a separate issue on this so we can keep track of it; it's a feature request, but it's a fair one. I'm tempted to take this on just to prove that it's possible to detect these “hanging-indent” scenarios and do the right thing. It might not be something that lands in core, but a community package should at least be able to substitute its own indentation logic.
@mibi88, I think I've got something that would satisfy you. The only caveat is that you have to start typing before the indentation fixes itself:
https://github.com/user-attachments/assets/f413874e-08e8-422a-9703-e2f688aaf126
There isn't a fantastic way to customize the built-in indentation queries, but until I fix that, you can drop this into your init.js
to give it a try:
{
let disposable = atom.grammars.onDidAddGrammar(
async (grammar) => {
if (grammar.scopeName !== 'source.c' || grammar.constructor.name !== 'WASMTreeSitterGrammar') {
return;
}
disposable.dispose();
await grammar.loadQueryFiles(grammar.grammarFilePath, grammar.queryPaths);
let existing = grammar.indentsQuery;
grammar.indentsQuery = `
${existing}
(
(expression_statement)
.
(_) @match
(#set! indent.matchIndentOf previousSibling.startPosition)
)
`;
}
)
}
This isn't comprehensive, but it certainly addresses the example you gave. Similar queries could be added (if needed) to handle if
statements and the like.
If it works well, we could even add it to language-c
as a configurable option.
Feel free to file a separate issue on this so we can keep track of it; it's a feature request, but it's a fair one. I'm tempted to take this on just to prove that it's possible to detect these “hanging-indent” scenarios and do the right thing. It might not be something that lands in core, but a community package should at least be able to substitute its own indentation logic.
Ok, I'll make one.
@mibi88, I think I've got something that would satisfy you. The only caveat is that you have to start typing before the indentation fixes itself
Yes, it's a lot better like that.
IMPORTANT: Some issues have already been fixed!
If you’re still on the regular 1.113 release, you might be suffering from a problem that has already been fixed. Many fixes landed on
master
in #859. You are encouraged to download the latest rolling release — both (a) to see whether what you’re reporting has been fixed, and (b) so that you can enjoy the fixes that have been reported since the 1.113 release.This will serve as a catch-all issue for any problems you may encounter with syntax highlighting in C or C++ files (
language-c
). If you have problems with the new syntax highlighting (or folds, or indents) that are specific toc
,cpp
, or related files (h
,cc
, etc.), keep reading.Something isn't highlighting correctly!
If you’ve found any highlighting regressions since 1.113:
First, please scroll down and see if someone else has reported the issue. If so, then you need only sit back and wait for a fix — most issues will be fixed in version 1.114!
If not, please comment with
I want to go back to the old highlighting!
You can easily opt out of the new Tree-sitter highlighting for any language, but first please:
To revert to the old Tree-sitter grammar for this language only, add the following to your
config.cson
:To revert to the TextMate-style grammar for this language only, add the following to your
config.cson
: