pulsar-edit / pulsar

A Community-led Hyper-Hackable Text Editor
https://pulsar-edit.dev
Other
3.35k stars 141 forks source link

MEGA-ISSUE: Syntax highlighting in `language-c` (C/C++) #877

Open savetheclocktower opened 10 months ago

savetheclocktower commented 10 months ago

IMPORTANT: Some issues have already been fixed!

If you’re still on the regular 1.113 release, you might be suffering from a problem that has already been fixed. Many fixes landed on master in #859. You are encouraged to download the latest rolling release — both (a) to see whether what you’re reporting has been fixed, and (b) so that you can enjoy the fixes that have been reported since the 1.113 release.


This will serve as a catch-all issue for any problems you may encounter with syntax highlighting in C or C++ files (language-c). If you have problems with the new syntax highlighting (or folds, or indents) that are specific to c, cpp, or related files (h, cc, etc.), keep reading.

Something isn't highlighting correctly!

If you’ve found any highlighting regressions since 1.113:

First, please scroll down and see if someone else has reported the issue. If so, then you need only sit back and wait for a fix — most issues will be fixed in version 1.114!

If not, please comment with

I want to go back to the old highlighting!

You can easily opt out of the new Tree-sitter highlighting for any language, but first please:

To revert to the old Tree-sitter grammar for this language only, add the following to your config.cson:

".c.source, .cpp.source":
  core:
    useLegacyTreeSitter: true

To revert to the TextMate-style grammar for this language only, add the following to your config.cson:

".c.source, .cpp.source":
  core:
    useTreeSitterParsers: false
savetheclocktower commented 10 months ago

@Tamaranch has identified some changes in syntax highlighting colors in the new grammar in #872. I'll be assessing these to see which ones were inadvertent and will report back to this thread. Edit: See below.

In general, if you're using a built-in syntax theme, we intend for the new grammars to make your code look similar to how it looked before.

If you're using a community syntax theme, we can't make any such guarantees, but we'll tell you exactly how to apply the syntax highlighting you expect via overrides in your user stylesheet.

savetheclocktower commented 10 months ago

Using this file as a reference and testing with the built-in One Dark theme, I can attest that most of the syntax highlighting differences I saw were not intentional choices, and the ones that were intentional were not strongly held.

The changes have been made to both C and C++ and are described in this commit that is now part of #859.

Tamaranch commented 10 months ago

FWIW things appear more broken with the built-in Base16 Tomorrow Dark theme (for example user-defined types are no longer highlighted).

Is there an easy way to test #859 without rebuilding everything (which I'm afraid I can't do in a reasonable amount of time on my machine)? EDIT: It seems that I can find my happiness in the artifact ubuntu-20.04 Binaries.

savetheclocktower commented 10 months ago

Is there an easy way to test #859 without rebuilding everything (which I'm afraid I can't do in a reasonable amount of time on my machine)?

You can set ATOM_DEV_RESOURCE_PATH as described here (just use pulsar instead of atom in your command) and launch in dev mode. But I don't expect everyone to do that, so I hope we'll be able to land #859 soon just to get a rolling release out the door.

EDIT: It seems that I can find my happiness in the artifact ubuntu-20.04 Binaries.

Aha! I always forget about those. Thank god some people understand CI better than I do.

Tamaranch commented 10 months ago

Is #859 supposed to fix the other themes too, or are you only taking care of One Dark at first?

Also, with One Dark, is highlighting variables when declaring or assigning them expected?

savetheclocktower commented 10 months ago

It's supposed to fix the other built-in themes. I admit I only fixed One Dark and One Light earlier, but I'll make a note to revisit the others.

Also, with One Dark, is highlighting variables when declaring or assigning them expected?

Yes. Let me know if that isn't the case.

Long version: In C-style languages without a sigil (like $ or @) to indicate variables, there is no obvious standard for what a variable is. If you go maximal and say that any arbitrary identifier is a variable, then everything looks the same color. If it's all going to be the same color, then syntax highlighting doesn't add any value over not adding scopes to identifiers at all.

My way of resolving this was to say that variables are scoped as variable when they're declared, assigned, or reassigned. That is how it should be with all C-style languages. There might be other contexts where I allowed a pre-existing variable scope in for the sake of continuity, but I'm trying my best to avoid these.

So — not just for @Tamaranch, but also for other readers of this issue:

Tamaranch commented 10 months ago

If you find a place where a variable is not scoped as variable when being declared or assigned, then that is a bug.

Testing the latest #859 for the various built-in themes, I realize that now only One Dark doesn't highlight function parameters (in function declaration and definition). Perhaps it would be a good thing to do so here too, if we think of it as a kind of variable declaration?

savetheclocktower commented 10 months ago

I realize that now only One Dark doesn't highlight function parameters (in function declaration and definition). Perhaps it would be a good thing to do so here too, if we think of it as a kind of variable declaration?

I'm not sure why they made that decision. variable is colored red; variable.parameter is explicitly colored like ordinary text.

All the syntax themes are due for revisions, but that's a large project. I'm not sure that One Dark should be the default, but if I were to change it now, experience tells me I'd need to offer a concession for folks who liked the way it already was. My instinct is that we'd evolve those themes but offer the originals as optional packages.

Tamaranch commented 10 months ago

This could be a bug: structure members are not highlighted in this definition (whatever the theme):

static const struct wl_registry_listener registry_listener =
{
  .global = registry_global,
  .global_remove = registry_global_remove,
};
savetheclocktower commented 10 months ago

This could be a bug: structure members are not highlighted in this definition (whatever the theme):

Fixed in this commit. Thanks again for such great feedback!

savetheclocktower commented 10 months ago

After some agonizing and vacillating, I just made a change that affects C/C++: rather than draw a confusing distinction between built-in value types like int and bool (which TextMate would have us scope as storage.type) and library-provided types (which TextMate wants to classify as support.type), I've decided to adopt the compromise that legacy Tree-sitter seemed to make: all value types get scoped as support.storage.type instead. This allows them to live in the support namespace while also triggering storage styles in most syntax themes. User-defined types will also be scoped under support, albeit as support.other.storage.type.

storage.type is now reserved for language constructs — function, class, struct, plus variable declaration tokens like let and const in JavaScript. The idea that storage.type originally was used to mark both (e.g.) typedef and int makes it obvious that a C developer came up with the system.

Since this new convention is exactly how legacy Tree-sitter did things in C/C++, this should only reduce the highlighting differences between legacy modern Tree-sitter grammars. But let me know if you notice any unusual side-effects.

Tamaranch commented 10 months ago

Not sure if it's only since the latest changes, but pointer of pointer declarations (such as char **strv; and beyond) are not highlighted (both variables and structure members, and it seems theme-independent).

savetheclocktower commented 10 months ago

And you're on a recent build? That's odd — I added support for that a few days ago.

For reference, if you're unsure if something is theme-dependent or not, run the Editor: Log Cursor Scope command with your cursor inside the token in question. In this example…

int main (char **strv;) {

}

strv should be variable.parameter.c. If source.js is the only scope listed, that's a strong sign that something got missed.

Tamaranch commented 10 months ago

Yes, I'm using the appimage from the latest build (Pulsar-1.113.2024012204.AppImage). As a function parameter, highlighting works, but not as a variable or structure member. After checking, this isn't new, it's also the case with 1.113 and at least with the penultimate build I have locally (Pulsar-1.113.2024012105.AppImage).

Tamaranch commented 10 months ago

If you find a place where a variable is not scoped as variable when being declared or assigned, then that is a bug.

Not sure it should count, but I note in passing that array item assignments are not highlighted:

strv[n] = "str";
savetheclocktower commented 10 months ago

Yes, I'm using the appimage from the latest build (Pulsar-1.113.2024012204.AppImage). As a function parameter, highlighting works, but not as a variable or structure member. After checking, this isn't new, it's also the case with 1.113 and at least with the penultimate build I have locally (Pulsar-1.113.2024012105.AppImage).

OK, that goes on the TODO list. Can you give me a code example for a scenario where highlighting isn't present?

savetheclocktower commented 10 months ago

If you find a place where a variable is not scoped as variable when being declared or assigned, then that is a bug.

Not sure it should count, but I note in passing that array item assignments are not highlighted:

strv[n] = "str";

Yeah, this one isn't a regression, but you can certainly make the argument. Not a major priority right now, but it's worth remembering.

Tamaranch commented 10 months ago

OK, that goes on the TODO list. Can you give me a code example for a scenario where highlighting isn't present?

struct {
  char **strv;
};

void f (void) {
  char **strv = NULL;
  char **strv1;
}
savetheclocktower commented 10 months ago

Pointer-of-pointer variable declarations are addressed in this commit.

SirTomofAto commented 9 months ago

It looks like the new system is assuming that if the last letter is upper case, then many things are macros, and if the last letter is not upper case, they are not macros. This includes functions and variables as well as actual macros. It doesn't seem to affect the case when a function is being called, only when it is being declared or defined. See the screenshot below with a minimal example of the behavior for different combinations of types.

Previously, I believe it only ever highlighted items as macros if the entire word was upper case. Interestingly, both systems don't identify an all-caps macro definition as a macro if it doesn't have a specific value after the name. Also, ideally it would be nice if macros were determined by checking if there was a #define before the name, which would clearly distinguish them from variable definitions and function definitions, and then if it used that to determine which items are macros further down in the file too (not just checking for all caps), but I'm sure that's not in the scope of this issue, since even the old system didn't do that.

New behavior (tested on 1.113.0 and rolling release 1.113.2024020202) : image

Legacy tree-sitter: image

savetheclocktower commented 9 months ago

@SirTomofAto Yep, that is some weird behavior there. I'll look into it. Thanks for the report!

savetheclocktower commented 9 months ago

@SirTomofAto Addressed in this commit in #906. The ALL_CAPS heuristic is here to stay when used outside of a preprocessor block, but we can certainly scope the foo in #define foo as a constant. Read the commit message for more details.

Screenshot 2024-02-03 at 11 25 51 PM

Thanks again for reporting this!

SirTomofAto commented 9 months ago

Great! Thanks for the quick response. And yeah, I completely get the need for ALL_CAPS to be the heuristic here.

Tamaranch commented 9 months ago

Addressed in this commit in https://github.com/pulsar-edit/pulsar/pull/906.

It seems that this commit also fixes a bug I was about to report, concerning the highlighting of function parameters whose name ends in _[0-9]*, e.g.

void f (int p_, int p_1, int p_12);

It's reproducible with rolling release Pulsar-1.113.2024020202.AppImage, but not with Pulsar-1.113.2024020407.AppImage from #906. As the link isn't obvious to me from the commit message, I'd prefer to mention it, in case it's an unwanted side-effect (even though it fixes this particular bug).

savetheclocktower commented 9 months ago

Ha! It is an unwanted side effect. My pattern for detecting ALL_CAPS_CONSTANTS was missing a ^ anchor, so anything that ended with an underscore was being incorrectly flag as a potential constant.

This stopped happening for parameters because the recent changes stopped applying the constant scope to anything that had already been assigned at least one other scope. But the faulty pattern would've manifested in other scenarios, like marking foo_ here as a constant:

int table[foo_];

I'll push a fix for that, too. So glad you mentioned it!

EDIT: Here's the commit.

Tamaranch commented 9 months ago

It looks like the new system is assuming that if the last letter is upper case, then many things are macros, and if the last letter is not upper case, they are not macros.

Looks like it's still an issue in C++ (pulsar 1.114.0), see e.g. https://gitlab.xfce.org/panel-plugins/xfce4-docklike-plugin/-/blob/fcbe8325871adea32cede57212ff50977a3be156/src/Wnck.cpp#L98

savetheclocktower commented 9 months ago

Oof! I try pretty hard to keep things in sync between the two grammars. I'll take a look.

Tamaranch commented 9 months ago

Goto labels are no longer highlighted, either when "calling" them via goto or where they are located:

void f (void) {
  /* some code */
  goto cleanup;
  /* some code */
cleanup:
  /* cleanup */
}
SirTomofAto commented 8 months ago

I've encountered a strange crash that only happens when using the new tree-sitter implementation, and only occurs for me when on Linux, not Windows (I typically use Pulsar within an Ubuntu VM development environment).

int main(void) {

    // Ether of these immediately crash Pulsar 1.114 in Ubuntu, but not in Windows
    function(2 * (var) * (var2));
    function(2 * (MACRO) * (MACRO));

    // None of these crash
    function(2 * (3) * (4));
    function(2 * var * var2);
    function((2) * (var) * (var2));
    function(2 * var * (var2));
    function(2 * (var) * var2);
    function(MACRO * (var) * (var2));
    function(var * (var2) * (var3));
}

// Also crashes immediately in Ubuntu, but not in Windows
function(2 * (var) * (var2)) {}

The crash happens consistently when there is a number and then two non-numbers, with the non-numbers encapsulated in parenthesis.

savetheclocktower commented 8 months ago

I can reproduce this. The symptoms are identical to those I got when I tried to build a .wasm file from the latest tree-sitter-bash recently. I'll have to dig into this.

Thanks for the report!

savetheclocktower commented 8 months ago

OK, I think I've managed to solve this by upgrading our custom version of web-tree-sitter to one based on 0.20.9. Once I did that, your code example parsed just fine without any exceptions in the console.

This fix should go out in nine days or so with the 1.115 release, but if you want to make sure it works, you can grab a nightly build from the artifacts of this CI job (once it's done in a half-hour or so).

Thanks again, @SirTomofAto!

SirTomofAto commented 8 months ago

Thanks for the fix, it did indeed solve the crash for me! It does seem like in that scenario, the syntax highlighting is off though. It might be thinking the middle variable is a type instead? image

Speaking of which, I also noticed that when a macro definition has a c-style comment after it, the highlighting gets removed: image

savetheclocktower commented 8 months ago

Thanks for the fix, it did indeed solve the crash for me! It does seem like in that scenario, the syntax highlighting is off though. It might be thinking the middle variable is a type instead?

Yeah, looks like it. tree-sitter-tools will help to discern whether these are problems with Pulsar or with the underlying Tree-sitter parser.

Screenshot 2024-03-07 at 2 51 50 PM

Speaking of which, I also noticed that when a macro definition has a c-style comment after it, the highlighting gets removed:

Hey, that's useful. It reveals a bug in our injection handling.

Highlighting of macro values is a crapshoot for the reasons explained here. Everything after MACRO until the end of the line is, in theory, part of the macro value, and even though it may not be well-formed C, we can try to highlight it just in case.

On the second line the // comment is considered to be part of the preproc_arg node, and therefore part of the injection.

On the third line, the preproc_def has a preproc_arg child node, and then a comment node after it. So our assumption is wrong! Apparently a /* comment that appears after a macro value is parsed by tree-sitter-c, whereas a // comment after a macro value is not.

I suspect that this has something to do with the arcane historical differences between comment types in C, but I'm unwilling to spend any time in my finite lifespan on finding out. The takeaway here is that we need to be injecting into preproc_arg by name instead of just assuming it'll always be the last child of preproc_def. Thanks for the report!

SirTomofAto commented 8 months ago

Thanks for the explanation, interesting to learn about.

savetheclocktower commented 8 months ago

No problem. The fix is now present in #941, by the way.

Tamaranch commented 8 months ago

Structure members are not highlighted when there is no pointer dereference or assignment: in

if (s.m) {}

m is not highlighted, whereas in

if (s->m) {}

it is highlighted.

savetheclocktower commented 8 months ago

Structure members are not highlighted when there is no pointer dereference or assignment

C isn't my strong suit, so I'll ask: can you think of any scenarios in which s.m should be treated differently from s->m? If not, I'm happy to treat the m identically and apply the same scope name in both cases.

Tamaranch commented 8 months ago

Sounds good to me. By the way, there are a few cases that were left out above, in case it wasn't intentional:

savetheclocktower commented 8 months ago

Sounds good to me. By the way, there are a few cases that were left out above, in case it wasn't intentional

OK, gonna handle both of these. First one was just an oversight. As for the second one: the legacy tree-sitter grammar scoped goto labels — in both definition and usage — as constant.variable, which is a fun philosophical exercise that otherwise makes no damn sense.

Mirroring the definition/mention distinction that exists in functions, a label definition will be entity.name.label.c(pp), and a label reference in a goto statement will be support.other.label.c(pp). This means that they won't be the same color by default, but people who care about that can target their common label segment in their user stylesheet.

savetheclocktower commented 8 months ago

OK, just added to #914. We're running a bit late on the monthly release, but I'm hoping we can get it out this coming week.

mibi88 commented 3 months ago

In the latest rolling release (1.119.2024080301), when I dedent a line after wrapping the line to the paranthese on the previous line, the indentation is getting messed up.

ex:

printf("Some text %s"
       some_text);

I don't know if I should create a separate issue for it...

EDIT:

Here is a screenshot. Capture d’écran du 2024-08-03 13-05-20 (I hit tab two times after jumping to the line 70, the indentation was aligned with some_text on line 70)

savetheclocktower commented 3 months ago

In the latest rolling release (1.119.2024080301), when I dedent a line after wrapping the line to the paranthese on the previous line, the indentation is getting messed up.

Is this new behavior? I'm not sure I follow — can you do a screencast or otherwise explain exactly what's happening?

mibi88 commented 3 months ago

No it's not new, it was already the case in Atom, but it's annoying.

Here is a video of it:

https://github.com/user-attachments/assets/3a5929ce-d645-4a65-a4f9-eb8bc115570b

savetheclocktower commented 3 months ago

Yeah, that's a tough one. I think there's a way to describe that style of indentation with the new query system such that it always does the right thing, but in general that's a blind spot for any editor that follows from the TextMate school of thinking. (Which generally preserves indentation from one line to the next, except for certain patterns that increase or decrease indentation by a single increment.)

Feel free to file a separate issue on this so we can keep track of it; it's a feature request, but it's a fair one. I'm tempted to take this on just to prove that it's possible to detect these “hanging-indent” scenarios and do the right thing. It might not be something that lands in core, but a community package should at least be able to substitute its own indentation logic.

savetheclocktower commented 3 months ago

@mibi88, I think I've got something that would satisfy you. The only caveat is that you have to start typing before the indentation fixes itself:

https://github.com/user-attachments/assets/f413874e-08e8-422a-9703-e2f688aaf126

There isn't a fantastic way to customize the built-in indentation queries, but until I fix that, you can drop this into your init.js to give it a try:

{
  let disposable = atom.grammars.onDidAddGrammar(
    async (grammar) => {
      if (grammar.scopeName !== 'source.c' || grammar.constructor.name !== 'WASMTreeSitterGrammar') {
        return;
      }
      disposable.dispose();
      await grammar.loadQueryFiles(grammar.grammarFilePath, grammar.queryPaths);
      let existing = grammar.indentsQuery;
      grammar.indentsQuery = `
      ${existing}
      (
        (expression_statement)
        .
        (_) @match
        (#set! indent.matchIndentOf previousSibling.startPosition)
      )
      `;
    }
  )
}

This isn't comprehensive, but it certainly addresses the example you gave. Similar queries could be added (if needed) to handle if statements and the like.

If it works well, we could even add it to language-c as a configurable option.

mibi88 commented 3 months ago

Feel free to file a separate issue on this so we can keep track of it; it's a feature request, but it's a fair one. I'm tempted to take this on just to prove that it's possible to detect these “hanging-indent” scenarios and do the right thing. It might not be something that lands in core, but a community package should at least be able to substitute its own indentation logic.

Ok, I'll make one.

@mibi88, I think I've got something that would satisfy you. The only caveat is that you have to start typing before the indentation fixes itself

Yes, it's a lot better like that.