jeff-hykin / better-cpp-syntax

💾 The source of VS Code's C++ syntax highlighting
GNU General Public License v3.0

The C syntax is different from the C++ syntax #70

Closed jeff-hykin closed 2 years ago

jeff-hykin commented 5 years ago

Right now the C++ and C syntaxes are being generated completely separately. I am familiar with C++, and most of my fixes have been exclusively for C++ even when they should also apply to C.

There have already been issues posted about this difference.

This issue is a log and general discussion about how to keep the C syntax in sync with the C++ syntax without breaking compatibility on either side.

Update:

I've decided to fix all of C++ first and then fix C. If there's a major issue with the C syntax, then it will get fixed separately.

I'm going to use this post as a log to keep track of all the non-major issues with the C syntax.

  1. Percent escapes in C (atom/language-c#222, atom/language-c#70): `sscanf(user_input, "%c %[^\n]", &arg0, arg_str); // reads char into arg0, the remainder until \n to arg_str`
  2. `struct f foo` is missing highlighting

    46

    /* Forward Declaration */
    struct A b;
    enum B c;
    union C d;

    /* Function Declaration */
    void a(struct A b);
    void b(enum B c);
    void c(union C d);
    void d(A b);

    /* Definition */
    struct A {
        enum B c;
        union C d;
    };
    enum B {
        struct A d;
        union C d;
    };
    union C {
        struct A b;
        enum B c;
    };

  3. Trigraph support (#32): `const char * a = "foo ??/" ??/??/";`
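
For reference, here is a minimal sketch of what the percent escapes in item 1 do at runtime, which is why `%c` and the `%[^\n]` scanset are worth highlighting as units inside the string. The helper name `parse_command` and the 64-byte buffer size are assumptions made for this illustration; the added field width `63` is not in the original call and is there only to bound the write.

```c
#include <assert.h>
#include <stdio.h>

/* Hypothetical helper: splits "x rest of line" into a command character
 * and the remainder of the line. arg_str must hold at least 64 bytes.
 * Returns the number of fields sscanf converted (2 on full success). */
int parse_command(const char *user_input, char *arg0, char *arg_str)
{
    assert(user_input && arg0 && arg_str);
    /* %c        reads exactly one character (no whitespace skipping)
     * (space)   skips any run of whitespace
     * %63[^\n]  scanset: up to 63 characters that are not a newline */
    return sscanf(user_input, "%c %63[^\n]", arg0, arg_str);
}
```

A grammar that only recognizes printf-style escapes such as `%d` or `%s` would likely leave the bracketed scanset uncolored, which seems to be the gap the linked issues describe.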

jeff-hykin commented 5 years ago

@matter123, what is your opinion on temporarily using the C++ syntax as the C syntax, since C++ is almost a perfect superset of C? We might want to consider this because changes to the theme (for C++) might break the colors in C, since the C tags have not been updated (e.g. keyword.control for memory).

Or maybe having it be a superset with the exception of replacing cpp_tokens with c_tokens

There is also the option of creating a shared C and C++ file that is imported into both, but that will take time to set up properly.

matter123 commented 5 years ago

I have a very experimental way to "diff" grammars, so let me compare the C and C++ grammars on the Linux kernel. Give me about 10 minutes to set that up. Sharing common patterns is probably the best approach, though.

Edit: There are far more files than I thought. The relatively small file size is promising, however.

matter123 commented 5 years ago

linux.log

This covers only a small fraction of the files, as the diff tool crashed. However, most changes from the C++ grammar look harmless. Do you have a better source of C files?

Gettext: gettext.log

While it's not impossible to do, there are several issues.

jeff-hykin commented 5 years ago

Sadly I don't, but I might know some people who do. I'll check in with them. This is a good idea for testing.

matter123 commented 5 years ago

Yeah, it can catch some hard-to-find bugs, like the anonymous struct one (https://github.com/jeff-hykin/cpp-textmate-grammar/pull/57#issuecomment-481502001), but it is currently pretty rough.

jeff-hykin commented 5 years ago

So long as themers don't have an issue, it might be best to fully clean up the C++ code first and then once it is, we work on back-porting everything to C.

If themers do have an issue, then I think the best solution will be to use the C++ highlighting for both until it is cleaned up. I looked up the differences in the C syntax that were not in C++ and there are only a handful of cases.

matter123 commented 5 years ago

C supports some pretty fancy designated initializers: `int a[10] = { [5] = 5, 0 }` and `{ .a.v[0].c = { .a = { [1] = 'q' } } }` are both valid.
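
To make the two initializers above concrete, here is a small sketch that compiles as C99 or later. The struct definitions (`inner`, `elem`, `mid`, `outer`) and the accessor functions are hypothetical, invented only so that the nested designator chain has something to name. Note that this is C-only: C++20 designated initializers do not accept array index designators or nested chains like `.a.v[0].c`.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical types, defined only so the nested designator compiles. */
struct inner { char a[2]; };
struct elem  { struct inner c; };
struct mid   { struct elem v[1]; };
struct outer { struct mid a; };

/* [5] = 5 sets element 5; the positional 0 that follows initializes
 * the next element (6). Everything not mentioned is zero. */
static const int arr[10] = { [5] = 5, 0 };

/* Designator chain: member .a, member .v, index [0], member .c,
 * followed by a nested initializer list with its own designators. */
static const struct outer obj = { .a.v[0].c = { .a = { [1] = 'q' } } };

/* Accessors so the values can be inspected from other code. */
int array_element(size_t i)
{
    assert(i < 10);
    return arr[i];
}

char nested_char(size_t i)
{
    assert(i < 2);
    return obj.a.v[0].c.a[i];
}
```

For a grammar, the relevant shapes are the `[index] =` and `.member =` prefixes inside braces, including chains that mix the two.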

jeff-hykin commented 5 years ago

I guess we will have to see how the C++ syntax handles that. So long as it doesn't break the syntax it will probably still be an improvement over what it currently is highlighted with.

That is interesting though, I didn't see anything about that so I probably didn't dig deep enough.

matter123 commented 5 years ago

[Two screenshots taken 2019-04-11, one labeled C and one labeled C++]

So currently C and C++ offer similar coloring of the provided designated initializer example. C++ does forget about the character literal, however.

tristan957 commented 5 years ago

I have a C repo located at https://git.sr.ht/~tristan957/tllt-cp that might be able to serve as a good test for diffing the C/C++ grammars. About 3000 lines of C code that uses many parts of the language.

matter123 commented 5 years ago

tllt-cp.log

The first thing I noticed: if we are to share generators, the existing pattern tags that manually add `.cpp` have to go.

Thanks @tristan957

tristan957 commented 5 years ago

Cool, glad I could be of assistance.

jeff-hykin commented 5 years ago

I just realized #107 blocks copying the C++ syntax to C. It would completely wreck any language that is inheriting from the C language.

matter123 commented 5 years ago

I am going to use this comment to list patterns that can be shared between C and C++

Tokenization treats any sequence of characters starting with a digit as a number.
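
Assuming this refers to the preprocessing-number rule that C and C++ share, a rough sketch of that rule looks like the following. The function name `pp_number_length` is made up for this example, and the sketch is simplified: it ignores universal character names and the digit separator `'` that newer C++ standards allow.

```c
#include <assert.h>
#include <ctype.h>
#include <stddef.h>

/* Returns the length of the preprocessing number at the start of s,
 * or 0 if s does not start with one. Simplified sketch. */
size_t pp_number_length(const char *s)
{
    size_t i;
    assert(s != NULL);

    /* A pp-number starts with a digit, or with '.' followed by a digit. */
    if (isdigit((unsigned char)s[0]))
        i = 1;
    else if (s[0] == '.' && isdigit((unsigned char)s[1]))
        i = 2;
    else
        return 0;

    for (;;) {
        unsigned char c = (unsigned char)s[i];
        char prev = s[i - 1];
        if ((c == '+' || c == '-') &&
            (prev == 'e' || prev == 'E' || prev == 'p' || prev == 'P'))
            i++;   /* a sign is allowed only directly after e, E, p, or P */
        else if (isalnum(c) || c == '_' || c == '.')
            i++;   /* digits, identifier characters, and dots all continue it */
        else
            break;
    }
    return i;
}
```

Under this rule `123abc` is a single token, and so is `0xe+1`, even though neither is a valid numeric constant. That is why one shared pattern can claim the whole run and leave validation to a later stage.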

tristan957 commented 5 years ago

Are you guys noticing that true, false, and NULL in C++ are not highlighted the same as in C?

matter123 commented 5 years ago

Yeah, that's a regression.

matter123 commented 5 years ago

@jeff-hykin Is there a preferred way to add shared patterns to the current grammars repository?

jeff-hykin commented 5 years ago

Ah, this is really old. I was thinking the file that imports the shared thing would add it to its grammar.

jeff-hykin commented 5 years ago

In general though, I think most of the sharing is going to happen once C++ has its type system figured out. Then the common stuff will be extracted and a good sharing system will be implemented.

jeff-hykin commented 2 years ago

C syntax has finally been separated/moved to https://github.com/jeff-hykin/better-c-syntax !