tree-sitter / tree-sitter-cpp

C++ grammar for tree-sitter
MIT License

Macro prevents correct parsing of class #85

Open theHamsta opened 4 years ago

theHamsta commented 4 years ago

An imported macro (I simplified the source file) prevented the correct parsing of the following class.

#include <GLFW/glfw3.h>

PXR_NAMESPACE_USING_DIRECTIVE

class Scene
{

};

I know that in the presence of macros a parser has almost no chance of understanding C++, but I hope this failure case is useful for improving the parser.


translation_unit [3, 0] - [15, 0]
  preproc_include [3, 0] - [6, 0]
    path: system_lib_string [3, 9] - [3, 23]
  declaration [6, 0] - [11, 2]
    type: type_identifier [6, 0] - [6, 29]
    ERROR [8, 0] - [8, 5]
      identifier [8, 0] - [8, 5]
    declarator: init_declarator [8, 6] - [11, 1]
      declarator: identifier [8, 6] - [8, 11]
      value: initializer_list [9, 0] - [11, 1]

Shatur commented 3 years ago

Similar behavior can be caused by the following code:

class EXPORT_API MyClass
{
    MyClass();
};

Where EXPORT_API is a macro that exports the class's functions to DLLs on Windows and expands to nothing on Linux.
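
For readers who have not seen this pattern, here is a minimal sketch of how such an export macro is commonly defined. The exact definition is an assumption (real projects usually generate it in a dedicated export header, often with a dllimport branch as well), and the constructor is made public and given a body only so the snippet compiles on its own:

```cpp
// Hypothetical definition of EXPORT_API, for illustration only.
#if defined(_WIN32)
#  define EXPORT_API __declspec(dllexport)  // mark the class for DLL export
#else
#  define EXPORT_API                        // expands to nothing elsewhere
#endif

// After preprocessing on Linux this is just "class MyClass".
// A parser that never runs the preprocessor still sees two identifiers
// between "class" and "{".
class EXPORT_API MyClass
{
public:
    MyClass() {}
};
```

The compiler never sees the macro name, which is why the code is valid C++ even though the unpreprocessed text is not.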

ner0-m commented 3 years ago

I'll add something as well:

#include "doctest/doctest.h" 

TEST_CASE_TEMPLATE("Some Test", T, float, double)
{
} 

Generates this:

translation_unit [0, 0] - [5, 0]
  preproc_include [0, 0] - [1, 0]
    path: string_literal [0, 9] - [0, 28]
  ERROR [2, 0] - [4, 1]
    identifier [2, 0] - [2, 18]
    string_literal [2, 19] - [2, 30]
    identifier [2, 32] - [2, 33]
    ERROR [2, 35] - [2, 49]
      primitive_type [2, 35] - [2, 40]
      primitive_type [2, 42] - [2, 48]
    initializer_list [3, 0] - [4, 1]

I'm not sure if that is in any way detectable. But I wanted to report it, as it breaks my syntax highlighting in some cases.

MarcelRobitaille commented 7 months ago

I understand that it's impossible to detect macros in such a grammar, since it's not hooked into the preprocessor. Would there be any way to let the end user manually define a list of strings that are always macros? My company only uses a handful, so the list would be easy to define, and it would be great to have them not break the rest of the file.
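
To make the "list of strings that are always macros" idea concrete, here is a rough sketch of a pre-pass that could run on the source text before it is handed to the parser. This is not a tree-sitter feature. `maskMacros` and `isIdentChar` are hypothetical helpers written for this illustration. Each listed name is overwritten with spaces of the same length, so byte offsets and line/column positions in the resulting tree still match the original file.

```cpp
#include <cctype>
#include <cstddef>
#include <string>
#include <vector>

// True for characters that can appear inside a C/C++ identifier.
static bool isIdentChar(char c) {
    return std::isalnum(static_cast<unsigned char>(c)) || c == '_';
}

// Hypothetical helper: blank out whole-word occurrences of user-listed
// macro names. The result has the same length as the input.
// Limitations of this sketch: it does not skip string literals or
// comments, and it does not handle function-like macros with arguments.
std::string maskMacros(std::string source,
                       const std::vector<std::string>& macros) {
    for (const std::string& name : macros) {
        if (name.empty()) continue;
        std::size_t pos = 0;
        while ((pos = source.find(name, pos)) != std::string::npos) {
            const std::size_t end = pos + name.size();
            const bool leftOk = pos == 0 || !isIdentChar(source[pos - 1]);
            const bool rightOk =
                end == source.size() || !isIdentChar(source[end]);
            if (leftOk && rightOk) {
                // Overwrite the macro name with the same number of spaces.
                source.replace(pos, name.size(), name.size(), ' ');
            }
            pos = end;
        }
    }
    return source;
}
```

With the list `{"EXPORT_API"}`, the text `class EXPORT_API MyClass {};` becomes `class`, a run of spaces, then `MyClass {};`, which a C++ grammar can read as an ordinary class. Identifiers that merely contain a listed name, such as `MY_EXPORT_API`, are left alone.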

aryx commented 7 months ago

@maxbrunsfeld would be nice indeed for C/C++ to provide a way to customize the parser to accept those macros. I don't really know how to do that though with the way the parsers are written.

MarcelRobitaille commented 7 months ago

Would it be possible to do this with an injection?

deeedob commented 6 months ago

Hey, is there any update? This behavior is really annoying. I work in repos where a namespace macro is common, so it breaks all the highlighting. It would already be nice if we could ignore those macros somehow. Is there any solution?

MarcelRobitaille commented 6 months ago

@deeedob The best solution I found was to fork the project and add support for some hard-coded macros. Hopefully something better will come like https://github.com/tree-sitter/tree-sitter-cpp/issues/85#issuecomment-1979313116

ImmanuelHaffner commented 6 months ago

Well, I ditched tree-sitter-based features for the most part and use LSP semantic tokens instead. It's probably slower, but also nicer. You can have different highlight groups for local variables vs. global variables vs. fields and such. All I had to do to make it work was use a color scheme with semantic token support.

I still have tree-sitter around for some other plugins, like lukas-reineke/indent-blankline.nvim, to get visual highlights for an entire scope or control-flow construct. Works OK so far.

MarcelRobitaille commented 6 months ago

@ImmanuelHaffner What LSP do you use? I haven't found a good one for C++.

aryx commented 4 months ago

@maxbrunsfeld @amaanq do you have any idea why the error recovery mechanism of tree-sitter does not skip those macros and at least correctly parse the rest of the code? I don't mind the macro itself not being parsed, but I do mind if the presence of a macro makes almost the whole file fail to parse.

aryx commented 4 months ago

For example, on this simple input:

#include <foo.h>
FOOBAR(some_ident, "Some string)

#include <bar1.h>
#include <bar2.h>

namespace Foo
{
namespace Bar
{

Foo::Foo() {
  int x = 0 ;
  int y = 1;
  return x + y;
}
}
}

tree-sitter-cpp is not able to parse anything. It does not recover from the error.

aryx commented 4 months ago

Weirdly, when I try it on the playground https://tree-sitter.github.io/tree-sitter/playground with this example, it actually recovers well from the error.

@amaanq which version of tree-sitter-cpp is running in the playground? The latest?

aryx commented 4 months ago

Hmm, after a few tests, the issue with my example seems to be in ocaml-tree-sitter-semgrep, not in tree-sitter itself. tree-sitter-c and tree-sitter-cpp, with tree-sitter generate and tree-sitter parse, seem to recover correctly from such macros. semgrep itself apparently does not.

ImmanuelHaffner commented 3 months ago

@MarcelRobitaille simply clangd, with clangd_extensions

zadirion commented 3 months ago

This is quite the pain for C++ when you have export macros in front of class declarations:

class EXPORT_API SomeClassName
{
};

For me this ends up being interpreted as a function instead of a class by treesitter. This of course messes with treesitter-based text objects. It would be good, as a user, to be able to manually specify what code a macro can expand to, including expanding to nothing. I'd be perfectly fine with having a .treesitter file somewhere in the directory structure of my parsed file, similar to .clang-format files.