llvm / llvm-project

The LLVM Project is a collection of modular and reusable compiler and toolchain technologies.
http://llvm.org
Other
29.05k stars 11.98k forks source link

Attribute plugin example in trunk does not seem to recognize attribute arguments #45791

Open protz opened 4 years ago

protz commented 4 years ago
Bugzilla Link 46446
Version trunk
OS All
CC @AaronBallman,@john-brawn-arm,@urnathan,@zygoloid

Extended Description

I'm re-posting this from the discord #clang channel, since I haven't any answers there.

I'm playing with the Attribute plugin in clang/examples, and I modified one thing: replace OptArgs = 1 (one optional argument) in the constructor with NumArgs = 1 (exactly one argument required)... I ran bin/clang++ -fplugin=lib/Attribute.dylib -std=c++17 /tmp/test.hpp with the following contents for test.hpp:

[[ example("foo") ]]
void f() {
}

and I get:

jonathan@absinthe:~/Code/llvm-project/build (master) $ bin/clang++ -fplugin=lib/Attribute.dylib -std=c++17 /tmp/test.hpp 
/tmp/test.hpp:1:4: error: 'example' attribute takes one argument
[[ example("foo") ]]
   ^
1 error generated.

which confuses me quite a bit since the attribute does seem to have an argument!


First possible cause: the comment in Attr.h is wrong and NumArgs has a different behavior.

  /// The number of required arguments of this attribute.
  unsigned NumArgs : 4;

Second possible cause: the syntax I'm using for passing an argument to an attribute is incorrect, but I thought according to https://en.cppreference.com/w/cpp/language/attributes that I was correct.

Third possible cause: a genuine issue in ParsedAttr.


The reason I'm filing a bug is that I tried something different, leaving the code in Attribute.cpp as-is, and adding assert(false) right after the test if (Attr.getNumArgs() > 0) {, in the hope of at least asserting that this codepath is reached (same test.hpp). But, alas, nothing happened, which leads me to think that at the very least something is wrong with the example plugin.

Thanks,

Jonathan

AaronBallman commented 3 years ago

I've just run into this problem trying to reimplement a gcc plugin in llvm-land. It'd be nice to have this resolved, and something along the lines of Jonathan's patch was what I envisioned. I must admit being a little surprised by hasAttrbute's implementation, which looked like a series of string comparisons, rather than, say a map lookup. But you know, I'm not real familiar with table gen stuff. If every attribute had its own parser (and conditional enable logic?) no special knowledge needed in the parser itself?

Anyway, I'd like to help, if I can.

Thank you for pinging on this thread, it fell off my radar.

There's a related patch that's currently being investigated that goes in a different direction where the whole pile of tokens is available to the plugin. I'd be curious to know what you and Jonathan think of the approach being tried in https://reviews.llvm.org/D99861. The basic idea is: save all of the tokens from the argument list, and allow the plugin to replay those tokens to do whatever custom work it wants. One potential downside to the approach in that review is that attributes which accept arbitrary expression arguments are going to be awkward for the plugin author to support, and I think Jonathan's proposed approach may be an improvement over that. It might make sense to carry this discussion over into that review so that we converge on a solution more quickly.

urnathan commented 3 years ago

I've just run into this problem trying to reimplement a gcc plugin in llvm-land. It'd be nice to have this resolved, and something along the lines of Jonathan's patch was what I envisioned. I must admit being a little surprised by hasAttrbute's implementation, which looked like a series of string comparisons, rather than, say a map lookup. But you know, I'm not real familiar with table gen stuff. If every attribute had its own parser (and conditional enable logic?) no special knowledge needed in the parser itself?

Anyway, I'd like to help, if I can.

protz commented 4 years ago

A first proposal to start some discussion Here's a preliminary patch that I hope can serve as a basis for discussion. Apologies for the delay, I'm not working on this full-time and I had no prior experience with the clang codebase.

I'm intentionally posting a work-in-progress so that I can make sure that things are roughly looking good; I'd like to defer details (clang-format'ing my changes, naming conventions, tests) to later once the design is settled.

The patch essentially adds the ability for a plugin to handle the parsing of a CXX11 attribute's arguments. This means that with this patch, plugin-defined CXX11 attributes are no longer limited to constant attributes that take no arguments.

The patch intentionally gives the plugin a fair amount of freedom, to enable interesting use-cases (see below).

The patch does not (yet) tackle the ability for out-of-tree plugins to extend Attr.td. I'd be happy to work on that once the present patch makes progress.

Design

I eventually decided against defining a separate registry, and chose instead to extend the ParsedAttrInfo class. This avoids a lot of duplication in terms of attribute spelling resolution. With two separate registries, clang would have had to apply the "find a plugin for this spelling" logic twice, raising the potential for inconsistencies or bugs. On the plugin side, this also would have resulted in unpleasant code duplication, so, it seemed cleaner to use one registry.

By default, ParsedAttrInfo's (most of which do not come from plugins) are happy to let the built-in clang parsing logic take effect, enforcing tweaking knobs such as NumArgs and OptArgs. Plugins can, if they are so inclined, elect to bypass clang's parsing logic (which, as we saw earlier, ignores CXX11 attribute arguments when they're not known to clang). In that case, plugins are on their own; they are expected to handle all of the parsing, as well as pushing the resulting ParsedAttr into the ParsedAttributes. They then report whether the arguments are syntactically valid or not and the rest of the parsing code handles error reporting.

(Side-note: this means that it's a three-state return value, and I'm currently reusing the existing AttrHandling enum, but it could be improved with either better names or a separate enum.)

The plugin receives, in addition to location information, two key parameters:

The second parameter may be the most surprising part of this patch, so let me try to provide some motivation and context.

My goal is to enable plugins to perform advanced static analysis and code generation, relying on rich CXX11 attributes capturing deep semantic information relating, say, a function's parameters, or a struct's fields.

You can easily imagine someone authoring a static analyzer that recognizes:

typedef struct { size_t len; uint8_t *chars; [[ my_analyzer::invariant(strlen(chars) == len) ]]; } managed_string;

The point here being to leverage a nice CXX11 attribute that benefits from typo-checking, proper name and macro resolution, etc. rather than using some unreliable syntax embedded in comments like so many other tools are doing, or worse, a custom C++ parser.

In our case, we're auto-generating parsers and serializers from a struct type definition (see https://www.usenix.org/system/files/sec19-ramananandro_0.pdf for more info), so I'd like to author:

typedef struct { uint32_t min_version; uint32_t max_version [[ everparse::constraint (max_version >= min_version && min_version >= 2 && max_version <= 4) ]]; } my_network_format;

Of course, our actual network formats are much more complicated, but this is just for the sake of example. My point is that allowing plugins a substantial degree of freedom is likely to enable a flurry of novel and exciting use-cases based on clang.

Back to the design of the patch, receiving the Declarator allows me to do things like this (in the plugin):

Sema &S = P->getActions();
TypeSourceInfo *T = S.GetTypeForDeclarator(*D, P->getCurScope());
QualType R = T->getType();
VarDecl *VD = VarDecl::Create(S.getASTContext(), S.CurContext,
    D->getBeginLoc(), D->getIdentifierLoc(), D->getIdentifier(), R,
    T,
    SC_None);
// Doing this makes sure Sema is aware of the new scope entry, meaning this name
// will be in scope when parsing the expression. (Parsing and scope
// resolution are intertwined.)
S.PushOnScopeChains(VD, P->getCurScope());

This allows my plugin to successfully parse and recognize complex arguments that refer to the current declarator, in effect letting my plugin define its own extended syntax and scoping rules.

I hope this helps, I'll keep working on the patch (this is the first version that seems to do something useful) but thought it'd be helpful to get the discussion started in the meanwhile.

Thanks,

Jonathan

Details

AaronBallman commented 4 years ago

Yes, I have been looking at this too and would love the ability to extend the attributes via an out-of-tree plugin -- this would be perfect for my use-case. This also seems like an ambitious, long-term goal.

Yup, we're still a ways away from that goal.

In the meanwhile, here's a more modest goal that would allow me to be unblocked, and also allow other plugin authors to immediately parse more exciting attributes. What do you think of:

  • creating a new Registry that, in a similar fashion to ParsedAttrInfoRegistry, holds, say, instances of AttributeParsers, the class of custom parsers
  • plugins can register an AttributeParser based on the spelling of the attribute (this is also similar to how ParsedAttrInfo works)

Perhaps AttributeArgParser to make it clear it's only for parsing arguments?

  • AttributeParsers expose a method that, essentially, receives the same arguments as ParseCXX11AttributeArgs and potentially does meaningful things
  • the contract is that a plugin gets called after the spelling has been recognized, then must consume everything in-between, and including, the parentheses

Let me know if this seems feasible and a sensible course of action.

I think this sounds reasonable at a high level and it gets us incrementally closer to the long-term goal.

Given that most attribute arguments are simple (string literal, integer literal, identifier, or enumeration), do you think there will be a way to provide some default parsing behavior for those easy cases?

protz commented 4 years ago

Thanks for the explanation for the difference in behavior. That explains what I've been observing.

One of the long-term hopes I have is that we can eventually get to the point where you can write a custom MyFunAttr.td file to specify custom attributes similar to how you specify built-in attributes. One downside to this is that it requires running the tablegen, though.

Yes, I have been looking at this too and would love the ability to extend the attributes via an out-of-tree plugin -- this would be perfect for my use-case. This also seems like an ambitious, long-term goal.

In the meanwhile, here's a more modest goal that would allow me to be unblocked, and also allow other plugin authors to immediately parse more exciting attributes. What do you think of:

Let me know if this seems feasible and a sensible course of action.

AaronBallman commented 4 years ago

I think there is a reason for the different handling of attribute and [[ ]]:

For GNU attributes, there is a grammar for arguments that applies to all attributes: the first argument is either an expression or an identifier, and all subsequent arguments are expressions; see https://gcc.gnu.org/onlinedocs/gcc/Attribute-Syntax.html. (Clang flaunts this rule with many of its GNU-syntax attributes, though, and uses GNU-incompatible custom parsing rules that depend on the attribute name.) C++11 attributes have no such parsing rule.

Ah, thank you for the reminder about that detail of GNU attributes. That's another good reason for the divergence in behavior.

ec04fc15-fa35-46f2-80e1-5d271f2ef708 commented 4 years ago

I think there is a reason for the different handling of attribute and [[ ]]:

For GNU attributes, there is a grammar for arguments that applies to all attributes: the first argument is either an expression or an identifier, and all subsequent arguments are expressions; see https://gcc.gnu.org/onlinedocs/gcc/Attribute-Syntax.html. (Clang flaunts this rule with many of its GNU-syntax attributes, though, and uses GNU-incompatible custom parsing rules that depend on the attribute name.) C++11 attributes have no such parsing rule.

AaronBallman commented 4 years ago

Thanks for both comments this is very helpful. I now understand this issue as two separate points:

I think hasAttribute() needs to be updated regardless though, because I would expect __has_attribute(some::plugin_attr) to give the correct response, and it looks like it's not doing so.

[[ ]] and attribute(( )) exhibit different behaviors, the former ignores arguments while the latter seems to parse (by default) an expression as the argument. We could at least make the behavior uniform.

My instinct is that if we don't know about the attribute, we can't parse and retain its arguments; we can basically only eat tokens between the parens for the argument list, because we don't know what kind of parsed arguments to expect.

hasAttribute doesn't know about attributes added by plugins, so the arguments are skipped.

I ended up looking at this piece of code last night too, because I was trying to figure out how complex attributes are parsed (e.g. clang::availability which has custom syntax). That's when I realized that it would be really nice to have an extension point for plugins to register custom parsers for attributes that have non-trivial syntaxes, rather than force custom attribute arguments to be shoehorned into an expression or an unstructured string. How feasible would it be to allow plugins to register custom parsers for their custom attributes?

I think that's a good idea. One of the long-term hopes I have is that we can eventually get to the point where you can write a custom MyFunAttr.td file to specify custom attributes similar to how you specify built-in attributes. One downside to this is that it requires running the tablegen, though.

AaronBallman commented 4 years ago

I think hasAttribute() needs to be updated regardless though, because I would expect __has_attribute(some::plugin_attr) to give the correct response, and it looks like it's not doing so.

Typo: I would expect __has_cpp_attribute(some::plugin_attr) and __has_c_attribute(some::plugin_attr) to work. __has_attribute is for GNU-style attributes.

protz commented 4 years ago

Thanks for both comments this is very helpful. I now understand this issue as two separate points:

I think hasAttribute() needs to be updated regardless though, because I would expect __has_attribute(some::plugin_attr) to give the correct response, and it looks like it's not doing so.

[[ ]] and attribute(( )) exhibit different behaviors, the former ignores arguments while the latter seems to parse (by default) an expression as the argument. We could at least make the behavior uniform.

hasAttribute doesn't know about attributes added by plugins, so the arguments are skipped.

I ended up looking at this piece of code last night too, because I was trying to figure out how complex attributes are parsed (e.g. clang::availability which has custom syntax). That's when I realized that it would be really nice to have an extension point for plugins to register custom parsers for attributes that have non-trivial syntaxes, rather than force custom attribute arguments to be shoehorned into an expression or an unstructured string. How feasible would it be to allow plugins to register custom parsers for their custom attributes?

AaronBallman commented 4 years ago

Looks like this is caused by this in Parser::ParseCXX11AttributeArgs in ParseDeclCXX.cpp

// If the attribute isn't known, we will not attempt to parse any // arguments. if (!hasAttribute(LO.CPlusPlus ? AttrSyntax::CXX : AttrSyntax::C, ScopeName, AttrName, getTargetInfo(), getLangOpts())) { // Eat the left paren, then skip to the ending right paren. ConsumeParen(); SkipUntil(tok::r_paren); return false; }

hasAttribute doesn't know about attributes added by plugins, so the arguments are skipped.

I agree that looks to be the issue. One of the problems we have with unknown attributes is that they can have effectively arbitrary arguments to them that we may not know how to translate. For instance, [[gsl::suppress(type.1)]] (note how it's token soup and not a string literal). Because the plugin architecture currently only allows you to specify the number of arguments, but no way to specify the kinds of arguments, this is sort of a concern for the plugin system as well.

I think hasAttribute() needs to be updated regardless though, because I would expect __has_attribute(some::plugin_attr) to give the correct response, and it looks like it's not doing so.

john-brawn-arm commented 4 years ago

Looks like this is caused by this in Parser::ParseCXX11AttributeArgs in ParseDeclCXX.cpp

// If the attribute isn't known, we will not attempt to parse any // arguments. if (!hasAttribute(LO.CPlusPlus ? AttrSyntax::CXX : AttrSyntax::C, ScopeName, AttrName, getTargetInfo(), getLangOpts())) { // Eat the left paren, then skip to the ending right paren. ConsumeParen(); SkipUntil(tok::r_paren); return false; }

hasAttribute doesn't know about attributes added by plugins, so the arguments are skipped.

protz commented 4 years ago

Doing some more investigation, I changed:

void fn2() __attribute__((example("somestring"))) { }

in clang/test/Frontend/plugin-attribute.cpp to

[[ example("somestring") ]] void fn2() { }

which was enough to make the test fail. Note that the [[ ]] syntax with an argument is not tested anywhere else in plugin-attribute.cpp, which leads me to think that the parser simply is not recognizing the syntax I'm using.