llir / llvm

Library for interacting with LLVM IR in pure Go.
https://llir.github.io/document/
BSD Zero Clause License

Ambiguity in grammar for parsing alignment attributes, string attributes #40

Open mewmew opened 5 years ago

mewmew commented 5 years ago

The grammar contains an ambiguity when parsing global variable alignment attributes. More specifically, an alignment attribute of a global variable may be interpreted either as a GlobalAttr or a FuncAttr, and since the list of both global attributes and function attributes may be optionally empty, this leads to a shift/reduce ambiguity in the parser.

From the ll.tm EBNF grammar:

GlobalDecl -> GlobalDecl
    : Name=GlobalIdent '=' ExternLinkage Preemptionopt Visibilityopt DLLStorageClassopt ThreadLocalopt UnnamedAddropt AddrSpaceopt ExternallyInitializedopt Immutable ContentType=Type (',' Section)? (',' Comdat)? (',' Align)? Metadata=(',' MetadataAttachment)+? FuncAttrs=(',' FuncAttribute)+?
;

FuncAttribute -> FuncAttribute
    : AttrString
    | AttrPair
    # not used in attribute groups.
    | AttrGroupID
    # used in functions.
    #| Align # NOTE: removed to resolve reduce/reduce conflict, see above.
    # used in attribute groups.
    | AlignPair
    | AlignStack
    | AlignStackPair
    | AllocSize
    | FuncAttr
;

Specifically, the end of the GlobalDecl line is of interest: `(',' Align)? Metadata=(',' MetadataAttachment)+? FuncAttrs=(',' FuncAttribute)+?`

Given that there are no metadata attachments, the alignment attribute (align 8) of the following LLVM IR:

@a = global i32 42, align 8

may be reduced either to a global attribute (i.e. Align before MetadataAttachment) or to a function attribute (i.e. FuncAttribute after MetadataAttachment).
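To make the two readings concrete, here is a small Go sketch that enumerates the derivations of the trailing items under a simplified form of the rule. It is only an illustration: the `kind` type, its three constants and the `derivations` function are hypothetical names invented for this example, and the sketch assumes that FuncAttribute still admits Align (the alternative that is commented out in ll.tm).

```go
package main

import "fmt"

// kind classifies one comma-separated item trailing a global definition.
// The three kinds are a simplification made up for this sketch.
type kind int

const (
	alignItem    kind = iota // e.g. "align 8"
	metadataItem             // e.g. "!dbg !0"
	otherAttr                // e.g. "\"key\"=\"value\""
)

// derivations returns one label per way the items can be derived from
//
//	(',' Align)? (',' MetadataAttachment)* (',' FuncAttribute)*
//
// assuming FuncAttribute also accepts an align item.
func derivations(items []kind) []string {
	var out []string
	for _, takeAlign := range []bool{true, false} {
		rest := items
		label := "Align slot empty; any align is a FuncAttribute"
		if takeAlign {
			if len(rest) == 0 || rest[0] != alignItem {
				continue // the optional Align slot cannot match here
			}
			rest = rest[1:]
			label = "first item is the global Align"
		}
		// The metadata list must consume the whole leading run of metadata
		// items, since a metadata item is never a FuncAttribute.
		for len(rest) > 0 && rest[0] == metadataItem {
			rest = rest[1:]
		}
		// Everything left must be acceptable as a FuncAttribute.
		ok := true
		for _, it := range rest {
			if it == metadataItem {
				ok = false
			}
		}
		if ok {
			out = append(out, label)
		}
	}
	return out
}

func main() {
	// @a = global i32 42, align 8
	fmt.Println(derivations([]kind{alignItem}))
	// Same global with a metadata attachment after the alignment.
	fmt.Println(derivations([]kind{alignItem, metadataItem}))
}
```

For `align 8` alone the function reports two derivations, which is the ambiguity described above. As soon as a metadata attachment follows the alignment, only the global Align reading remains.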

The solution employed by the C++ parser is the opposite of maximal munch: it tries to reduce rather than shift whenever possible.
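One way to picture a fixed preference like this is a hand-written loop that commits to a decision for each trailing item as soon as it sees it, instead of keeping both readings open. The Go sketch below is a toy under that assumption. `globalTail` and `parseTail` are hypothetical names, the comma splitting is deliberately naive, and none of this is taken from the C++ parser or from llir/llvm.

```go
package main

import (
	"fmt"
	"strings"
)

// globalTail is a hypothetical result type for this sketch.
type globalTail struct {
	Align     string   // e.g. "align 8"; empty if absent
	Metadata  []string // e.g. "!dbg !0"
	FuncAttrs []string // everything else
}

// parseTail assigns each comma-separated item after the global's initializer
// to exactly one slot. An align item is bound to the global Align slot only
// if it appears before any metadata or function attribute; otherwise it is
// treated as a function attribute.
// Note: splitting on "," would break on string attributes that contain
// commas; a real lexer is needed for that.
func parseTail(tail string) globalTail {
	var g globalTail
	for _, item := range strings.Split(tail, ",") {
		item = strings.TrimSpace(item)
		switch {
		case item == "":
			continue
		case strings.HasPrefix(item, "align ") &&
			g.Align == "" && len(g.Metadata) == 0 && len(g.FuncAttrs) == 0:
			g.Align = item
		case strings.HasPrefix(item, "!"):
			g.Metadata = append(g.Metadata, item)
		default:
			g.FuncAttrs = append(g.FuncAttrs, item)
		}
	}
	return g
}

func main() {
	fmt.Printf("%+v\n", parseTail(", align 8"))
	fmt.Printf("%+v\n", parseTail(", !dbg !0, align 8"))
}
```

With this rule, `, align 8` always lands in the Align slot, while an align item that appears after a metadata attachment can only be a function attribute. The trade-off is that the choice is hard-coded in the parser rather than expressed in the grammar.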

mewmew commented 4 years ago

From https://github.com/llir/llvm/issues/111#issuecomment-562429501

Grammar related to Function String Attribute

test cases failing likely related to Function String Attribute grammar
* `llvm/test/Bitcode/attributes.ll` - syntax error at [line 266](https://github.com/llir/testdata/blob/68fdb7c8ce371954493b215f547f5261572aadae/llvm/test/Bitcode/attributes.ll#L266)
* `llvm/test/Transforms/Inline/inline-varargs.ll` - syntax error at [line 6](https://github.com/llir/testdata/blob/68fdb7c8ce371954493b215f547f5261572aadae/llvm/test/Transforms/Inline/inline-varargs.ll#L6)

align attribute

align used in call instruction
* `llvm/test/Analysis/ValueTracking/memory-dereferenceable.ll` - syntax error at line 153
align used in return attribute
* `llvm/test/Transforms/InstCombine/assume-redundant.ll` - syntax error at line 50
* `llvm/test/Transforms/LoopSimplify/unreachable-loop-pred.ll` - syntax error at line 25

I don't know how to update the grammar to handle ambiguities related to align used in return attributes, function attributes, etc. The same goes for string attributes and key-value attributes. The approach taken now is to allow only the most common cases of these, and then (unfortunately) fail when we can't resolve the ambiguous grammar. I wish the grammar of LLVM IR were LR(1), but that does not seem to be the case.

If anyone knows of a clean approach to handle this, you are warmly invited to share your thoughts. We'd very much appreciate it, as this annoying issue has yet to find a clean solution.

Cheers, Robin

dannypsnl commented 2 years ago

Do we consider ANTLR? I was thinking so many issues you opened just cannot get fixed, maybe we should swap to a more stable parser generator? Anyway, it can't be more painful.

mewmew commented 2 years ago

> Do we consider ANTLR? I was thinking so many issues you opened just cannot get fixed, maybe we should swap to a more stable parser generator? Anyway, it can't be more painful.

@dannypsnl, haha, yeah I know, there are some pains with using Textmapper. I do think however this is true for every parser generator.

That being said, feel free to do a Proof of concept : )

I think every parser generator has pros and cons. So if ANTLR turns out to solve more issues than it creates, it may be worth it. However, it should be noted that this is a ton of work. So, expect to put in at least two full-time weeks before reaching feature parity. If you still feel like working on it, then definitely, go for it!

Cheers, Robin