llir / llvm

Library for interacting with LLVM IR in pure Go.
https://llir.github.io/document/
BSD Zero Clause License

update llir/llvm to support 11.0 #147

Closed dannypsnl closed 3 years ago

dannypsnl commented 4 years ago

Current Status

11.0 has already been released; reference: https://releases.llvm.org/download.html#11.0.0

### Changes

Below from https://releases.llvm.org/11.0.0/docs/ReleaseNotes.html#id4

- [ ] The callsite attribute vector-function-abi-variant has been added to describe the mapping between scalar functions and vector functions, to enable vectorization of call sites. The information provided by the attribute is interfaced via the API provided by the VFDatabase class. When scanning through the set of vector functions associated with a scalar call, the loop vectorizer now relies on VFDatabase, instead of TargetLibraryInfo.
- [x] dereferenceable attributes and metadata on pointers no longer imply anything about the alignment of the pointer in question. Previously, some optimizations would make assumptions based on the type of the pointer. This behavior was undocumented. To preserve optimizations, frontends may need to be updated to generate appropriate align attributes and metadata.
- [x] The DIModule metadata is extended to contain file and line number information. This information is used to represent Fortran modules debug info at IR level.
- [x] LLVM IR now supports two distinct llvm::FixedVectorType and llvm::ScalableVectorType vector types, both derived from the base class llvm::VectorType. A number of algorithms dealing with IR vector types have been updated to make sure they work for both scalable and fixed vector types. Where possible, the code has been made generic to cover both cases using the base class. Specifically, places that were using the type unsigned to count the number of lanes of a vector are now using llvm::ElementCount. In places where uint64_t was used to denote the size in bits of a IR type we have partially migrated the codebase to using llvm::TypeSize.
- [x] Branching on undef/poison is undefined behavior. It is needed for correctly analyzing value ranges based on branch conditions. This is consistent with MSan’s behavior as well.
- [x] memset/memcpy/memmove can take undef/poison pointer(s) if the size to fill is zero.
- [x] Passing undef/poison to a standard I/O library function call (printf/fputc/…) is undefined behavior. The new noundef attribute is attached to the functions’ arguments. The full list is available at llvm::inferLibFuncAttributes.

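
To make the fixed versus scalable vector item above concrete, here is a minimal Go sketch of how a pure Go IR model might print both kinds of vector type. The `VectorType` struct and its fields are hypothetical names chosen for illustration, not necessarily the llir/llvm API; only the printed syntax (`<4 x i32>` and `<vscale x 4 x i32>`) is standard LLVM IR.

```go
package main

import "fmt"

// VectorType is a hypothetical, simplified model of an LLVM IR vector type.
// It is an illustration only and not the actual llir/llvm data structure.
type VectorType struct {
	Scalable bool   // true for scalable vectors, printed as <vscale x N x T>
	Len      uint64 // number of elements (the minimum number when scalable)
	Elem     string // element type in LLVM IR syntax, e.g. "i32"
}

// String renders the type in LLVM IR assembly syntax.
func (t VectorType) String() string {
	if t.Scalable {
		return fmt.Sprintf("<vscale x %d x %s>", t.Len, t.Elem)
	}
	return fmt.Sprintf("<%d x %s>", t.Len, t.Elem)
}

func main() {
	fmt.Println(VectorType{Len: 4, Elem: "i32"})
	fmt.Println(VectorType{Scalable: true, Len: 4, Elem: "i32"})
}
```

A single struct with a flag is one possible design; LLVM itself uses two subclasses of a common base class, as the release note describes.
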
mewmew commented 4 years ago

11.0 almost there. Just take a note, waiting for release.

That's great! Thanks for making the issue to track the 11.0 release :)

dannypsnl commented 4 years ago

Or 12.0? I have no idea what they're doing.

mewmew commented 4 years ago

Or 12.0? I have no idea what they're doing.

Did they skip a major version?

dannypsnl commented 4 years ago

Or 12.0? I have no idea what they're doing.

Did they skip a major version?

Seems like it; the documentation switched to 12.0 before the 11.0 release.

dannypsnl commented 4 years ago

Compare ASM parser changes:

$ wget https://github.com/llvm/llvm-project/archive/llvmorg-10.0.0.tar.gz
$ wget https://github.com/llvm/llvm-project/archive/llvmorg-11.0.0.tar.gz
$ tar zxf llvmorg-10.0.0.tar.gz
$ tar zxf llvmorg-11.0.0.tar.gz
$ git diff llvm-project-llvmorg-10.0.0/llvm/lib/AsmParser llvm-project-llvmorg-11.0.0/llvm/lib/AsmParser

diff --git a/llvm-project-llvmorg-10.0.0/llvm/lib/AsmParser/LLParser.cpp b/llvm-project-llvmorg-11.0.0/llvm/lib/AsmParser/LLParser.cpp
index 1a17f63..c9f21ee 100644
--- a/llvm-project-llvmorg-10.0.0/llvm/lib/AsmParser/LLParser.cpp
+++ b/llvm-project-llvmorg-11.0.0/llvm/lib/AsmParser/LLParser.cpp
@@ -6937,7 +7055,12 @@ int LLParser::ParseAlloc(Instruction *&Inst, PerFunctionState &PFS) {
   if (Size && !Size->getType()->isIntegerTy())
     return Error(SizeLoc, "element count must have integer type");

-  AllocaInst *AI = new AllocaInst(Ty, AddrSpace, Size, Alignment);
+  SmallPtrSet<Type *, 4> Visited;
+  if (!Alignment && !Ty->isSized(&Visited))
+    return Error(TyLoc, "Cannot allocate unsized type");
+  if (!Alignment)
+    Alignment = M->getDataLayout().getPrefTypeAlign(Ty);
+  AllocaInst *AI = new AllocaInst(Ty, AddrSpace, Size, *Alignment);
   AI->setUsedWithInAlloca(IsInAlloca);
   AI->setSwiftError(IsSwiftError);
   Inst = AI;
@@ -6987,8 +7110,12 @@ int LLParser::ParseLoad(Instruction *&Inst, PerFunctionState &PFS) {
   if (Ty != cast<PointerType>(Val->getType())->getElementType())
     return Error(ExplicitTypeLoc,
                  "explicit pointee type doesn't match operand's pointee type");
-
-  Inst = new LoadInst(Ty, Val, "", isVolatile, Alignment, Ordering, SSID);
+  SmallPtrSet<Type *, 4> Visited;
+  if (!Alignment && !Ty->isSized(&Visited))
+    return Error(ExplicitTypeLoc, "loading unsized types is not allowed");
+  if (!Alignment)
+    Alignment = M->getDataLayout().getABITypeAlign(Ty);
+  Inst = new LoadInst(Ty, Val, "", isVolatile, *Alignment, Ordering, SSID);
   return AteExtraComma ? InstExtraComma : InstNormal;
 }

@@ -7034,8 +7161,13 @@ int LLParser::ParseStore(Instruction *&Inst, PerFunctionState &PFS) {
   if (Ordering == AtomicOrdering::Acquire ||
       Ordering == AtomicOrdering::AcquireRelease)
     return Error(Loc, "atomic store cannot use Acquire ordering");
+  SmallPtrSet<Type *, 4> Visited;
+  if (!Alignment && !Val->getType()->isSized(&Visited))
+    return Error(Loc, "storing unsized types is not allowed");
+  if (!Alignment)
+    Alignment = M->getDataLayout().getABITypeAlign(Val->getType());

-  Inst = new StoreInst(Val, Ptr, isVolatile, Alignment, Ordering, SSID);
+  Inst = new StoreInst(Val, Ptr, isVolatile, *Alignment, Ordering, SSID);
   return AteExtraComma ? InstExtraComma : InstNormal;
 }

@@ -7084,8 +7216,13 @@ int LLParser::ParseCmpXchg(Instruction *&Inst, PerFunctionState &PFS) {
     return Error(NewLoc, "new value and pointer type do not match");
   if (!New->getType()->isFirstClassType())
     return Error(NewLoc, "cmpxchg operand must be a first class value");
+
+  Align Alignment(
+      PFS.getFunction().getParent()->getDataLayout().getTypeStoreSize(
+          Cmp->getType()));
+
   AtomicCmpXchgInst *CXI = new AtomicCmpXchgInst(
-      Ptr, Cmp, New, SuccessOrdering, FailureOrdering, SSID);
+      Ptr, Cmp, New, Alignment, SuccessOrdering, FailureOrdering, SSID);
   CXI->setVolatile(isVolatile);
   CXI->setWeak(isWeak);
   Inst = CXI;
@@ -7169,9 +7306,11 @@ int LLParser::ParseAtomicRMW(Instruction *&Inst, PerFunctionState &PFS) {
   if (Size < 8 || (Size & (Size - 1)))
     return Error(ValLoc, "atomicrmw operand must be power-of-two byte-sized"
                          " integer");
-
+  Align Alignment(
+      PFS.getFunction().getParent()->getDataLayout().getTypeStoreSize(
+          Val->getType()));
   AtomicRMWInst *RMWI =
-    new AtomicRMWInst(Operation, Ptr, Val, Ordering, SSID);
+      new AtomicRMWInst(Operation, Ptr, Val, Alignment, Ordering, SSID);
   RMWI->setVolatile(isVolatile);
   Inst = RMWI;
   return AteExtraComma ? InstExtraComma : InstNormal;
@@ -8479,13 +8658,133 @@ bool LLParser::ParseOptionalVTableFuncs(VTableFuncList &VTableFuncs) {
   return false;
 }

+/// ParamNo := 'param' ':' UInt64
+bool LLParser::ParseParamNo(uint64_t &ParamNo) {
+  if (ParseToken(lltok::kw_param, "expected 'param' here") ||
+      ParseToken(lltok::colon, "expected ':' here") || ParseUInt64(ParamNo))
+    return true;
+  return false;
+}
+
+/// ParamAccessOffset := 'offset' ':' '[' APSINTVAL ',' APSINTVAL ']'
+bool LLParser::ParseParamAccessOffset(ConstantRange &Range) {
+  APSInt Lower;
+  APSInt Upper;
+  auto ParseAPSInt = [&](APSInt &Val) {
+    if (Lex.getKind() != lltok::APSInt)
+      return TokError("expected integer");
+    Val = Lex.getAPSIntVal();
+    Val = Val.extOrTrunc(FunctionSummary::ParamAccess::RangeWidth);
+    Val.setIsSigned(true);
+    Lex.Lex();
+    return false;
+  };
+  if (ParseToken(lltok::kw_offset, "expected 'offset' here") ||
+      ParseToken(lltok::colon, "expected ':' here") ||
+      ParseToken(lltok::lsquare, "expected '[' here") || ParseAPSInt(Lower) ||
+      ParseToken(lltok::comma, "expected ',' here") || ParseAPSInt(Upper) ||
+      ParseToken(lltok::rsquare, "expected ']' here"))
+    return true;
+
+  ++Upper;
+  Range =
+      (Lower == Upper && !Lower.isMaxValue())
+          ? ConstantRange::getEmpty(FunctionSummary::ParamAccess::RangeWidth)
+          : ConstantRange(Lower, Upper);
+
+  return false;
+}
+
+/// ParamAccessCall
+///   := '(' 'callee' ':' GVReference ',' ParamNo ',' ParamAccessOffset ')'
+bool LLParser::ParseParamAccessCall(FunctionSummary::ParamAccess::Call &Call) {
+  if (ParseToken(lltok::lparen, "expected '(' here") ||
+      ParseToken(lltok::kw_callee, "expected 'callee' here") ||
+      ParseToken(lltok::colon, "expected ':' here"))
+    return true;
+
+  unsigned GVId;
+  ValueInfo VI;
+  if (ParseGVReference(VI, GVId))
+    return true;
+
+  Call.Callee = VI.getGUID();
+
+  if (ParseToken(lltok::comma, "expected ',' here") ||
+      ParseParamNo(Call.ParamNo) ||
+      ParseToken(lltok::comma, "expected ',' here") ||
+      ParseParamAccessOffset(Call.Offsets))
+    return true;
+
+  if (ParseToken(lltok::rparen, "expected ')' here"))
+    return true;
+
+  return false;
+}
+
+/// ParamAccess
+///   := '(' ParamNo ',' ParamAccessOffset [',' OptionalParamAccessCalls]? ')'
+/// OptionalParamAccessCalls := '(' Call [',' Call]* ')'
+bool LLParser::ParseParamAccess(FunctionSummary::ParamAccess &Param) {
+  if (ParseToken(lltok::lparen, "expected '(' here") ||
+      ParseParamNo(Param.ParamNo) ||
+      ParseToken(lltok::comma, "expected ',' here") ||
+      ParseParamAccessOffset(Param.Use))
+    return true;
+
+  if (EatIfPresent(lltok::comma)) {
+    if (ParseToken(lltok::kw_calls, "expected 'calls' here") ||
+        ParseToken(lltok::colon, "expected ':' here") ||
+        ParseToken(lltok::lparen, "expected '(' here"))
+      return true;
+    do {
+      FunctionSummary::ParamAccess::Call Call;
+      if (ParseParamAccessCall(Call))
+        return true;
+      Param.Calls.push_back(Call);
+    } while (EatIfPresent(lltok::comma));
+
+    if (ParseToken(lltok::rparen, "expected ')' here"))
+      return true;
+  }
+
+  if (ParseToken(lltok::rparen, "expected ')' here"))
+    return true;
+
+  return false;
+}
+
+/// OptionalParamAccesses
+///   := 'params' ':' '(' ParamAccess [',' ParamAccess]* ')'
+bool LLParser::ParseOptionalParamAccesses(
+    std::vector<FunctionSummary::ParamAccess> &Params) {
+  assert(Lex.getKind() == lltok::kw_params);
+  Lex.Lex();
+
+  if (ParseToken(lltok::colon, "expected ':' here") ||
+      ParseToken(lltok::lparen, "expected '(' here"))
+    return true;
+
+  do {
+    FunctionSummary::ParamAccess ParamAccess;
+    if (ParseParamAccess(ParamAccess))
+      return true;
+    Params.push_back(ParamAccess);
+  } while (EatIfPresent(lltok::comma));
+
+  if (ParseToken(lltok::rparen, "expected ')' here"))
+    return true;
+
+  return false;
+}
+
 /// OptionalRefs
 ///   := 'refs' ':' '(' GVReference [',' GVReference]* ')'
 bool LLParser::ParseOptionalRefs(std::vector<ValueInfo> &Refs) {
   assert(Lex.getKind() == lltok::kw_refs);
   Lex.Lex();

-  if (ParseToken(lltok::colon, "expected ':' in refs") |
+  if (ParseToken(lltok::colon, "expected ':' in refs") ||
       ParseToken(lltok::lparen, "expected '(' in refs"))
     return true;
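
The `ParseParamAccessOffset` function in the diff above reads an inclusive `offset: [lower, upper]` pair and turns it into a half-open range by incrementing the upper bound. A rough Go sketch of that conversion follows. The `offsetRange` type and the function name are hypothetical, the parsing uses `fmt.Sscanf` rather than a real lexer, and the sketch deliberately omits the special case the LLVM code applies when `Lower.isMaxValue()` holds.

```go
package main

import "fmt"

// offsetRange is a hypothetical half-open range [Lower, Upper) of offsets.
// Empty marks a range that contains no offsets at all.
type offsetRange struct {
	Lower, Upper int64
	Empty        bool
}

// parseParamAccessOffset parses text of the form "offset: [lower, upper]",
// where both bounds are inclusive, and returns the half-open equivalent.
// Simplification (assumption): any range whose bounds coincide after the
// increment is treated as empty; the LLVM code special-cases one extreme value.
func parseParamAccessOffset(s string) (offsetRange, error) {
	var lower, upper int64
	if _, err := fmt.Sscanf(s, "offset: [%d, %d]", &lower, &upper); err != nil {
		return offsetRange{}, fmt.Errorf("invalid param access offset %q: %w", s, err)
	}
	upper++ // inclusive upper bound becomes exclusive; int64 wraps on overflow
	if lower == upper {
		return offsetRange{Empty: true}, nil
	}
	return offsetRange{Lower: lower, Upper: upper}, nil
}

func main() {
	for _, in := range []string{"offset: [0, 7]", "offset: [0, -1]", "offset: 0"} {
		r, err := parseParamAccessOffset(in)
		fmt.Printf("%q -> %+v, err=%v\n", in, r, err)
	}
}
```
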
dannypsnl commented 4 years ago

Below from https://releases.llvm.org/11.0.0/docs/ReleaseNotes.html#id4

mewmew commented 4 years ago

Great! Thanks a lot for the diff and release notes @dannypsnl!

Also, for those looking to contribute to the project: both @dannypsnl and I will be busy for this LLVM release, so if you'd like to contribute to llir/llvm we'd be glad to help you get up to speed with integrating the LLVM 11.0 changes.

Cheers, Robin

mewmew commented 4 years ago
dannypsnl commented 3 years ago

I still wonder: what if we take the parser from LLVM, copy enough of its dependencies to make it work, and then use FFI?

mewmew commented 3 years ago

I still wonder: what if we take the parser from LLVM, copy enough of its dependencies to make it work, and then use FFI?

Hi @dannypsnl,

If using cgo, it probably makes more sense to use the official Go binding for LLVM.

The primary motivating case for llir/llvm is to enable access to LLVM IR without the need for cgo and complex build dependencies.

There are obvious benefits and drawbacks to both approaches. The official Go bindings for LLVM will always be up-to-date, and for more complex compilers (e.g. llgo), it makes sense to use these bindings instead of llir/llvm.

The benefits of llir/llvm are both the idiomatic Go data model (e.g. the value.Value interface; see https://github.com/llir/llvm/issues/3#issuecomment-308577279 for background discussion) and the vastly simplified build dependencies. For instance, when the decomp project switched in its v0.2 release from the official Go bindings for LLVM (which use cgo) to llir/llvm, the project-wide build time improved substantially. (Note that the build time issue has since been mitigated by libraries such as the LLVM bindings of TinyGo, which rely on system-installed LLVM libraries.)

From the v0.2 release notes of the decomp project:

Prior to this release, project-wide compilation could take several hours to complete. Now, they complete in less than 1 minute -- the established hard limit for all future releases.

Hope this gives some background on the decision to not use cgo in llir/llvm, and some options of libraries to consider for more complex use cases.

Of course, llir/llvm is here to stay. It may lag behind LLVM releases, but that's fine for the most part. The main parts of the LLVM IR language remain unchanged in between releases of LLVM.

Cheers, Robin

dannypsnl commented 3 years ago

@mewmew I believe grammar parsing is done now. The remaining parts are:

  1. summary (#43)
  2. alignment (support data layout?)
  3. vtable (we don't support it either; should we?)
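
On the alignment item: the LLVM 11 parser diff earlier in this thread fills in a missing alignment from the data layout, which is why data layout support comes up. The Go sketch below mirrors that logic under stated assumptions. The `typeInfo` struct and the function names are hypothetical, the numbers in `main` are illustrative rather than taken from a real target, and an explicit alignment of 0 stands for "not specified".

```go
package main

import (
	"errors"
	"fmt"
)

// typeInfo is a hypothetical stand-in for what a data layout would report
// about a type. It is not part of llir/llvm.
type typeInfo struct {
	sized     bool   // whether the type has a known size
	abiAlign  uint64 // ABI alignment in bytes
	prefAlign uint64 // preferred alignment in bytes
	storeSize uint64 // store size in bytes
}

// loadStoreAlign mirrors the ParseLoad/ParseStore change: an explicit
// alignment wins; otherwise the type must be sized and the ABI alignment is used.
func loadStoreAlign(explicit uint64, ti typeInfo) (uint64, error) {
	if explicit != 0 {
		return explicit, nil
	}
	if !ti.sized {
		return 0, errors.New("loading or storing unsized types is not allowed")
	}
	return ti.abiAlign, nil
}

// allocaAlign mirrors the alloca change: same idea, but the fallback is the
// preferred alignment rather than the ABI alignment.
func allocaAlign(explicit uint64, ti typeInfo) (uint64, error) {
	if explicit != 0 {
		return explicit, nil
	}
	if !ti.sized {
		return 0, errors.New("cannot allocate unsized type")
	}
	return ti.prefAlign, nil
}

// atomicAlign mirrors the cmpxchg/atomicrmw change: the alignment is taken
// from the store size of the operand type.
func atomicAlign(ti typeInfo) uint64 {
	return ti.storeSize
}

func main() {
	i64 := typeInfo{sized: true, abiAlign: 8, prefAlign: 8, storeSize: 8}
	a, err := loadStoreAlign(0, i64)
	fmt.Println("load/store:", a, err)
	a, err = allocaAlign(0, typeInfo{}) // an unsized type is rejected
	fmt.Println("alloca:", a, err)
	fmt.Println("atomic:", atomicAlign(i64))
}
```
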
mewmew commented 3 years ago

@mewmew I believe grammar parsing is done now. The remaining parts are:

That's incredible. Really good job, @dannypsnl! Thanks for working on this.

dannypsnl commented 3 years ago

Update: once #158 is fixed, everything is done and we can release for LLVM 11.

mewmew commented 3 years ago

Update: once #158 is fixed, everything is done and we can release for LLVM 11.

#158 is now done. I'll close this issue for now, as the main aspect remaining is module summaries, which is already tracked by issue #43. I have yet to come across an LLVM IR file which requires module summaries, or a concrete use case when developing compilers. I'm sure one exists; I just haven't come across it personally. So I'm fine with tagging the llir/llvm release for LLVM 11.0 now, and if anyone feels up to it they may work on adding the grammar support for module summaries.

Thanks once more for working on getting llir/llvm updated to LLVM 11.0 @dannypsnl.

Cheers, Robin