yowl commented 4 years ago


Bugzilla Link	47395
Version	trunk
OS	Windows NT
CC	@dwblaikie,@efriedma-quic,@tlively,@yuanfang-chen

Extended Description

I create bitcode using libLLVM for the corert compiler project (https://github.com/dotnet/corert). It uses the c# bindings over libLLVM from https://github.com/Microsoft/LLVMSharp.

I have 2 bitcodes generated from mostly the same source code. They are around 240MB in size. One compiles in 3 minutes, the other in 12 hours. I suspect the 12 hour compilation is either not optimal or doing something wrong. I use emscripten to compile and this ultimately calls

E:/GitHub/llvm-project/build/release/bin/clang++.exe -target wasm32-unknown-emscripten -DEMSCRIPTEN_major=1 -DEMSCRIPTEN_minor=39 -DEMSCRIPTEN_tiny__=19 -D_LIBCPP_ABI_VERSION=2 -Dunix -Dunix -Dunix -Werror=implicit-function-declaration -Xclang -nostdsysteminc -Xclang -isystemE:\GitHub\emsdk\upstream\emscripten\system\include\libcxx -Xclang -isystemE:\GitHub\emsdk\upstream\emscripten\system\lib\libcxxabi\include -Xclang -isystemE:\GitHub\emsdk\upstream\emscripten\system\lib\libunwind\include -Xclang -isystemE:\GitHub\emsdk\upstream\emscripten\system\include\compat -Xclang -isystemE:\GitHub\emsdk\upstream\emscripten\system\include -Xclang -isystemE:\GitHub\emsdk\upstream\emscripten\system\include\libc -Xclang -isystemE:\GitHub\emsdk\upstream\emscripten\system\lib\libc\musl\arch\emscripten -Xclang -isystemE:\GitHub\emsdk\upstream\emscripten\system\local\include -Xclang -isystemE:\GitHub\emsdk\upstream\emscripten\system\include\SSE -Xclang -isystemE:\GitHub\emsdk\upstream\emscripten\cache\wasm\include -DEMSCRIPTEN -fignore-exceptions -c -g E:\GitHub\UnoCoreRt\UnoCoreRt.Wasm\bin\Debug\netstandard2.0\UnoCoreRt.Wasm.bc -Xclang -isystemE:\GitHub\emsdk\upstream\emscripten\system\include\SDL -c -o E:\GitHub\UnoCoreRt\UnoCoreRt.Wasm\bin\Debug\netstandard2.0\UnoCoreRt-release.o -mllvm -combiner-global-alias-analysis=false -mllvm -enable-emscripten-sjlj -mllvm -disable-lsr -g

What I've noticed is that compared to the "fast", 3 minute compile, the "slow" compile makes around 1 million calls to https://github.com/llvm/llvm-project/blob/a6eb70c052da767aef6b041d0db20bdf3a9e06b5/llvm/lib/Bitcode/Reader/ValueList.cpp#L89 and hence the ResolveConstants variable ends up with that many entries. Resolving these constants is then what seems to take most of the time. I think the bitcode reader is identifying 1 million forward references so possible causes of the slowness that come to mind are:

Incorrect identification of forward references
Incorrect writing from libLLVM that creates forward references unnecessarily
Slow algorithm to resolve correctly identified and written forward references.

A copy of the bitcode is at http://dev.hubse.com/UnoCoreRt.Wasm.bc.msi (its not really an msi, just needed a binary extension that the web server would serve). File is actually a .7z compressed file, so needs renaming from .msi to .7z

I privately messaged @tlively in discord and I believe he has confirmed that it takes a long time for him also.

llvm/llvm-project#46095 looks to be the same area of code, but not the same problem.

I did spend a bit of time with clang++ in the debugger, but I'm not that familiar with it at all, so I couldn't make any conclusion about my 3 theories above.

yowl commented 4 years ago

Thanks! I can confirm that with this patch the compile time is back to 3 minutes.

efriedma-quic commented 4 years ago

https://reviews.llvm.org/D87063

efriedma-quic commented 4 years ago

Probably you should be using inttoptr instead of gep on a null pointer.

I have a patch I think should fix this; will post soon.

yowl commented 4 years ago

while I've got you, this array is basically just packing numbers. If there's a more efficient way to do that, over a bunch of geps, I'm all ears.

yowl commented 4 years ago

here's the first few lines of that large array

@__embedded_metadata = global [1713228 x i32] [i32 bitcast (i8 getelementptr (i8, i8 null, i32 -559030275) to i32), i32 bitcast (i8* getelementptr (i8, i8

null, i32 1493321100) to i32), i32 bitcast (i8 getelementptr (i8, i8 null, i32 621213955) to i32), i32 bitcast (i8 getelementptr (i8, i8 null, i32 -92 1973496) to i32), i32 bitcast (i8 getelementptr (i8, i8 null, i32 420509964) to i32), i32 bitcast (i8 getelementptr (i8, i8 null, i32 1360507156) to i32 ), i32 bitcast (i8 getelementptr (i8, i8 null, i32 -920712933) to i32*), i32
bitcast (i8 getelementptr (i8, i8 null, i32 1227260194) to i32), i32 bitca st (i8 getelementptr (i8, i8 null, i32 -1121234646) to i32), i32 bitcast (i8

yowl commented 4 years ago

by "normal" I've not actually measured it, but I think its around 30 seconds.

yowl commented 4 years ago

I'm using libLLVM at the point of finishing the compilation (of c# => LLVM IR) with

if DEBUG

        Module.PrintToFile(Path.ChangeExtension(_objectFilePath, ".txt"));

endif //DEBUG

The PrintToFile dumps the textual representation and it completes in a normal amount of time given that its 1.5GB.

This method goes to

    [DllImport(LibraryPath, CallingConvention = CallingConvention.Cdecl, EntryPoint = "LLVMPrintModuleToFile", ExactSpelling = true)]
    [return: NativeTypeName("LLVMBool")]
    public static extern int PrintModuleToFile([NativeTypeName("LLVMModuleRef")] LLVMOpaqueModule* M, [NativeTypeName("const char *")] sbyte* Filename, [NativeTypeName("char **")] sbyte** ErrorMessage);

i.e. LLVMPrintModuleToFile in libLLVM

efriedma-quic commented 4 years ago

There's some even bigger ones, but vim is having a hard time with the length of the lines

The POSIX fold command should be able to handle arbitrarily long lines.

How are you dumping the IR, though? If you're using llvm-dis, how long did it take?

yowl commented 4 years ago

Yes, there are quite a few e.g.

@FrozenSegmentRegionStart = global [278553 x i32] [i32 null, i32 bitcast ([18 x i32]* @EEType_String to i32), i32 null, i32 null, i32 null

There's some even bigger ones, but vim is having a hard time with the length of the lines, but there's at least one with 1.7 million combinations of bitcast(i8* getelementptr....

efriedma-quic commented 4 years ago

If it's taking hours to read a 240MB bitcode file, there must be some non-linear algorithm involved. Looking at BitcodeReaderValueList::resolveConstantForwardRefs , I can only come up with one potential source of non-linearity: calling replaceAllUsesWith() on every element of a large array could be O(N^2). (It looks like the original author realized this was a possibility, which is why it doesn't RAUW the placeholder itself. But the reasoning was applied incompletely.)

Does the testcase have an array with a lot of constant expression operands somewhere?

It's possible we could also change the way we write bitcode files to avoid forward references more aggressively; not sure how the ordering works off the top of my head.

tlively commented 4 years ago

Thanks for the report! Yes, I was able to reproduce this problem. It would be great if someone more familiar with this part of the code could take a look, though :)

llvm / llvm-project

For some bitcodes it can take 12 hours to read and compile #46739

Extended Description

if DEBUG

endif //DEBUG