llvm / llvm-project

The LLVM Project is a collection of modular and reusable compiler and toolchain technologies.
http://llvm.org

Deadlock in wasm-ld when linking large archive with object #36412

Open llvmbot opened 6 years ago

llvmbot commented 6 years ago
Bugzilla Link: 37064
Version: unspecified
OS: Linux
Attachments: tar file of musl.a and stdio_test.o
Reporter: LLVM Bugzilla Contributor
CC: @andrewrk, @sunfishcode, @sbc100

Extended Description

wasm-ld occasionally hits what I assume is a deadlocking race condition when linking a large wasm archive file with a small wasm object file. It seems to occur more frequently on slower machines. The issue vanishes when wasm-ld is run with "--no-threads".

I am using llvm 6.0.0 and invoking wasm-ld like so: wasm-ld --no-entry --allow-undefined stdio_test.o musl.a -o stdio_test.wasm

Backtraces of all running threads can be found here, though I don't have a debug build of LLVM handy, so they're of limited use: https://gist.github.com/tyler/94beafc2929b14196fa92c4d334432c2

stdio_test.o and musl.a are attached.

sunfishcode commented 5 years ago

A fix for llvm/llvm-bugzilla-archive#41508 has now landed: https://github.com/llvm/llvm-project/commit/5081e41bdae2eb14a3f3eb8810263f9fea8fc7c1

Does that fix the testcase here?

andrewrk commented 5 years ago

Related? llvm/llvm-bugzilla-archive#41508. In that bug report I have a fairly reliable repro.

sbc100 commented 5 years ago

That's too bad. It's a shame that parallelForEach doesn't support this. Even more of a shame that it doesn't/can't detect this nested usage and report it.

llvmbot commented 5 years ago

We've started observing hangs in wasm-ld again, now in version 8.

From llvm/llvm-project#34154 #c6 it appears that nested parallelForEach calls can result in deadlocks. Such hangs have been observed in wasm-ld, but they seem tricky to reproduce. A quick look at lld's wasm driver shows nested parallelForEach calls there, with behavior similar to that observed in the ticket:

https://github.com/llvm/llvm-project/blob/0d9f609d824d50e963799b826f2cb2328e51b047/lld/wasm/Writer.cpp#L741 runs all OutputSection::writeTo in a parallelForEach, where the writeTo implementations,

https://github.com/llvm/llvm-project/blob/0d9f609d824d50e963799b826f2cb2328e51b047/lld/wasm/OutputSections.cpp#L113
https://github.com/llvm/llvm-project/blob/0d9f609d824d50e963799b826f2cb2328e51b047/lld/wasm/OutputSections.cpp#L178
https://github.com/llvm/llvm-project/blob/0d9f609d824d50e963799b826f2cb2328e51b047/lld/wasm/OutputSections.cpp#L234

all, themselves, use parallelForEach. If parallelForEach can still stall in nested contexts, this could be a cause for the observed hangs.
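
For illustration only, here is a minimal C++ sketch of one way to sidestep nesting: collect every (section, chunk) pair first, then run a single-level parallel loop over the flat list, so that no parallel task ever waits on an inner parallel loop. This is not lld's code and not necessarily how the issue was or should be fixed. `Section`, `parallelFor`, and `writeAllFlattened` are hypothetical names, plain `std::thread` stands in for LLVM's `parallelForEach`, and summing integers stands in for writing chunk bytes.

```cpp
#include <atomic>
#include <cassert>
#include <cstddef>
#include <functional>
#include <thread>
#include <utility>
#include <vector>

// Hypothetical stand-in for an output section made of independent chunks.
struct Section {
  std::vector<int> chunks;
};

// Hypothetical single-level parallel loop (NOT LLVM's parallelForEach).
// A fixed number of worker threads pull indices from a shared atomic
// counter; the calling thread joins them all before returning.
void parallelFor(std::size_t n, unsigned workers,
                 const std::function<void(std::size_t)> &fn) {
  if (workers == 0)
    workers = 1; // always run with at least one worker
  std::atomic<std::size_t> next{0};
  std::vector<std::thread> threads;
  for (unsigned w = 0; w < workers; ++w)
    threads.emplace_back([&] {
      for (std::size_t i = next.fetch_add(1); i < n; i = next.fetch_add(1))
        fn(i);
    });
  for (auto &t : threads)
    t.join();
}

// Instead of "for each section in parallel { for each chunk in parallel }",
// build one flat list of (section index, chunk index) pairs and process it
// with a single parallel loop.
long long writeAllFlattened(const std::vector<Section> &sections,
                            unsigned workers) {
  std::vector<std::pair<std::size_t, std::size_t>> work;
  for (std::size_t s = 0; s < sections.size(); ++s)
    for (std::size_t c = 0; c < sections[s].chunks.size(); ++c)
      work.emplace_back(s, c);

  std::atomic<long long> total{0};
  parallelFor(work.size(), workers, [&](std::size_t i) {
    const auto &item = work[i];
    assert(item.first < sections.size());
    // Summing the chunk value stands in for "write this chunk".
    total.fetch_add(sections[item.first].chunks[item.second]);
  });
  return total.load();
}
```

The trade-off in this sketch is an extra pass to build the flat work list. In exchange, each work item is a leaf task that never blocks on other tasks.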

llvmbot commented 5 years ago

archive of C file and script that causes hang

llvmbot commented 6 years ago

Ah yes, I'm using a build of 6.0.0. I'll build against master and give it another shot.

sbc100 commented 6 years ago

Are you using an old version of llvm/lld?

sbc100 commented 6 years ago

I'm seeing: wasm-ld: error: stdio_test.o: Bad relocation global index

Any chance you could rebuild or send the source for that file?