
Nondeterminism of iterators causes false ThinLTO cache misses #45164

Closed llvmbot closed 4 years ago

llvmbot commented 4 years ago
Bugzilla Link 45819
Resolution FIXED
Resolved on Jun 16, 2020 14:53
Version trunk
OS Windows NT
Reporter LLVM Bugzilla Contributor
CC @dwblaikie,@eleviant,@fhahn,@pcc,@wjristow,@yuanfang-chen
Fixed by commit(s) 252892fea7088abbeff9476e0ecbacc091d135a0

Extended Description

We noticed that when building an executable with ThinLTO (cache enabled), unexpected cache misses happen from time to time and new cache entries are generated when they are not needed. This does not happen in a predictable manner (i.e. a cache miss might or might not occur on any given relink). Below is an explanation of why it happens only sporadically.

==========================

REPRODUCING THE PROBLEM:

Pretty much any medium-sized project will reproduce the problem.

Below is an example where I link 9 bitcode files with ThinLTO (cache enabled); 9 cache entries are created. After relinking the same way (without any changes), 13 cache entries are present, i.e. ThinLTO decided to regenerate the cache entries for 4 of the files.

// No cache directory yet
$ ls
bar.c clean.bat foo.c har.c hoo.c lar.c link-only.bat loo.o main.o mar.o moo.o
bar.o compile.bat foo.o har.o hoo.o lar.o loo.c main.c mar.c moo.c run.bat

// linking 9 bitcode files with ThinLTO and cache enabled.
$ D:/LLVM/Upstream_TOT_Windows/build/Release/bin/ld.lld.exe -v --thinlto-cache-dir=CACHE main.o bar.o mar.o lar.o har.o foo.o moo.o loo.o hoo.o -o main.exe
LLD 11.0.0 (https://github.com/llvm/llvm-project.git 0e13a0331fb90078bf71cc0c4612492a6954a5d0) (compatible with GNU linkers)

// After linking, the CACHE directory appears; it has 9 cache entries now.
$ ls CACHE
llvmcache.timestamp
llvmcache-672ECABBAC96BA45430F348DFCE8E15491B55EB6
llvmcache-03A6D70AEC6CAD0212E34C0A503FBFC9FE5215F1
llvmcache-992141653337693DA6421334203353861997FC36
llvmcache-309BAC6DAF777946CF2B27C3FD275A9068C05062
llvmcache-B514E34E44F63329B217ECDBC11D3CA910D4C61C
llvmcache-4320D3E62841F1AF0ED4D2C750579F993CA6D990
llvmcache-B68A1DC4F284712DC4A93C5ABC81D7F6004A8B44
llvmcache-50BC52B7A3DB8537B5803E65DD37EAFA66ED3909
llvmcache-CFE77E094F6A065B944ECC2CB43C8E2CEF17BE3B

// re-linking the same way as before (no changes were made).
$ D:/LLVM/Upstream_TOT_Windows/build/Release/bin/ld.lld.exe -v --thinlto-cache-dir=CACHE main.o bar.o mar.o lar.o har.o foo.o moo.o loo.o hoo.o -o main.exe
LLD 11.0.0 (https://github.com/llvm/llvm-project.git 0e13a0331fb90078bf71cc0c4612492a6954a5d0) (compatible with GNU linkers)

// Note that the CACHE directory now has 13 cache entries, though only 9 cache entries are expected.
$ ls CACHE
llvmcache.timestamp
llvmcache-672ECABBAC96BA45430F348DFCE8E15491B55EB6
llvmcache-03A6D70AEC6CAD0212E34C0A503FBFC9FE5215F1
llvmcache-8EBFE11F8BD1F02A060B8A4AB2061C0442B64CF1
llvmcache-2EF2ADD5D8F0D727E460B6A94CA3A24C71C051CB
llvmcache-992141653337693DA6421334203353861997FC36
llvmcache-309BAC6DAF777946CF2B27C3FD275A9068C05062
llvmcache-B514E34E44F63329B217ECDBC11D3CA910D4C61C
llvmcache-41900E8308B2AB588B8EFDBFEE923C1F3BA154C8
llvmcache-B68A1DC4F284712DC4A93C5ABC81D7F6004A8B44
llvmcache-4320D3E62841F1AF0ED4D2C750579F993CA6D990
llvmcache-CFE77E094F6A065B944ECC2CB43C8E2CEF17BE3B
llvmcache-50BC52B7A3DB8537B5803E65DD37EAFA66ED3909
llvmcache-D032EEB2A05D13816229E79B08CAFFE5526A0A22

ANALYSIS OF THE PROBLEM:

Look at the following excerpt in LTO.cpp, where we are calculating the LTO cache entry key:

void llvm::computeLTOCacheKey(
    SmallString<40> &Key, const Config &Conf, const ModuleSummaryIndex &Index,
    StringRef ModuleID, const FunctionImporter::ImportMapTy &ImportList,
    const FunctionImporter::ExportSetTy &ExportList,
    const std::map<GlobalValue::GUID, GlobalValue::LinkageTypes> &ResolvedODR,
    const GVSummaryMapTy &DefinedGlobals,
    const std::set<GlobalValue::GUID> &CfiFunctionDefs,
    const std::set<GlobalValue::GUID> &CfiFunctionDecls) {

Note that 'ExportList' is a DenseSet of ValueInfo objects:

using ExportSetTy = DenseSet<ValueInfo>;

‘ExportList’ is used for the calculation of a hash value (LTO Cache Key).

Though the elements of this DenseSet are the same every time, the order in which the iterator walks through them may or may not be the same the next time we relink with ThinLTO. If the order happens to be different, we generate a different hash value and a cache miss occurs.

I wrote some code that looped through the elements of an ExportList (DenseSet) using the iterator and printed GUID values (which is what is used when a cache entry key is calculated).

For one of the files (bar.o in the example below), the order in which ExportList elements are visited is different, so the cache entry key value is different too. As a result, we will have a cache miss here.

Module ID: bar.o
Export: 16434608426314478903 13549030661878666305 5512409407375956689

Module ID: bar.o
Export: 16434608426314478903 5512409407375956689 13549030661878666305
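For illustration, here is a minimal, self-contained sketch (not the actual LLVM code; a toy FNV-1a hash stands in for the SHA-1 hasher used by computeLTOCacheKey, and the names are made up) showing that hashing the same three GUIDs in the two orders observed above produces two different keys:

#include <cstdint>
#include <cstdio>
#include <vector>

// Toy stand-in for the incremental hasher used when computing the cache key.
static uint64_t fnv1a(const std::vector<uint64_t> &GUIDs) {
  uint64_t H = 1469598103934665603ULL;
  for (uint64_t GUID : GUIDs)
    for (unsigned I = 0; I < sizeof(GUID); ++I) {
      H ^= (GUID >> (8 * I)) & 0xff;
      H *= 1099511628211ULL;
    }
  return H;
}

int main() {
  // The same three GUIDs from bar.o, in the two orders observed above.
  std::vector<uint64_t> FirstLink = {16434608426314478903ULL,
                                     13549030661878666305ULL,
                                     5512409407375956689ULL};
  std::vector<uint64_t> SecondLink = {16434608426314478903ULL,
                                      5512409407375956689ULL,
                                      13549030661878666305ULL};
  // Different iteration order => different key => spurious cache miss.
  std::printf("first:  %016llx\nsecond: %016llx\n",
              (unsigned long long)fnv1a(FirstLink),
              (unsigned long long)fnv1a(SecondLink));
  return 0;
}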

Looking at the implementation of template <> struct DenseMapInfo<ValueInfo> (see ModuleSummaryIndex.h),

we notice that the hash value used by the DenseMap is simply the pointer value:

static unsigned getHashValue(ValueInfo I) { return (uintptr_t)I.getRef(); }

but there is no guarantee that those objects will be allocated at the same addresses the next time the linker runs, so the hash values, and therefore the DenseSet's iteration order, can differ from run to run.
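As a rough illustration of the mechanism (plain C++ with std::unordered_set keyed on pointers rather than DenseSet, but the effect is the same), the iteration order of a pointer-hashed set is a function of where the objects happen to be allocated, and those addresses are not stable across runs:

#include <cstdint>
#include <cstdio>
#include <memory>
#include <unordered_set>
#include <vector>

// Hypothetical stand-in for ValueInfo: only the GUID matters here.
struct ValueInfoLike { uint64_t GUID; };

int main() {
  // Heap-allocate a few objects; their addresses depend on heap layout and
  // ASLR, so they are not reproducible from one linker run to the next.
  std::vector<std::unique_ptr<ValueInfoLike>> Storage;
  std::unordered_set<const ValueInfoLike *> Exports; // hashed by pointer value
  for (uint64_t GUID : {1ULL, 2ULL, 3ULL, 4ULL}) {
    Storage.push_back(std::make_unique<ValueInfoLike>(ValueInfoLike{GUID}));
    Exports.insert(Storage.back().get());
  }
  // The visit order is a function of the pointer-derived hash values, so it
  // can change between executions even though the set's contents do not.
  for (const ValueInfoLike *VI : Exports)
    std::printf("%llu\n", (unsigned long long)VI->GUID);
  return 0;
}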

HOW WE COULD FIX IT:

(1) We could store GUIDs into a container and sort the values before we use them to generate a hash value.

Before:

for (const auto &VI : ExportList) {
  auto GUID = VI.getGUID();
  // The export list can impact the internalization, be conservative here
  Hasher.update(ArrayRef<uint8_t>((uint8_t *)&GUID, sizeof(GUID)));
}

After:

std::vector<uint64_t> exportsGUID;
exportsGUID.reserve(ExportList.size());
for (const auto &VI : ExportList) {
  auto GUID = VI.getGUID();
  exportsGUID.push_back(GUID);
}

// Sort the export list elements' GUIDs.
std::stable_sort(exportsGUID.begin(), exportsGUID.end());
for (uint64_t guid : exportsGUID) {
  // The export list can impact the internalization, be conservative here
  Hasher.update(ArrayRef<uint8_t>((uint8_t *)&guid, sizeof(guid)));
}
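A self-contained sketch of the same idea (again with a toy hasher standing in for llvm::SHA1, and made-up names) shows that once the GUIDs are sorted, the resulting key no longer depends on the set's iteration order:

#include <algorithm>
#include <cstdint>
#include <cstdio>
#include <vector>

// Toy hasher; the real code feeds bytes into an llvm::SHA1 instance instead.
static uint64_t hashSorted(std::vector<uint64_t> GUIDs) {
  // Sorting first removes any dependence on the order in which the set's
  // iterator happened to produce the GUIDs.
  std::sort(GUIDs.begin(), GUIDs.end());
  uint64_t H = 1469598103934665603ULL;
  for (uint64_t GUID : GUIDs)
    for (unsigned I = 0; I < sizeof(GUID); ++I) {
      H ^= (GUID >> (8 * I)) & 0xff;
      H *= 1099511628211ULL;
    }
  return H;
}

int main() {
  // The two iteration orders from the bar.o example...
  std::vector<uint64_t> FirstLink = {16434608426314478903ULL,
                                     13549030661878666305ULL,
                                     5512409407375956689ULL};
  std::vector<uint64_t> SecondLink = {16434608426314478903ULL,
                                      5512409407375956689ULL,
                                      13549030661878666305ULL};
  // ...now yield identical keys, so the cached object file is reused.
  std::printf("first:  %016llx\nsecond: %016llx\n",
              (unsigned long long)hashSorted(FirstLink),
              (unsigned long long)hashSorted(SecondLink));
  return 0;
}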

(2) We could use a different hash function. The current implementation, static unsigned getHashValue(ValueInfo I) { return (uintptr_t)I.getRef(); }, is based on an address (which can change from run to run, causing this problem). If we decide to make this change, we need to discuss which hash function to use.

(3) We could use a different container that guarantees that the iteration order matches the insertion order. E.g. we could use SetVector instead of DenseSet, i.e. change using ExportSetTy = DenseSet<ValueInfo>; into using ExportSetTy = SetVector<ValueInfo>; The code changes are minimal and that fix takes care of the problem (see the sketch below). However, SetVector uses more space and is not as efficient. Potentially we could use some other container, maybe std::vector or SmallVector; this might require more code changes.
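To illustrate what (3) buys us, here is a minimal standard-C++ analogue of what llvm::SetVector provides (a sketch only, not the LLVM implementation; OrderedGUIDSet is a made-up name): a hash set for O(1) uniquing plus a vector that fixes the iteration order to the insertion order, at the cost of storing every element twice:

#include <cstdint>
#include <cstdio>
#include <unordered_set>
#include <vector>

// Minimal insertion-order-preserving GUID set, analogous in spirit to
// llvm::SetVector: a hash set for uniquing plus a vector for ordering.
class OrderedGUIDSet {
  std::unordered_set<uint64_t> Seen;
  std::vector<uint64_t> Order;

public:
  bool insert(uint64_t GUID) {
    if (!Seen.insert(GUID).second)
      return false;        // already present, keep the first position
    Order.push_back(GUID); // remember insertion order
    return true;
  }
  // Iteration follows insertion order, so hashing in this order is
  // reproducible as long as the insertion order itself is reproducible.
  std::vector<uint64_t>::const_iterator begin() const { return Order.begin(); }
  std::vector<uint64_t>::const_iterator end() const { return Order.end(); }
};

int main() {
  OrderedGUIDSet Exports;
  for (uint64_t GUID : {3ULL, 1ULL, 2ULL, 1ULL}) // the duplicate 1 is dropped
    Exports.insert(GUID);
  for (uint64_t GUID : Exports) // prints 3, 1, 2 in that order on every run
    std::printf("%llu\n", (unsigned long long)GUID);
  return 0;
}

The extra vector is exactly the space overhead mentioned above; whether that matters for the thin link is the efficiency concern with this option.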

SOME ADDITIONAL NOTES:

(1) I have only encountered this problem on Windows, where I use the lld linker more frequently. The problem might exist on Unix as well, but since it is not reproducible consistently, I may simply have not run into it.

(2) Another thought on why it fails on Windows but might work on Unix is address space layout randomization (i.e. the MSVC toolchain has /HIGHENTROPYVA enabled), which makes allocation addresses vary more between runs.

(3) We might have a similar issue with ImportList as well. I haven't looked into it yet.

llvmbot commented 4 years ago

Fixed by commit 252892fea7088abbeff9476e0ecbacc091d135a0

llvmbot commented 4 years ago

I submitted a review for the fix for this problem. https://reviews.llvm.org/D79772

llvmbot commented 4 years ago

> Changing the hash function (2) is inadequate because hashing containers have no guaranteed ordering.
> 
> If there's a clear "build" phase, separate from the "use/iterate" phase - yeah, just sort it (1) (moving/copying the values into a sortable container from the hashing container) before the use/iterate.

Yes, we have a clear "build" phase, followed by a separate "use/iterate" phase.

So, the general consensus is to sort both the ExportList and ImportList containers, similar to what I have proposed in (1), but using std::sort instead of std::stable_sort.

dwblaikie commented 4 years ago

Changing the hash function (2) is inadequate because hashing containers have no guaranteed ordering.

If there's a clear "build" phase, separate from the "use/iterate" phase - yeah, just sort it (1) (moving/copying the values into a sortable container from the hashing container) before the use/iterate.

If there is no clear phasing (if you do some building, then iterate, then do some more building, etc) - then use a MapVector (3).

eleviant commented 4 years ago

I guess std::unordered_set also doesn't have a deterministic iteration order, so changing the type from DenseSet to std::unordered_set may not help. Sorting looks good; I'd probably use std::sort instead of std::stable_sort, because we're hashing the GUIDs only (equal GUIDs are indistinguishable, so stability buys nothing).

llvmbot commented 4 years ago

+pcc for thoughts since the caching is used by Chromium. I mostly look at distributed ThinLTO which doesn't use the cache computation and won't be affected by this issue.

This presumably got introduced with D70128 which was committed back in Nov and changed the export list from using a std::unordered_set to a DenseSet. I missed the fact that it would affect determinism in the iteration order when reviewing the patch. cc'ed Evgeny who authored the patch.

I suppose this was done for efficiency reasons. But due to the cache effects we should either change this back to unordered_set or some other deterministically iterated set. It shouldn't have a fundamental effect on what that patch was doing. Or as Katya suggests, we could sort them in some way before computing the cache key.

It looks like the import list may also have this problem, as it uses a StringMap, which also doesn't have a deterministic iteration order. It's been that way for a couple of years, so I'm surprised this non-determinism hasn't shown up earlier.

I have a preference towards doing some sorting when the cache key is computed, so that the rest of the thin link isn't affected by a less efficient container.
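For the import list, that "sort only at key-computation time" shape could look roughly like the following (a hedged sketch in plain C++: ImportMapTyLike and hashImports are made-up names, std::unordered_map stands in for the StringMap keyed by source module, the nested set stands in for the imported GUIDs, and the toy hasher stands in for llvm::SHA1):

#include <algorithm>
#include <cstddef>
#include <cstdint>
#include <cstdio>
#include <string>
#include <unordered_map>
#include <unordered_set>
#include <vector>

// Stand-in for the real type: source module path -> imported GUIDs.
using ImportMapTyLike =
    std::unordered_map<std::string, std::unordered_set<uint64_t>>;

// Toy incremental hasher; the real code updates an llvm::SHA1 instance.
static void hashBytes(uint64_t &H, const void *Data, size_t Len) {
  const auto *P = static_cast<const unsigned char *>(Data);
  for (size_t I = 0; I < Len; ++I) {
    H ^= P[I];
    H *= 1099511628211ULL;
  }
}

// Hash the import list deterministically: sort the module names, and within
// each module sort the imported GUIDs, so the key no longer depends on the
// iteration order of either hash container.
static uint64_t hashImports(const ImportMapTyLike &ImportList) {
  uint64_t H = 1469598103934665603ULL;
  std::vector<std::string> Modules;
  Modules.reserve(ImportList.size());
  for (const auto &Entry : ImportList)
    Modules.push_back(Entry.first);
  std::sort(Modules.begin(), Modules.end());
  for (const std::string &M : Modules) {
    hashBytes(H, M.data(), M.size());
    std::vector<uint64_t> GUIDs(ImportList.at(M).begin(),
                                ImportList.at(M).end());
    std::sort(GUIDs.begin(), GUIDs.end());
    for (uint64_t GUID : GUIDs)
      hashBytes(H, &GUID, sizeof(GUID));
  }
  return H;
}

int main() {
  ImportMapTyLike ImportList = {{"foo.o", {3, 1}}, {"bar.o", {2}}};
  std::printf("key: %016llx\n", (unsigned long long)hashImports(ImportList));
  return 0;
}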