Open Aaron1011 opened 5 years ago
Isn't the correct way to deal with this to ... not use hashmaps? Like, FxHashMap is known to have an unstable iteration order, so we should either a) replace it with something that has a stable iteration order (maybe IndexMap, or I saw on Discord that this might have something to do with 32-bit vs. 64-bit systems hashing differently...).
I think this feature as-is isn't tenable, it's just too problematic in practice. You'll probably never finish a compiler build if it does have high overhead, and I suspect it will.
Instead of randomized iteration order you could start at a randomized offset and wrap around, that should be more cache-friendly than fully randomized memory accesses.
the iteration order ended up differing between platforms, which made it impossible to make the tests pass on all platforms.
Isn't the actual issue that FxHashMap
doesn't use RandomState
, i.e. that the iteration order appears to be the same for all instances on a platform? In the hierarchy of per-platform -> per-execution -> per-instance -> per-iteration couldn't the issue also have been avoided if it were randomized per-execution or per-instance?
Per-iteration randomization may lead to unexpected results such as map.iter().eq(map.iter()) == false
even though map was not mutably borrowed.
The std::collections documentation is a bit vague on iteration order
For unordered collections like HashMap, the items will be yielded in whatever order the internal representation made most convenient.
This could be interpreted as iteration order being dependent on internal structure, and the structure of a shared reference is not expected to change.
triage: leaving nominated for discussion, to see if there is any buy-in from any compiler team members. W.r.t. priority... I'll leave this unprioritized for now. (My instinct is that it is P-low, but if that's the case, that's ammunition for just closing it.)
Some thoughts:
FxHashMap
at all? We could set it to deny in the compiler cratesunordered_iter
method where you're expected to comment why the ordering doesn't matterThen we can basically make a quest issue to resolve all the lint warnings.
(It seems like a less intrusive and more reliable way of solving the problem.)
During the meeting I said that a design meeting felt like overkill, but I'm mildly changing my mind here. =) It seems like it could be fruitful to put some thought into the best way to solve this category of bug in general.
Isn't the correct way to deal with this to ... not use hashmaps?
@Mark-Simulacrum The idea here is to detect incorrect dependencies on HashMap iteration order. Unless you're suggesting removing all HashMaps from the compiler, I'm not really sure what you mean,
Isn't the actual issue that FxHashMap doesn't use RandomState, i.e. that the iteration order appears to be the same for all instances on a platform?
@the8472: Yes, I believe that the issue would have been expoesd earlier if FxHashMAp
used RandomState
.
Per-iteration randomization may lead to unexpected results such as map.iter().eq(map.iter()) == false even though map was not mutably borrowed.
@the8472: That's fine - the point of this is to catch issues, not to be actually be used by a normal rustc
build. In my view, any code that actually relies on this is suspect, and it would be good to know about it.
@Mark-Simulacrum The idea here is to detect incorrect dependencies on HashMap iteration order. Unless you're suggesting removing all HashMaps from the compiler, I'm not really sure what you mean,
Well, to be more concrete, we have ways of dealing with "iteration order is unstable". For example, Niko mentions that we could fairly easily lint against iterating over hashmaps, and provide a iter_unordered()
and friends within the compiler. That to me sounds like a much more likely way to expose these problems in an easy to deal with manner. If we just randomize iteration order, sure, you might change final output, but it is likely quite hard to track down why :)
Isn't the actual issue that FxHashMap doesn't use RandomState, i.e. that the iteration order appears to be the same for all instances on a platform?
It doesn't just appear to be the same, iteration order in FxHashMap
is fully deterministic for a fixed platform and a fixed order of insertions and deletions. Non-determinism is not the problem here. Also see https://github.com/rust-lang/rust/issues/63713#issuecomment-536319122.
The problem rather seems to be that iteration order differs between platforms? That could indeed cause trouble, but is no issue for determinism and also fine for reproducible builds (after all, compilation output will obviously differ between platforms). But indeed if the test suite expects things to be ordered the same way across all platforms, that would be one of the rare cases where such platform-dependent deterministic order could cause problems.
triage: P-low; removing nomination label. I would want this task to wait until we have a broader discussion about determinism in the compiler (hopefully at a steering meeting).
Triage: not aware of any work done here.
I just tried the following and I got a lot of errors and an ICE:
diff --git a/compiler/rustc_data_structures/src/fx.rs b/compiler/rustc_data_structures/src/fx.rs
index 80e72250470..5661159814b 100644
--- a/compiler/rustc_data_structures/src/fx.rs
+++ b/compiler/rustc_data_structures/src/fx.rs
@@ -1,11 +1,12 @@
-use std::hash::BuildHasherDefault;
-
-pub use rustc_hash::{FxHashMap, FxHashSet, FxHasher};
+use std::collections::{HashMap, HashSet};
+use std::hash::RandomState;
pub type StdEntry<'a, K, V> = std::collections::hash_map::Entry<'a, K, V>;
-pub type FxIndexMap<K, V> = indexmap::IndexMap<K, V, BuildHasherDefault<FxHasher>>;
-pub type FxIndexSet<V> = indexmap::IndexSet<V, BuildHasherDefault<FxHasher>>;
+pub type FxHashMap<K, V> = HashMap<K, V>;
+pub type FxHashSet<K> = HashSet<K>;
+pub type FxIndexMap<K, V> = indexmap::IndexMap<K, V, RandomState>;
+pub type FxIndexSet<V> = indexmap::IndexSet<V, RandomState>;
pub type IndexEntry<'a, K, V> = indexmap::map::Entry<'a, K, V>;
pub type IndexOccupiedEntry<'a, K, V> = indexmap::map::OccupiedEntry<'a, K, V>;
diff --git a/compiler/rustc_data_structures/src/sharded.rs b/compiler/rustc_data_structures/src/sharded.rs
index 4b02b183460..c0f0cf36c13 100644
--- a/compiler/rustc_data_structures/src/sharded.rs
+++ b/compiler/rustc_data_structures/src/sharded.rs
@@ -1,4 +1,4 @@
-use crate::fx::{FxHashMap, FxHasher};
+use crate::fx::FxHashMap;
#[cfg(parallel_compiler)]
use crate::sync::{is_dyn_thread_safe, CacheAligned};
use crate::sync::{Lock, LockGuard, Mode};
@@ -224,7 +224,8 @@ pub fn contains_pointer_to<T: Hash + IntoPointer>(&self, value: &T) -> bool {
#[inline]
pub fn make_hash<K: Hash + ?Sized>(val: &K) -> u64 {
- let mut state = FxHasher::default();
+ // FIXME somehow randomize this
+ let mut state = rustc_hash::FxHasher::default();
val.hash(&mut state);
state.finish()
}
diff --git a/compiler/rustc_data_structures/src/unord.rs b/compiler/rustc_data_structures/src/unord.rs
index a99e2062039..ab16fcd4743 100644
--- a/compiler/rustc_data_structures/src/unord.rs
+++ b/compiler/rustc_data_structures/src/unord.rs
@@ -2,7 +2,6 @@
//! ordering. This is a useful property for deterministic computations, such
//! as required by the query system.
-use rustc_hash::{FxHashMap, FxHashSet};
use std::{
borrow::{Borrow, BorrowMut},
collections::hash_map::Entry,
@@ -13,6 +12,7 @@
use crate::{
fingerprint::Fingerprint,
+ fx::{FxHashMap, FxHashSet},
stable_hasher::{HashStable, StableCompare, StableHasher, ToStableHashKey},
};
diff --git a/compiler/rustc_parse/src/parser/mod.rs b/compiler/rustc_parse/src/parser/mod.rs
index dea2b9e6ca7..922517ab9c3 100644
--- a/compiler/rustc_parse/src/parser/mod.rs
+++ b/compiler/rustc_parse/src/parser/mod.rs
@@ -179,7 +179,7 @@ pub struct Parser<'a> {
// This type is used a lot, e.g. it's cloned when matching many declarative macro rules with nonterminals. Make sure
// it doesn't unintentionally get bigger.
#[cfg(all(target_arch = "x86_64", target_pointer_width = "64"))]
-rustc_data_structures::static_assert_size!(Parser<'_>, 264);
+rustc_data_structures::static_assert_size!(Parser<'_>, 280);
/// Stores span information about a closure.
#[derive(Clone)]
diff --git a/compiler/rustc_pattern_analysis/src/usefulness.rs b/compiler/rustc_pattern_analysis/src/usefulness.rs
index a24da448766..d612b4c32a4 100644
--- a/compiler/rustc_pattern_analysis/src/usefulness.rs
+++ b/compiler/rustc_pattern_analysis/src/usefulness.rs
@@ -709,7 +709,7 @@
//! I (Nadrieril) prefer to put new tests in `ui/pattern/usefulness` unless there's a specific
//! reason not to, for example if they crucially depend on a particular feature like `or_patterns`.
-use rustc_hash::FxHashSet;
+use rustc_data_structures::fx::FxHashSet;
use rustc_index::bit_set::BitSet;
use smallvec::{smallvec, SmallVec};
use std::fmt;
When working on https://github.com/rust-lang/rust/pull/64906, I ran into a latent dependency on the iteration order of a
HashMap
. This resulted in the PR becoming unmergeable - since aFxHashMap
was being used, the iteration order ended up differing between platforms, which made it impossible to make the tests pass on all platforms.It would be nice to have a way of exposing these kinds of hidden ordering dependencies before they result in blocked PRs.
I propose the following:
randomize_iteration
tostd
andfxhash
. When enabled, this feature will cause the iteration order ofiter
anditer_mut
to be randomized per call.config.toml
key to enable both of these features.This will allow producing a special build of the compiler that will (hopefully) expose many hidden ordering dependencies. The overhead could be very high, but that shouldn't matter too much - the only purpose of this build would be to run the testsuite.
Once std-aware Cargo is fully implement, we could consider making this usable by user-written crates (e.g. allowing people to opt-in to recompiling std in this mode).