Closed BoxyUwU closed 2 months ago
It is probably the case that these include incorrect sort algorithm impls.
I am bringing these to the attention of @orlp and @Voultapher if they wish to inspect the regressions further.
sort_unstable_by
having sorted things a certain way, as it only sorts by frequency, not by frequency and then natural order as the tests seem to imply is required: https://github.com/lht102/coding-problems-practice/blob/5b22b46182abe6d0617d087e6b8a0fa22f2edada/leetcode/src/00451_sort_characters_by_frequency/solution.rs#L10-L27ord_correct
test that is now failing, and while I can't find the sort call, it feels likely the answer is wrong or sort may be implicitly called by quickcheck: https://github.com/plaidfinch/hopscotch/blob/54483dabb3252f00f73432984b42bd00f7471466/tests/hopscotch.rs#L136-L146I think I am wrong for some most, actually.
sleep_sort
only. did we change something about concurrency? https://github.com/Chromfalke/AtrociousSort/blob/main/src/algorithms/sleep_sort.rsEither way, these do not seem to be directly from the sort changes.
coding-problems-practice has a now-failing-tests solution, but it actually is still correct according to the original problem. the tests seem to rely on sort_unstable_by having sorted things a certain way,
sort_unstable*
is unstable as the name says and the new implementation is deterministic as the previous one was, but it produces a different equal element ordering than the one before, so a test that relies on one specific order but uses sort_unstable
is only valid for one version of Rust.
however, the original problem specifically mentions that it's okay to get either "eert" or "eetr"
I think the correct way to test that would be to do stable sort or assert_matches!(val, "eert" | "eetr")
.
atrocious_sort: appears to be failing in a sort, yes, but is specifically failing on sleep_sort only. did we change something about concurrency? https://github.com/Chromfalke/AtrociousSort/blob/main/src/algorithms/sleep_sort.rs
Looking at the implementation, it sleeps for milliseconds, I fail to see how that could ever work reliably. The OS can decide to wake the threads up at some later point in time, sleep does not guarantee a fixed wakeup time. Plus it doesn't use latches, so it's spawn + sleep before the next thread is spawned. IMO that implementation is just plain broken.
I'm a little surprised by how many instances there are of relying on the order of hashmaps, the order is randomized with the default hasher. Looking at the linked code they seem to be using the default hasher, how come these tests ever reliably work during local development? What am I missing here?
gaffer Other initial hypothesis is that the library is doing panic recovery. Perhaps some other library in its dep tree is sorting but then panic recovery occurs? https://github.com/survemobility/gaffer/commit/f238bae67739f0df4ef4a97b245e8554c8aff6f2
I suspect it's just racy. I checked the project out locally and ran cargo t
, on my initial run I got:
test source::test::queued_resets_recurring ... FAILED
failures:
---- source::test::queued_resets_recurring stdout ----
thread 'source::test::queued_resets_recurring' panicked at src/source.rs:320:9:
assertion `left == right` failed
left: [Tester(3), Tester(2), Tester(1)]
right: [Tester(2)]
failures:
source::test::queued_resets_recurring
test result: FAILED. 36 passed; 1 failed; 0 ignored; 0 measured; 0 filtered out; finished in 0.02s
Which is a completely different test, and the one in the crater run passed. Second run all tests pass.
@workingjubilee I find the proxmox-api
case very strange: https://github.com/datdenkikniet/proxmox-api/blob/7d7540839371e3a58837b049997fe5fb8816d0cf/proxmox-api/src/types/multi.rs#L241-L255
Note that the portion of the serde
output that was changed is actually a BTreeMap
, not a HashMap
.
@orlp Thanks for catching that! I retract my comments re: "iterating over a hashmap" for that library, but it still seems very odd.
Somewhat shockingly, the Project Euler code is also operating with a hashmap: https://github.com/chubei/project_euler/blob/d100bed4c6b7c417276ccba051567c53ed869d40/src/lib.rs#L39-L48
I really expected that one to be due to sorting issues...
I'm a little surprised by how many instances there are of relying on the order of hashmaps, the order is randomized with the default hasher. Looking at the linked code they seem to be using the default hasher, how come these tests ever reliably work during local development? What am I missing here?
There is a certain sort of person who only ever writes and runs a test once.
I do not mean to accuse, but if they only ran the test a few times, it is possible the hashes happened to work out for them. Even if the hashing perfectly randomizes the keys, if you assert on the order of only 2 items, then it comes up as a coinflip either way. It's reasonable for that to pass 2-4 times in a row if you are lucky.
I downloaded workpool and reran its tests a few times. It's another racy case. I think we've correctly identified racy library tests in:
And what seem to be definitely hashmap problems:
Unless these are fixed promptly, we should ignore those in future crater runs which cover tests.
Correctly root-caused to the sort impl, but known accepted breakage:
MPCHomeControl is a float problem, so I'll bring it to https://github.com/rust-lang/rust/issues/128898
That leaves the following as still open questions as to why the test is failing to begin with:
I haven't looked at these crates carefully.
I know there are quite a few crates which are tested by crater that have test suites that are designed in various ways such that they are incompatible with multiple tests running at once. This would be fixed once and for all if crater would set the number of CPUs in each container to 1.
I know there are quite a few crates which are tested by crater that have test suites that are designed in various ways such that they are incompatible with multiple tests running at once. This would be fixed once and for all if crater would set the number of CPUs in each container to 1.
It's a little off topic and if this turns into a bigger discussion let's please find a better place for it. My 2 cents, if your tests are order dependent and or can't be run in parallel they are broken. I'd rather tell these libraries "you choose to break the tooling's expectations, so you can't expect us to care for your use-case".
Specifically with the racy libraries here I'm not sure it would even help:
I think it's our good friend hashmap order see, which constructs two hashmaps and does assert_eq!
. Beware this is hashbrown::HashMap
with a PartialEq
implementation like this. Which is order dependent. The stdlib one has the same impl. Now that I think of it, isn't that quite the big footgun that PartialEq
is implemented that way?
Is once again also hashmap order, the main
branch defines Names
as:
#[derive(Debug, Deserialize, Serialize, PartialEq)]
struct Names {
#[serde(
flatten,
deserialize_with = "super::deserialize_multi_btree::<'_, NumberedNames, _>",
serialize_with = "super::serialize_multi_btree::<NumberedNames, _>"
)]
names: BTreeMap<u32, String>,
#[serde(
flatten,
deserialize_with = "super::deserialize_additional_data::<'_, Names, _, _>"
)]
additional_data: HashMap<String, String>,
}
Where the field additional_data
is a hashmap. But in the test it only accounts for the name_2
key and not for name1
or name2
which are the ones that swap place in the produced JSON string. However if look at the v0.1.1 tag Names
has a different definition:
#[derive(Debug, Deserialize, Serialize, PartialEq)]
struct Names {
#[serde(
flatten,
deserialize_with = "super::deserialize_multi::<'_, NumberedNames, _>",
serialize_with = "super::serialize_multi::<NumberedNames, _>"
)]
names: HashMap<u32, String>,
#[serde(
flatten,
deserialize_with = "super::deserialize_additional_data::<'_, Names, _, _>"
)]
additional_data: HashMap<String, String>,
}
Now the fields names
and additional_data
are HashMap
s. This is the version that was tested by the crater run. Mystery solved 😄
I think it's our good friend hashmap order see, which constructs two hashmaps and does
assert_eq!
. Beware this ishashbrown::HashMap
with aPartialEq
implementation like this. Which is order dependent. The stdlib one has the same impl. Now that I think of it, isn't that quite the big footgun thatPartialEq
is implemented that way?
Equality doesn't care about which order you check the keys in, the debug output is different but that shouldn't affect the comparison result. There seem to be extra spaces in the keys on one side, which I assume means the test itself is broken. EDIT: looking at it a little more, seems to be related to not accounting for whitespace in stringify!
of ty
fragments, so tests :: Component1
gets split at the ::
and given the name Component1
instead of the expected Component1
, maybe https://github.com/rust-lang/rust/pull/125174 related.
This is not a regression. Locally the tests partial_ord_correct
and ord_correct
fail with some chance both on stable and nightly. I've seen all 4 combinations of success and fail on for both versions of Rust. I think it depends on wether quickcheck finds a code path where the properties don't hold, which I assume depends on randomness.
Digging into why it fails, as example input that fails let's take left = [(66, 1)]
and right = [(66, 0), (0, 0)]
. The tests compare partial_cmp
and cmp
results. compare_as_iters
answers what the keys compared via the the iterator::cmp
method yields. In this case [66].iter().cmp([66, 0].iter()) == Ordering::Less
. In contrast the hopscotch::Queue::cmp
method uses completely different logic and does offset and tag comparisons before going over to value, not key, iter comparison. In this specific instance tag comparison yields Ordering::Greater
which causes the mismatch and test failure.
In my opinion either the test does not test what the author wanted it to test, or the implementation does not do what the author intended it to do.
Given the lack of active maintenance and sub 10k downloads, it might make sense to deny-list some of these crates that fail because of broken tests and or implementations.
Equality doesn't care about which order you check the keys in
My bad, indeed the other.get
call makes it order independent.
Would it be possible to add a clippy
warning to some code patterns which attempt to assert
an expression that depends on hashmap iteration order? Obviously we can't catch everything, but perhaps some of the more common patterns?
I guess I retract my retraction regarding proxmox-api???
I guess we should test a revert of https://github.com/rust-lang/rust/pull/125174
Hm, #125174 was cratered.
The proxmox-api tests may work on the next crater run, so let's not exclude those.
I've opened some PRs and issues on the relevant crates in the hopes of them fixing their code so that we can continue using their test data:
It seems unlikely these will all be resolved by the next beta crater run.
In particular, the gaffer tests seem to be completely insoluble as it is an issue in essentially all the tests, so let's just skip that crate's tests from now on.
I have opened some followups:
Everything else is accepted breakage, or (hopefully) will be published and fixed for the next beta crater, so I think we're done here.
Thank you for playing, everyone!