regression: assertion `left == right` failed but it depends on library-unstable details

BoxyUwU commented 2 months ago

[INFO] [stdout] thread 'tests::test_primie_factorization' panicked at src/lib.rs:132:9:
[INFO] [stdout] assertion `left == right` failed
[INFO] [stdout]   left: [PrimeFactor { prime: 2, power: 2 }, PrimeFactor { prime: 5, power: 3 }]
[INFO] [stdout]  right: [PrimeFactor { prime: 5, power: 3 }, PrimeFactor { prime: 2, power: 2 }]

workingjubilee commented 2 months ago

It is probably the case that these include incorrect sort algorithm impls.

I am bringing these to the attention of @orlp and @Voultapher if they wish to inspect the regressions further.

coding-problems-practice has a now-failing-tests solution, but it actually is still correct according to the original problem.
- the tests seem to rely on sort_unstable_by having sorted things a certain way, as it only sorts by frequency, not by frequency and then natural order as the tests seem to imply is required: https://github.com/lht102/coding-problems-practice/blob/5b22b46182abe6d0617d087e6b8a0fa22f2edada/leetcode/src/00451_sort_characters_by_frequency/solution.rs#L10-L27
- however, the original problem specifically mentions that it's okay to get either "eert" or "eetr": https://leetcode.com/problems/sort-characters-by-frequency/description/
hopscotch has an ord_correct test that is now failing, and while I can't find the sort call, it feels likely the answer is wrong or sort may be implicitly called by quickcheck: https://github.com/plaidfinch/hopscotch/blob/54483dabb3252f00f73432984b42bd00f7471466/tests/hopscotch.rs#L136-L146

workingjubilee commented 2 months ago

I think I am wrong for ~~some~~ most, actually.

anagrams: uses a HashTrieMap, then asserts on an iter-then-collect-to-vec, a somewhat notorious hazard. https://github.com/rainhead/rust-anagrams/blob/d42b99a1c5d3465d65a6ded34aff0857a0e52bfe/src/lib.rs#L35-L41
atrocious_sort: appears to be failing in a sort, yes, but is specifically failing on sleep_sort only. did we change something about concurrency? https://github.com/Chromfalke/AtrociousSort/blob/main/src/algorithms/sleep_sort.rs
bevy_serde_macros
- seems to be asserting on serialization/deserialization being handled a certain way, but now some whitespace is in the keys?
gaffer
- No sorting appears to occur in https://github.com/survemobility/gaffer/blob/fd1cc240414bb5ee61c840538e0ed4f83e5d58dd/tests/integration.rs or the rest of the library at a glance, so I believe that's instead a concurrency bug?
- Other initial hypothesis is that the library is doing panic recovery. Perhaps some other library in its dep tree is sorting but then panic recovery occurs? https://github.com/survemobility/gaffer/commit/f238bae67739f0df4ef4a97b245e8554c8aff6f2
proxmox-api: seems to be another case of asserting after iterating over a hashmap: https://github.com/datdenkikniet/proxmox-api/blob/7d7540839371e3a58837b049997fe5fb8816d0cf/proxmox-api/src/types/multi.rs#L241-L255 actually see orlp's comment.
rwini: another case of asserting on the iteration of a hashmap!
- https://github.com/zerodegress/rwini-rs/blob/master/src/data/ini.rs
- https://github.com/zerodegress/rwini-rs/blob/b7a098fe1e2785bd859a1753ace77c8662d2ebde/src/parser/mod.rs#L42-L66
workpool: another case where it's less clear and seems likely to be a concurrency issue? https://github.com/ggriffiniii/lateral/blob/469c20d70b2d36b8bc5726281cb98989f4423bec/workpool/src/pool/dynamic_pool.rs#L504-L515

bevy_serde_macros

``` assertion `left == right` failed [INFO] [stdout] left: {" Component3": Array [Array [Number(1), Object {"target": Number(0), "test_enum": Object {"ATest": String("test")}}]], " Component1": Array [Array [Number(0), Null], Array [Number(1), Null]], " Component2": Array [Array [Number(1), Object {"target": Number(0)}]]} [INFO] [stdout] right: {"Component3": Array [Array [Number(1), Object {"target": Number(0), "test_enum": Object {"ATest": String("test")}}]], "Component2": Array [Array [Number(1), Object {"target": Number(0)}]], "Component1": Array [Array [Number(0), Null], Array [Number(1), Null]]} ```

Either way, these do not seem to be directly from the sort changes.

Voultapher commented 2 months ago

coding-problems-practice has a now-failing-tests solution, but it actually is still correct according to the original problem. the tests seem to rely on sort_unstable_by having sorted things a certain way,

sort_unstable* is unstable as the name says and the new implementation is deterministic as the previous one was, but it produces a different equal element ordering than the one before, so a test that relies on one specific order but uses sort_unstable is only valid for one version of Rust.

however, the original problem specifically mentions that it's okay to get either "eert" or "eetr"

I think the correct way to test that would be to do stable sort or assert_matches!(val, "eert" | "eetr").

Voultapher commented 2 months ago

atrocious_sort: appears to be failing in a sort, yes, but is specifically failing on sleep_sort only. did we change something about concurrency? https://github.com/Chromfalke/AtrociousSort/blob/main/src/algorithms/sleep_sort.rs

Looking at the implementation, it sleeps for milliseconds, I fail to see how that could ever work reliably. The OS can decide to wake the threads up at some later point in time, sleep does not guarantee a fixed wakeup time. Plus it doesn't use latches, so it's spawn + sleep before the next thread is spawned. IMO that implementation is just plain broken.

Voultapher commented 2 months ago

I'm a little surprised by how many instances there are of relying on the order of hashmaps, the order is randomized with the default hasher. Looking at the linked code they seem to be using the default hasher, how come these tests ever reliably work during local development? What am I missing here?

Voultapher commented 2 months ago

gaffer Other initial hypothesis is that the library is doing panic recovery. Perhaps some other library in its dep tree is sorting but then panic recovery occurs? https://github.com/survemobility/gaffer/commit/f238bae67739f0df4ef4a97b245e8554c8aff6f2

I suspect it's just racy. I checked the project out locally and ran cargo t, on my initial run I got:

test source::test::queued_resets_recurring ... FAILED

failures:

---- source::test::queued_resets_recurring stdout ----
thread 'source::test::queued_resets_recurring' panicked at src/source.rs:320:9:
assertion `left == right` failed
  left: [Tester(3), Tester(2), Tester(1)]
 right: [Tester(2)]

failures:
    source::test::queued_resets_recurring

test result: FAILED. 36 passed; 1 failed; 0 ignored; 0 measured; 0 filtered out; finished in 0.02s

Which is a completely different test, and the one in the crater run passed. Second run all tests pass.

orlp commented 2 months ago

@workingjubilee I find the proxmox-api case very strange: https://github.com/datdenkikniet/proxmox-api/blob/7d7540839371e3a58837b049997fe5fb8816d0cf/proxmox-api/src/types/multi.rs#L241-L255

Note that the portion of the serde output that was changed is actually a BTreeMap, not a HashMap.

workingjubilee commented 2 months ago

@orlp Thanks for catching that! I retract my comments re: "iterating over a hashmap" for that library, but it still seems very odd.

workingjubilee commented 2 months ago

Somewhat shockingly, the Project Euler code is also operating with a hashmap: https://github.com/chubei/project_euler/blob/d100bed4c6b7c417276ccba051567c53ed869d40/src/lib.rs#L39-L48

I really expected that one to be due to sorting issues...

I'm a little surprised by how many instances there are of relying on the order of hashmaps, the order is randomized with the default hasher. Looking at the linked code they seem to be using the default hasher, how come these tests ever reliably work during local development? What am I missing here?

There is a certain sort of person who only ever writes and runs a test once.

I do not mean to accuse, but if they only ran the test a few times, it is possible the hashes happened to work out for them. Even if the hashing perfectly randomizes the keys, if you assert on the order of only 2 items, then it comes up as a coinflip either way. It's reasonable for that to pass 2-4 times in a row if you are lucky.

workingjubilee commented 2 months ago

I downloaded workpool and reran its tests a few times. It's another racy case. I think we've correctly identified racy library tests in:

atrocious_sort
gaffer
workpool

And what seem to be definitely hashmap problems:

chubei/project-euler
rainhead/rust-anagrams
rwini

Unless these are fixed promptly, we should ignore those in future crater runs which cover tests.

Correctly root-caused to the sort impl, but known accepted breakage:

lht102/coding-problems-practice

MPCHomeControl is a float problem, so I'll bring it to https://github.com/rust-lang/rust/issues/128898

That leaves the following as still open questions as to why the test is failing to begin with:

bevy_serde_macros
hopscotch
proxmox-api

saethlin commented 2 months ago

I haven't looked at these crates carefully.

I know there are quite a few crates which are tested by crater that have test suites that are designed in various ways such that they are incompatible with multiple tests running at once. This would be fixed once and for all if crater would set the number of CPUs in each container to 1.

Voultapher commented 2 months ago

I know there are quite a few crates which are tested by crater that have test suites that are designed in various ways such that they are incompatible with multiple tests running at once. This would be fixed once and for all if crater would set the number of CPUs in each container to 1.

It's a little off topic and if this turns into a bigger discussion let's please find a better place for it. My 2 cents, if your tests are order dependent and or can't be run in parallel they are broken. I'd rather tell these libraries "you choose to break the tooling's expectations, so you can't expect us to care for your use-case".

Specifically with the racy libraries here I'm not sure it would even help:

atrocious_sort: Broken impl, single-core testing won't help.
gaffer: Docs say: "some of the tests are very dependent on timing and will fail if run slowly", so not sure how single-core testing would help.
workpool: No docs whatsoever, sporadic test failure, not sure how single-core testing would help.

Voultapher commented 2 months ago

bevy_serde_macros

I think it's our good friend hashmap order see, which constructs two hashmaps and does assert_eq!. Beware this is hashbrown::HashMap with a PartialEq implementation like this. Which is order dependent. The stdlib one has the same impl. Now that I think of it, isn't that quite the big footgun that PartialEq is implemented that way?

Voultapher commented 2 months ago

proxmox-api

Is once again also hashmap order, the main branch defines Names as:

#[derive(Debug, Deserialize, Serialize, PartialEq)]
struct Names {
    #[serde(
        flatten,
        deserialize_with = "super::deserialize_multi_btree::<'_, NumberedNames, _>",
        serialize_with = "super::serialize_multi_btree::<NumberedNames, _>"
    )]
    names: BTreeMap<u32, String>,
    #[serde(
        flatten,
        deserialize_with = "super::deserialize_additional_data::<'_, Names, _, _>"
    )]
    additional_data: HashMap<String, String>,
}

Where the field additional_data is a hashmap. But in the test it only accounts for the name_2 key and not for name1 or name2 which are the ones that swap place in the produced JSON string. However if look at the v0.1.1 tag Names has a different definition:

#[derive(Debug, Deserialize, Serialize, PartialEq)]
struct Names {
    #[serde(
        flatten,
        deserialize_with = "super::deserialize_multi::<'_, NumberedNames, _>",
        serialize_with = "super::serialize_multi::<NumberedNames, _>"
    )]
    names: HashMap<u32, String>,
    #[serde(
        flatten,
        deserialize_with = "super::deserialize_additional_data::<'_, Names, _, _>"
    )]
    additional_data: HashMap<String, String>,
}

Now the fields names and additional_data are HashMaps. This is the version that was tested by the crater run. Mystery solved 😄

Nemo157 commented 2 months ago

I think it's our good friend hashmap order see, which constructs two hashmaps and does assert_eq!. Beware this is hashbrown::HashMap with a PartialEq implementation like this. Which is order dependent. The stdlib one has the same impl. Now that I think of it, isn't that quite the big footgun that PartialEq is implemented that way?

Equality doesn't care about which order you check the keys in, the debug output is different but that shouldn't affect the comparison result. There seem to be extra spaces in the keys on one side, which I assume means the test itself is broken. EDIT: looking at it a little more, seems to be related to not accounting for whitespace in stringify! of ty fragments, so tests :: Component1 gets split at the :: and given the name Component1 instead of the expected Component1, maybe https://github.com/rust-lang/rust/pull/125174 related.

Voultapher commented 2 months ago

hopscotch

This is not a regression. Locally the tests partial_ord_correct and ord_correct fail with some chance both on stable and nightly. I've seen all 4 combinations of success and fail on for both versions of Rust. I think it depends on wether quickcheck finds a code path where the properties don't hold, which I assume depends on randomness.

Digging into why it fails, as example input that fails let's take left = [(66, 1)] and right = [(66, 0), (0, 0)]. The tests compare partial_cmp and cmp results. compare_as_iters answers what the keys compared via the the iterator::cmp method yields. In this case [66].iter().cmp([66, 0].iter()) == Ordering::Less. In contrast the hopscotch::Queue::cmp method uses completely different logic and does offset and tag comparisons before going over to value, not key, iter comparison. In this specific instance tag comparison yields Ordering::Greater which causes the mismatch and test failure.

In my opinion either the test does not test what the author wanted it to test, or the implementation does not do what the author intended it to do.

Voultapher commented 2 months ago

Given the lack of active maintenance and sub 10k downloads, it might make sense to deny-list some of these crates that fail because of broken tests and or implementations.

Voultapher commented 2 months ago

Equality doesn't care about which order you check the keys in

My bad, indeed the other.get call makes it order independent.

orlp commented 2 months ago

Would it be possible to add a clippy warning to some code patterns which attempt to assert an expression that depends on hashmap iteration order? Obviously we can't catch everything, but perhaps some of the more common patterns?

workingjubilee commented 2 months ago

I guess I retract my retraction regarding proxmox-api???

workingjubilee commented 2 months ago

I guess we should test a revert of https://github.com/rust-lang/rust/pull/125174

workingjubilee commented 2 months ago

Hm, #125174 was cratered.

workingjubilee commented 2 months ago

The proxmox-api tests may work on the next crater run, so let's not exclude those.

I've opened some PRs and issues on the relevant crates in the hopes of them fixing their code so that we can continue using their test data:

It seems unlikely these will all be resolved by the next beta crater run.

In particular, the gaffer tests seem to be completely insoluble as it is an issue in essentially all the tests, so let's just skip that crate's tests from now on.

workingjubilee commented 2 months ago

I have opened some followups:

Everything else is accepted breakage, or (hopefully) will be published and fixed for the next beta crater, so I think we're done here.

workingjubilee commented 2 months ago

Thank you for playing, everyone!

rust-lang / rust

regression: assertion `left == right` failed but it depends on library-unstable details #128899