private-attribution / ipa

A raw implementation of Interoperable Private Attribution
MIT License
41 stars 23 forks source link

Sharded shuffle concurrency test failures with rust 1.78 #1036

Closed andyleiserson closed 4 months ago

andyleiserson commented 4 months ago

I see this failure consistently on my local machine (aarch64-apple-darwin) after upgrading to rust 1.78, but not in CI (e.g. https://github.com/private-attribution/ipa/actions/runs/8929174792/job/24526364966?pr=1035).

Command: cargo test -p ipa-core --release --features "shuttle multi-threading" -- shuffle::sharded::tests::empty (non_empty_input also fails)

Output:

---- protocol::ipa_prf::shuffle::sharded::tests::empty stdout ----
TestWorld random seed 9634094103382193149
2024-05-02T19:42:39.728593Z  INFO {h=A}: ipa_core::helpers::transport::in_memory::sharding: new
2024-05-02T19:42:39.728612Z  INFO {h=A}:transport_loop{id=ShardIndex(0)}: ipa_core::helpers::transport::in_memory::transport: new
2024-05-02T19:42:39.728625Z  INFO {h=B}: ipa_core::helpers::transport::in_memory::sharding: new
2024-05-02T19:42:39.728627Z  INFO {h=B}:transport_loop{id=ShardIndex(0)}: ipa_core::helpers::transport::in_memory::transport: new
2024-05-02T19:42:39.728633Z  INFO {h=C}: ipa_core::helpers::transport::in_memory::sharding: new
2024-05-02T19:42:39.728636Z  INFO {h=C}:transport_loop{id=ShardIndex(0)}: ipa_core::helpers::transport::in_memory::transport: new
2024-05-02T19:42:39.728646Z  INFO {h=A}:transport_loop{id=ShardIndex(0)}: ipa_core::helpers::transport::in_memory::transport: close time.busy=3.08µs time.idle=31.9µs
2024-05-02T19:42:39.728651Z  INFO {h=A}: ipa_core::helpers::transport::in_memory::sharding: close time.busy=21.5µs time.idle=38.1µs
2024-05-02T19:42:39.728957Z  INFO transport_loop{id=A}: ipa_core::helpers::transport::in_memory::transport: new
2024-05-02T19:42:39.728965Z  INFO {h=B}:transport_loop{id=ShardIndex(0)}: ipa_core::helpers::transport::in_memory::transport: close time.busy=583ns time.idle=337µs
2024-05-02T19:42:39.728967Z  INFO {h=B}: ipa_core::helpers::transport::in_memory::sharding: close time.busy=5.67µs time.idle=336µs
2024-05-02T19:42:39.728969Z  INFO transport_loop{id=B}: ipa_core::helpers::transport::in_memory::transport: new
2024-05-02T19:42:39.728974Z  INFO transport_loop{id=C}: ipa_core::helpers::transport::in_memory::transport: new
WARNING: Shuttle only correctly models SeqCst atomics and treats all other Orderings as if they were SeqCst. Bugs caused by weaker orderings like Relaxed may be missed. See https://docs.rs/shuttle/*/shuttle/sync/atomic/index.html#warning-about-relaxed-behaviors for details or to disable this warning.
2024-05-02T19:42:39.728995Z  INFO {h=C}:transport_loop{id=ShardIndex(0)}: ipa_core::helpers::transport::in_memory::transport: close time.busy=500ns time.idle=359µs
2024-05-02T19:42:39.728997Z  INFO {h=C}: ipa_core::helpers::transport::in_memory::sharding: close time.busy=18.5µs time.idle=345µs
TestWorld random seed 5841473747150548803
2024-05-02T19:42:39.729123Z  INFO {h=A}: ipa_core::helpers::transport::in_memory::sharding: new
2024-05-02T19:42:39.729125Z  INFO {h=A}:transport_loop{id=ShardIndex(0)}: ipa_core::helpers::transport::in_memory::transport: new
2024-05-02T19:42:39.729132Z  INFO transport_loop{id=A}: ipa_core::helpers::transport::in_memory::transport: close time.busy=2.71µs time.idle=173µs
2024-05-02T19:42:39.729135Z  INFO transport_loop{id=B}: ipa_core::helpers::transport::in_memory::transport: close time.busy=5.38µs time.idle=160µs
2024-05-02T19:42:39.729138Z  INFO transport_loop{id=C}: ipa_core::helpers::transport::in_memory::transport: close time.busy=2.29µs time.idle=161µs
2024-05-02T19:42:39.729140Z  INFO {h=A}: ipa_core::helpers::transport::in_memory::sharding: new
2024-05-02T19:42:39.729141Z  INFO {h=A}:transport_loop{id=ShardIndex(1)}: ipa_core::helpers::transport::in_memory::transport: new
2024-05-02T19:42:39.729148Z  INFO {h=A}: ipa_core::helpers::transport::in_memory::sharding: new
2024-05-02T19:42:39.729149Z  INFO {h=A}:transport_loop{id=ShardIndex(2)}: ipa_core::helpers::transport::in_memory::transport: new
2024-05-02T19:42:39.729156Z  INFO {h=B}: ipa_core::helpers::transport::in_memory::sharding: new
2024-05-02T19:42:39.729157Z  INFO {h=B}:transport_loop{id=ShardIndex(0)}: ipa_core::helpers::transport::in_memory::transport: new
2024-05-02T19:42:39.729162Z  INFO {h=B}: ipa_core::helpers::transport::in_memory::sharding: new
2024-05-02T19:42:39.729163Z  INFO {h=B}:transport_loop{id=ShardIndex(1)}: ipa_core::helpers::transport::in_memory::transport: new
2024-05-02T19:42:39.729168Z  INFO {h=B}: ipa_core::helpers::transport::in_memory::sharding: new
2024-05-02T19:42:39.729170Z  INFO {h=B}:transport_loop{id=ShardIndex(2)}: ipa_core::helpers::transport::in_memory::transport: new
2024-05-02T19:42:39.729176Z  INFO {h=C}: ipa_core::helpers::transport::in_memory::sharding: new
2024-05-02T19:42:39.729177Z  INFO {h=C}:transport_loop{id=ShardIndex(0)}: ipa_core::helpers::transport::in_memory::transport: new
2024-05-02T19:42:39.729184Z  INFO {h=C}: ipa_core::helpers::transport::in_memory::sharding: new
2024-05-02T19:42:39.729185Z  INFO {h=C}:transport_loop{id=ShardIndex(1)}: ipa_core::helpers::transport::in_memory::transport: new
2024-05-02T19:42:39.729191Z  INFO {h=C}: ipa_core::helpers::transport::in_memory::sharding: new
2024-05-02T19:42:39.729192Z  INFO {h=C}:transport_loop{id=ShardIndex(2)}: ipa_core::helpers::transport::in_memory::transport: new
2024-05-02T19:42:39.729493Z  INFO transport_loop{id=A}: ipa_core::helpers::transport::in_memory::transport: new
2024-05-02T19:42:39.729499Z  INFO transport_loop{id=B}: ipa_core::helpers::transport::in_memory::transport: new
2024-05-02T19:42:39.729533Z  INFO transport_loop{id=C}: ipa_core::helpers::transport::in_memory::transport: new
2024-05-02T19:42:39.729840Z  INFO transport_loop{id=A}: ipa_core::helpers::transport::in_memory::transport: new
2024-05-02T19:42:39.729845Z  INFO transport_loop{id=B}: ipa_core::helpers::transport::in_memory::transport: new
2024-05-02T19:42:39.729850Z  INFO transport_loop{id=C}: ipa_core::helpers::transport::in_memory::transport: new
2024-05-02T19:42:39.730153Z  INFO transport_loop{id=A}: ipa_core::helpers::transport::in_memory::transport: new
2024-05-02T19:42:39.730159Z  INFO transport_loop{id=B}: ipa_core::helpers::transport::in_memory::transport: new
2024-05-02T19:42:39.730163Z  INFO transport_loop{id=C}: ipa_core::helpers::transport::in_memory::transport: new
test panicked in task 'main-thread'
failing schedule:
"
9107e50ac8c6dcd4f3eb9db51b00000000020004000000ff0a0806000c0000000000000e000a
0a00000a000a0e0000000000000000000000000000100c0000000c0000000c00100c00000000
0000000000000000000000001200000008080000000800080000001200000000000000000000
00000000000000000000000000000000000014080a0c000016180000001a00201c001e220000
2400002a2826002c00002e00323000003600ff69000000000000702c2c2c7000742c00003030
0000000030743000000000000000000000000000000000000000000000000000000078003838
7c3c00383878003c3c7c0000003c000000000000000000000000000000000000000000000000
0000000000000000000080280084282880003030308400302800000000000000000000000000
0000000000000000000000000000000000000000008c8800343434003488003c003c003c8c00
3c00000000000000000000000000000000000000000000000000000000000000000000000090
00280028000000942828902c2c002c942c000000000000000000000000000000000000000000
0000000000000000000000000098282800009c2c282c280000000000000000982c2c009c0000
000000000000000000000000000000000000000000a0340000a438383838a40000340034a000
3400000000000000000000000000000000000000000000000000000000000000000000000000
0000a8006c000000006c6ca86c00000000000000000000000000000000b00044004400ac4440
404000004400ac00b00000400000000000000000000000000000000000000000000000000000
00000000000000000000000000000000b42c2c002c0000b4b80030302c30b800000030000000
0000000000000000000000000000000000000000000000000000000000000000000000000000
00000000000000bc545454005400bc0000000000000000000000000000000000000000c000c4
4400484848000048c4004444c000440000000000000000000000000000000000000000000000
00000000000000000000000000000000000000ccc830303000cc2828300000002828c8000000
000000000000000000000000000000000000000000000000000000000000000000d05c5c5cd0
5c000000000000000000000000000000000000000000000000000000d4000000600000006060
d400600000000000000000000000d834343434000000dcd83c00003c003c3c0000dc00000000
00000000000000000000000000000000000000000000000000000000e0000000400040e40048
4848004800e440000000e0000000000000400000000000000000000000000000000000000000
00000000e84040004000e8ec404800000000004848ec00480000000000000000000000000000
00000000000000000000000000000000000000000000000000000000000000f0680000680000
686800f00000000000000000000000000000000000f40000f80038380038f800343400380000
000034f400003400000000000000000000000000000000000000000000000000000000000000
0000000000000000000000000000fc4040004000000041fc4444440044000100000000000000
0000000000000000000000000000000000000000000000000000000000000000040100000050
00005050040100500000000000000000000000000000000008010c013c3c00003c3c38000000
0038380c39080100000000000000000000000000000000000000000000000000000000000010
4d4c00004c10014c000000000000000000000000000000000000000000000000140144001801
4448000048444400001449000048001801000000000000000000000000000000000000000000
0000000000000000000000000000000000000000000000000000
"
pass that string to `shuttle::replay` to replay the failure
thread 'protocol::ipa_prf::shuffle::sharded::tests::empty' panicked at ipa-core/src/helpers/transport/stream/collection.rs:90:25:
assertion failed: will_wake
stack backtrace:
   0: _rust_begin_unwind
   1: core::panicking::panic_fmt
   2: core::panicking::panic
   4: <ipa_core::helpers::transport::receive::LogErrors<S,T,E> as futures_core::stream::Stream>::poll_next
   5: <ipa_core::helpers::buffers::unordered_receiver::Receiver<S,C,M> as core::future::future::Future>::poll
   6: ipa_core::helpers::gateway::receive::MpcReceivingEnd<M>::receive::{{closure}}::{{closure}}
   7: ipa_core::protocol::ipa_prf::shuffle::sharded::shuffle::{{closure}}
   8: <tracing::instrument::Instrumented<T> as core::future::future::Future>::poll
   9: <futures_util::future::maybe_done::MaybeDone<Fut> as core::future::future::Future>::poll
  10: <futures_util::future::join_all::JoinAll<F> as core::future::future::Future>::poll
  11: <tracing::instrument::Instrumented<T> as core::future::future::Future>::poll
  12: <futures_util::stream::futures_ordered::OrderWrapper<T> as core::future::future::Future>::poll
  13: <futures_util::stream::futures_unordered::FuturesUnordered<Fut> as futures_core::stream::Stream>::poll_next
  14: <futures_util::stream::futures_ordered::FuturesOrdered<Fut> as futures_core::stream::Stream>::poll_next
  15: <futures_util::stream::stream::collect::Collect<St,C> as core::future::future::Future>::poll
  16: <ipa_core::test_fixture::world::TestWorld<ipa_core::test_fixture::world::WithShards<_,D>> as ipa_core::test_fixture::world::Runner<ipa_core::test_fixture::world::WithShards<_,D>>>::semi_honest::{{closure}}
  17: ipa_core::protocol::ipa_prf::shuffle::sharded::tests::sharded_shuffle::{{closure}}
  18: shuttle::future::block_on
  19: shuttle::thread::thread_fn
  20: generator::stack::StackBox<F>::call_once
  21: generator::gen_impl::gen_init
note: Some details are omitted, run with `RUST_BACKTRACE=full` for a verbose backtrace.

failures:
    protocol::ipa_prf::shuffle::sharded::tests::empty

test result: FAILED. 0 passed; 1 failed; 0 ignored; 0 measured; 30 filtered out; finished in 0.01s

error: test failed, to rerun pass `-p ipa-core --lib`
akoshelev commented 4 months ago

Can you try to pin the shuttle version 0.6 and see if it works. There was also a release of 0.7 along with rustc that could've broken it.

andyleiserson commented 4 months ago

It fails with either version 0.6 or 0.7 of shuttle. Unfortunately only release builds fail.

I added some debug prints to StreamCollection::add_waker that suggest it is a false positive, because the original and replacement wakers are the same until cloned in add_waker. (The pairs of hex values are the data and vtable members of the raw waker.)

2024-05-02T21:49:30.542275Z  INFO ipa_core::helpers::transport::stream::collection: add_waker(0x6000038a40e0): H2 protocol/iter0/cardinality (0x14c307ad0 0x1004ec598)
2024-05-02T21:49:30.542310Z  INFO ipa_core::helpers::transport::stream::collection: inserting waker 0x14c307ad0 0x1004ea2f0
...
2024-05-02T21:49:35.377006Z  INFO ipa_core::helpers::transport::stream::collection: add_waker(0x6000038a40e0): H2 protocol/iter0/cardinality (0x14c307ad0 0x1004ec598)
2024-05-02T21:49:35.377050Z  INFO ipa_core::helpers::transport::stream::collection: found existing waker 0x14c307ad0 0x1004ea2f0

The original and cloned vtables even have the same contents, even though they are distinct in memory:

(lldb) x/4gx 0x1004ea2f0
0x1004ea2f0: 0x000000010004e958 0x000000010004de28
0x1004ea300: 0x000000010004ea80 0x000000010004d824
(lldb) x/4gx 0x1004ec598
0x1004ec598: 0x000000010004e958 0x000000010004de28
0x1004ec5a8: 0x000000010004ea80 0x000000010004d824
andyleiserson commented 4 months ago

Here we go: https://github.com/rust-lang/rust/pull/119863

akoshelev commented 4 months ago

great catch, I guess we need to get rid of will_wake checks - they don't do us any good.

Will work on that