superfly / corrosion

Gossip-based service discovery (and more) for large distributed systems.
https://superfly.github.io/corrosion/
Apache License 2.0

timestamp subtract with overflow #250

Open runsisi opened 3 months ago

runsisi commented 3 months ago

In a debug build, running corrosion exec on a node with a faster clock causes the other nodes to panic.

https://github.com/superfly/corrosion/blob/main/crates/corro-agent/src/agent/handlers.rs#L860

let recv_lag = change
    .ts()
    .map(|ts| (agent.clock().new_timestamp().get_time() - ts.0).to_duration()); // <--- panic here

The broadcast changeset sent from the node with the faster clock carries a newer timestamp than the receiving node's clock, so the subtraction overflows.
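One possible way to avoid the panic is to clamp a negative lag to zero with a saturating subtraction. The sketch below is not corrosion's or uhlc's code: `Ntp64` is a hypothetical stand-in that models a 64-bit NTP-style timestamp (upper 32 bits as seconds, lower 32 bits as the fraction of a second), and `lag_since` is an illustrative helper name.

```rust
use std::time::Duration;

/// Hypothetical stand-in for a 64-bit NTP-style timestamp:
/// upper 32 bits are seconds, lower 32 bits are the fraction of a second.
#[derive(Clone, Copy, Debug, PartialEq, Eq, PartialOrd, Ord)]
struct Ntp64(u64);

impl Ntp64 {
    /// Convert the fixed-point value into a std Duration.
    fn to_duration(self) -> Duration {
        let secs = self.0 >> 32;
        let frac = self.0 & 0xFFFF_FFFF;
        // Scale the 32-bit fraction to nanoseconds; the product fits in a u64.
        let nanos = (frac * 1_000_000_000) >> 32;
        Duration::new(secs, nanos as u32)
    }
}

/// Lag between the local clock and a remote timestamp.
/// If the remote timestamp is ahead of the local clock, the lag is
/// clamped to zero instead of overflowing.
fn lag_since(now: Ntp64, remote: Ntp64) -> Duration {
    Ntp64(now.0.saturating_sub(remote.0)).to_duration()
}
```

With this shape, a changeset stamped two seconds in the past yields a two-second lag, while one stamped in the future yields a zero lag rather than a panic. Whether zero is the right value to report for a sender with a faster clock is a design choice for the maintainers.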

the panic backtrace:

thread 'tokio-runtime-worker' panicked at /root/.cargo/registry/src/mirrors.ustc.edu.cn-12df342d903acd47/uhlc-0.7.0/src/ntp64.rs:164:14:
attempt to subtract with overflow
stack backtrace:
   0: rust_begin_unwind
             at /rustc/90e321d82a0a9c3d0e3f180d4d17541b729072e0/library/std/src/panicking.rs:645:5
   1: core::panicking::panic_fmt
             at /rustc/90e321d82a0a9c3d0e3f180d4d17541b729072e0/library/core/src/panicking.rs:72:14
   2: core::panicking::panic
             at /rustc/90e321d82a0a9c3d0e3f180d4d17541b729072e0/library/core/src/panicking.rs:142:5
   3: <uhlc::ntp64::NTP64 as core::ops::arith::Sub>::sub
             at /root/.cargo/registry/src/mirrors.ustc.edu.cn-12df342d903acd47/uhlc-0.7.0/src/ntp64.rs:164:14
   4: <&uhlc::ntp64::NTP64 as core::ops::arith::Sub<uhlc::ntp64::NTP64>>::sub
             at /root/.cargo/registry/src/mirrors.ustc.edu.cn-12df342d903acd47/uhlc-0.7.0/src/ntp64.rs:173:9
   5: corro_agent::agent::handlers::handle_changes::{closure#0}::{closure#3}
             at /home/runsisi/build/corrosion/crates/corro-agent/src/agent/handlers.rs:860:23
   6: <core::option::Option<corro_types::broadcast::Timestamp>>::map::<core::time::Duration, corro_agent::agent::handlers::handle_changes::{closure#0}::{closure#3}>
             at /rustc/90e321d82a0a9c3d0e3f180d4d17541b729072e0/library/core/src/option.rs:1072:29
   7: corro_agent::agent::handlers::handle_changes::{closure#0}
             at /home/runsisi/build/corrosion/crates/corro-agent/src/agent/handlers.rs:858:24

runsisi commented 2 months ago

Here is another panic caused by the same subtraction:

https://github.com/superfly/corrosion/blob/main/crates/corro-agent/src/agent/util.rs#L999

for (_, changeset, _, _) in changesets.iter() {
    if let Some(ts) = changeset.ts() {
        let dur = (agent.clock().new_timestamp().get_time() - ts.0).to_duration(); // <--- panic here
        histogram!("corro.agent.changes.commit.lag.seconds").record(dur);
    }
}
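For the metrics loop, an alternative to clamping is to skip the sample entirely when the changeset timestamp is ahead of the local clock, so the histogram is not filled with artificial zeros. This is only a sketch under assumptions: timestamps are modelled as raw `u64` values in the same 64-bit NTP-style layout, and `ntp_to_duration` and `commit_lags` are hypothetical helpers, not functions from corrosion or uhlc.

```rust
use std::time::Duration;

/// Convert a raw 64-bit NTP-style value (upper 32 bits seconds,
/// lower 32 bits fraction of a second) into a Duration.
fn ntp_to_duration(raw: u64) -> Duration {
    let secs = raw >> 32;
    let nanos = ((raw & 0xFFFF_FFFF) * 1_000_000_000) >> 32;
    Duration::new(secs, nanos as u32)
}

/// Collect the lag for every changeset that has a timestamp not ahead of `now`.
/// Changesets without a timestamp, or with a timestamp in the future,
/// produce no sample.
fn commit_lags(now: u64, timestamps: &[Option<u64>]) -> Vec<Duration> {
    timestamps
        .iter()
        .flatten()
        // checked_sub returns None when ts > now, so the sample is dropped.
        .filter_map(|ts| now.checked_sub(*ts))
        .map(ntp_to_duration)
        .collect()
}
```

Each returned duration would then be passed to the histogram's record call in place of the unchecked subtraction.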

backtrace:

thread 'tokio-runtime-worker' panicked at /home/runsisi/build/corrosion/uhlc/src/ntp64.rs:164:14:
attempt to subtract with overflow
stack backtrace:
   0: rust_begin_unwind
             at /rustc/90e321d82a0a9c3d0e3f180d4d17541b729072e0/library/std/src/panicking.rs:645:5
   1: core::panicking::panic_fmt
             at /rustc/90e321d82a0a9c3d0e3f180d4d17541b729072e0/library/core/src/panicking.rs:72:14
   2: core::panicking::panic
             at /rustc/90e321d82a0a9c3d0e3f180d4d17541b729072e0/library/core/src/panicking.rs:142:5
   3: <uhlc::ntp64::NTP64 as core::ops::arith::Sub>::sub
             at /home/runsisi/build/corrosion/uhlc/src/ntp64.rs:164:14
   4: <&uhlc::ntp64::NTP64 as core::ops::arith::Sub<uhlc::ntp64::NTP64>>::sub
             at /home/runsisi/build/corrosion/uhlc/src/ntp64.rs:173:9
   5: corro_agent::agent::util::process_multiple_changes::{closure#0}::{closure#0}::{closure#0}::{closure#0}
             at /home/runsisi/build/corrosion/crates/corro-agent/src/agent/util.rs:999:27
   6: tokio::runtime::context::runtime_mt::exit_runtime::<corro_agent::agent::util::process_multiple_changes::{closure#0}::{closure#0}::{closure#0}::{closure#0}, core::result::Result<alloc::vec::Vec<(corro_types::actor::ActorId, corro_types::broadcast::Changeset, corro_base_types::CrsqlDbVersion, corro_types::broadcast::ChangeSource)>, corro_types::agent::ChangeError>>
             at /root/.cargo/registry/src/mirrors.ustc.edu.cn-12df342d903acd47/tokio-1.34.0/src/runtime/context/runtime_mt.rs:35:5
   7: tokio::runtime::scheduler::multi_thread::worker::block_in_place::<corro_agent::agent::util::process_multiple_changes::{closure#0}::{closure#0}::{closure#0}::{closure#0}, core::result::Result<alloc::vec::Vec<(corro_types::actor::ActorId, corro_types::broadcast::Changeset, corro_base_types::CrsqlDbVersion, corro_types::broadcast::ChangeSource)>, corro_types::agent::ChangeError>>