int08h / roughenough

A Roughtime secure time sync client and server written in Rust
https://int08h.com/post/roughenough-a-rust-roughtime-server/
Apache License 2.0

Panic in src/responder.rs:114:18 { code: 11, kind: WouldBlock, message: "Resource temporarily unavailable" } #33

int08h closed 1 year ago

int08h commented 1 year ago

Full log message was:

thread 'main' panicked at 'send_to failed: Os { code: 11, kind: WouldBlock, message: "Resource temporarily unavailable" }', src/responder.rs:114:18

Seen on roughtime.int08h.com

int08h commented 1 year ago

Backtrace of bug

Nov 27 23:45:15 roughenough-1f run_server.bash[3662]:    1: core::panicking::panic_fmt
Nov 27 23:45:15 roughenough-1f run_server.bash[3662]:              at /rustc/f1edd0429582dd29cccacaf50fd134b05593bd9c/library/core/src/panicking.rs:100:14
Nov 27 23:45:15 roughenough-1f run_server.bash[3662]:    2: core::result::unwrap_failed
Nov 27 23:45:15 roughenough-1f run_server.bash[3662]:              at /rustc/f1edd0429582dd29cccacaf50fd134b05593bd9c/library/core/src/result.rs:1616:5
Nov 27 23:45:15 roughenough-1f run_server.bash[3662]:    3: roughenough::responder::Responder::send_responses
Nov 27 23:45:15 roughenough-1f run_server.bash[3662]:    4: roughenough::server::Server::process_events
Nov 27 23:45:15 roughenough-1f run_server.bash[3662]:    5: roughenough_server::main

int08h commented 1 year ago

Fix has been deployed

int08h commented 1 year ago

The problem was with the way responses were sent to clients. This is the problematic block in responder.rs:

let bytes_sent = socket
    .send_to(&resp_bytes, &src_addr)
    .expect("send_to failed");

The roughenough server is non-blocking (async) and built on Mio. The socket here is a mio::net::UdpSocket, and it is attempting to send (send_to()) a reply (&resp_bytes) to a client (&src_addr).

The EAGAIN/EWOULDBLOCK response from send_to() is Linux telling us "resources are full, try again later". But as you can see in the snippet, there is no error handling of the results from calling send_to(). There is no reattempt. Instead we get a runtime panic.

The "fix" is a band-aid: check the return value of send_to() and bump a counter on any error, but otherwise ignore the error and carry on.
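
A minimal sketch of what such a band-aid could look like, not the actual patch. It assumes std::net::UdpSocket in place of mio::net::UdpSocket so that it compiles without external crates. SendStats, record_send and send_response are hypothetical names chosen for illustration; the real counter in roughenough is not shown in this issue.

```rust
use std::io;
use std::net::{SocketAddr, UdpSocket};

/// Hypothetical counters; stand-ins for whatever metrics the server keeps.
#[derive(Debug, Default)]
pub struct SendStats {
    pub sent: u64,
    pub failed: u64,
}

/// Record the outcome of one send attempt instead of panicking on failure.
pub fn record_send(result: io::Result<usize>, stats: &mut SendStats) {
    match result {
        Ok(_bytes_sent) => stats.sent += 1,
        // Any error, including WouldBlock (EAGAIN), is counted and then dropped.
        Err(_e) => stats.failed += 1,
    }
}

/// Replacement for the `.expect("send_to failed")` call: never panics.
pub fn send_response(
    socket: &UdpSocket,
    resp_bytes: &[u8],
    src_addr: SocketAddr,
    stats: &mut SendStats,
) {
    record_send(socket.send_to(resp_bytes, src_addr), stats);
}
```

The response is simply lost when the send fails. That is tolerable for a UDP protocol where the client is expected to cope with dropped packets, and the counter at least makes the drop rate visible.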

One might think "a correct fix is to re-attempt delivery". That's probably correct, but the retry logic must ensure that the MIDP time of the in-flight response is still within the uncertainty RADI by the time the response is actually sent.
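
A rough sketch of how a retry could respect that constraint, under stated assumptions: MIDP and RADI are both treated as microsecond values, the send and the clock are passed in as closures so the logic can be exercised without a socket, and the names still_valid, send_with_retry and Outcome are hypothetical. A real Mio server would wait for a writable event between attempts rather than loop immediately.

```rust
use std::io;

/// Result of a bounded retry (hypothetical type, for illustration only).
#[derive(Debug, PartialEq)]
pub enum Outcome {
    Sent,
    /// Current time has left the MIDP +/- RADI window; response is stale.
    Expired,
    /// Attempts exhausted, or a non-retryable error occurred.
    GaveUp,
}

/// True while `now_us` lies inside [MIDP - RADI, MIDP + RADI].
pub fn still_valid(midp_us: u64, radi_us: u32, now_us: u64) -> bool {
    let radi = u64::from(radi_us);
    now_us >= midp_us.saturating_sub(radi) && now_us <= midp_us.saturating_add(radi)
}

/// Retry `send` on WouldBlock, re-checking the validity window before each try.
pub fn send_with_retry<S, C>(
    mut send: S,
    mut now_us: C,
    midp_us: u64,
    radi_us: u32,
    max_attempts: u32,
) -> Outcome
where
    S: FnMut() -> io::Result<usize>,
    C: FnMut() -> u64,
{
    for _ in 0..max_attempts {
        if !still_valid(midp_us, radi_us, now_us()) {
            return Outcome::Expired;
        }
        match send() {
            Ok(_) => return Outcome::Sent,
            Err(e) if e.kind() == io::ErrorKind::WouldBlock => continue,
            Err(_) => return Outcome::GaveUp,
        }
    }
    Outcome::GaveUp
}
```

Checking the window before every attempt means a stale response is dropped rather than delivered late. The alternative would be to rebuild the response with a fresh MIDP, which costs another signature.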