narrowlink / ipstack

Asynchronous lightweight userspace implementation of TCP/IP stack for Tun device
Apache License 2.0

Panic #45

Closed: pronebird closed this issue 5 months ago

pronebird commented 6 months ago

Using 0.0.9. Caught a panic today:

[2024-05-07 11:53:23.668][ipstack::stream::tcp][WARN] 1615879314
[2024-05-07 11:53:23.671][ipstack::stream::tcp][WARN] 1615887506
[2024-05-07 11:53:23.677][ipstack::stream::tcp][WARN] 1615879314
[2024-05-07 11:53:23.678][ipstack::stream::tcp][WARN] 1894419854
[2024-05-07 11:53:23.882][panic][ERROR] thread 'tokio-runtime-worker' panicked at 'Please report these values at: https://github.com/narrowlink/ipstack/': /Users/pronebird/.cargo/registry/src/index.crates.io-6f17d22bba15001f/ipstack-0.0.9/src/stream/tcp.rs:495
   0: backtrace::capture::Backtrace::new
   1: <backtrace::capture::Backtrace as core::default::Default>::default
   2: log_panics::Config::install_panic_hook::{{closure}}
   3: <REDACTED>::abort_on_panic::{{closure}}
   4: <alloc::boxed::Box<F,A> as core::ops::function::Fn<Args>>::call
             at /rustc/25ef9e3d85d934b27d9dada2f9dd52b1dc63bb04/library/alloc/src/boxed.rs:2029:9
      std::panicking::rust_panic_with_hook
             at /rustc/25ef9e3d85d934b27d9dada2f9dd52b1dc63bb04/library/std/src/panicking.rs:785:13
   5: std::panicking::begin_panic_handler::{{closure}}
             at /rustc/25ef9e3d85d934b27d9dada2f9dd52b1dc63bb04/library/std/src/panicking.rs:651:13
   6: std::sys_common::backtrace::__rust_end_short_backtrace
             at /rustc/25ef9e3d85d934b27d9dada2f9dd52b1dc63bb04/library/std/src/sys_common/backtrace.rs:171:18
   7: rust_begin_unwind
             at /rustc/25ef9e3d85d934b27d9dada2f9dd52b1dc63bb04/library/std/src/panicking.rs:647:5
   8: core::panicking::panic_fmt
             at /rustc/25ef9e3d85d934b27d9dada2f9dd52b1dc63bb04/library/core/src/panicking.rs:72:14
   9: <ipstack::stream::tcp::IpStackTcpStream as tokio::io::async_write::AsyncWrite>::poll_flush
  10: <ipstack::stream::tcp::IpStackTcpStream as tokio::io::async_read::AsyncRead>::poll_read
  11: <ipstack::stream::tcp_wrapper::IpStackTcpStream as tokio::io::async_read::AsyncRead>::poll_read
  12: tokio::io::util::copy::CopyBuffer::poll_copy
  13: tokio::io::util::copy_bidirectional::transfer_one_direction
  14: <REDACTED>::forwarder::Forwarder::handle_tcp::{{closure}}::{{closure}}
  15: tokio::runtime::task::core::Core<T,S>::poll
  16: tokio::runtime::task::harness::Harness<T,S>::poll
  17: tokio::runtime::scheduler::multi_thread::worker::Context::run_task
  18: tokio::runtime::scheduler::multi_thread::worker::Context::run
  19: tokio::runtime::context::runtime::enter_runtime
  20: tokio::runtime::scheduler::multi_thread::worker::run
  21: tokio::runtime::task::core::Core<T,S>::poll
  22: tokio::runtime::task::harness::Harness<T,S>::poll
  23: tokio::runtime::blocking::pool::Inner::run
  24: std::sys_common::backtrace::__rust_begin_short_backtrace
  25: core::ops::function::FnOnce::call_once{{vtable.shim}}
  26: <alloc::boxed::Box<F,A> as core::ops::function::FnOnce<Args>>::call_once
             at /rustc/25ef9e3d85d934b27d9dada2f9dd52b1dc63bb04/library/alloc/src/boxed.rs:2015:9
      <alloc::boxed::Box<F,A> as core::ops::function::FnOnce<Args>>::call_once
             at /rustc/25ef9e3d85d934b27d9dada2f9dd52b1dc63bb04/library/alloc/src/boxed.rs:2015:9
      std::sys::pal::unix::thread::Thread::new::thread_start
             at /rustc/25ef9e3d85d934b27d9dada2f9dd52b1dc63bb04/library/std/src/sys/pal/unix/thread.rs:108:17
  27: __pthread_start

Looking at the code that issues the panic, I can conclude that inflight_packets is empty, because only four lines are logged and the loop is skipped entirely. What's really going on in this piece of code? Maybe it shouldn't panic if there are no more inflight packets to flush?
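
For illustration only, a minimal standalone sketch (hypothetical types and helper, not ipstack's actual internals) of the kind of guard that would surface an error instead of panicking when no inflight packet covers the requested retransmission sequence:

use std::io::{Error, ErrorKind};

struct InflightPacket {
    seq: u32,
    payload: Vec<u8>,
}

// Hypothetical helper: locate the inflight packet covering `retrans_seq`.
// Returns an error instead of panicking when nothing matches, e.g. because
// inflight_packets has already been drained.
fn find_retransmission(
    inflight_packets: &[InflightPacket],
    retrans_seq: u32,
) -> Result<&InflightPacket, Error> {
    inflight_packets
        .iter()
        .find(|p| {
            // A packet covers bytes (p.seq, p.seq + payload.len()]; wraparound is ignored here.
            let last_byte = p.seq.wrapping_add(p.payload.len() as u32);
            p.seq < retrans_seq && retrans_seq <= last_byte
        })
        .ok_or_else(|| Error::new(ErrorKind::InvalidData, "no inflight packet for retransmission"))
}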

pronebird commented 6 months ago

I have also noticed this piece of code:

https://github.com/narrowlink/ipstack/blob/main/src/stream/tcp.rs#L459-L462

Note the early return there on UnexpectedEof: self.tcb.retransmission remains set. Maybe we should reset retransmission before packet_sender.send()?

RoDmitry commented 6 months ago

I'm using self.tcb.retransmission.take() to set it to None beforehand.
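
For reference, a standalone sketch of that pattern with generic names (not the actual ipstack fields): take() clears the Option before the fallible send, so an early return on UnexpectedEof cannot leave a stale retransmission marker behind.

use std::io::Error;

// Clear the pending retransmission marker *before* the fallible send, so an
// early return (e.g. on UnexpectedEof) cannot leave a stale value in place.
fn flush_retransmission(
    retransmission: &mut Option<u32>,
    send: impl FnOnce(u32) -> Result<(), Error>,
) -> Result<(), Error> {
    if let Some(seq) = retransmission.take() {
        // If `send` fails and we return early, `retransmission` is already None.
        send(seq)?;
    }
    Ok(())
}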

SajjadPourali commented 6 months ago

The problem looks related to the logic of the TCP stack, and I don't believe it can be fixed by taking. If the function returns UnexpectedEof, it means the stream is terminated, and poll_flush won't be called anymore! Based on what you provided, ipstack received a request to retransmit data that is no longer available in inflight_packets, as you mentioned. There are two reasons this can happen:

1) We removed that buffer from inflight_packets improperly.
2) We are miscalculating the RetransmissionRequest in the check_pkt_type function.

I intentionally left the panic there because I doubted the accuracy of check_pkt_type for this part and wanted to identify the potential miscalculation.
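
As a purely hypothetical illustration of the kind of check being questioned (this is not ipstack's actual check_pkt_type), a duplicate-ACK style heuristic might look like:

#[derive(Debug, PartialEq)]
enum PktType {
    NewAck,
    RetransmissionRequest,
}

// Hypothetical heuristic: treat an incoming ACK as a retransmission request
// when it re-acknowledges an old boundary while unacknowledged data exists.
fn classify_ack(ack: u32, last_ack: u32, seq: u32) -> PktType {
    if ack == last_ack && last_ack != seq {
        PktType::RetransmissionRequest
    } else {
        PktType::NewAck
    }
}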

RoDmitry commented 6 months ago

The problem is that inflight_packets is empty.

SajjadPourali commented 6 months ago

inflight_packets should not be empty at this stage. With tcp_sequence: 1615887506 and last_ack: 1615879314, 8192 bytes should still be available in inflight_packets.
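
For reference, the arithmetic behind those numbers (values taken from the log above):

fn main() {
    let tcp_sequence: u32 = 1_615_887_506; // seq from the log
    let last_ack: u32 = 1_615_879_314;     // last_ack from the log
    // Bytes sent but not yet acknowledged, i.e. what should still be in inflight_packets.
    assert_eq!(tcp_sequence.wrapping_sub(last_ack), 8192);
}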

SajjadPourali commented 6 months ago

I suspect that a regression occurred here: https://github.com/narrowlink/ipstack/commit/2d54e7ed0e7cbd20cda99256b2eff1a278f86759

SajjadPourali commented 6 months ago
self.inflight_packets.retain(|p| {
    let last_byte = p.seq.wrapping_add(p.payload.len() as u32);
    last_byte.saturating_sub(self.last_ack) > 0
        && self.seq.saturating_sub(last_byte) > 0
});

I am suspicious of this part. 'seq' in inflight_packets refers to the packet's sequence number at the time it was pushed into the buffer: 'p.seq + 1' is its first byte, while 'p.seq' itself isn't part of the buffer.

SajjadPourali commented 6 months ago

Based on the explanation provided above, the condition self.seq.saturating_sub(last_byte) > 0 is incorrect, and removing it should solve the issue. @RoDmitry Could you please verify my conclusion?
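
A worked example with made-up numbers illustrating the problem (standalone, not ipstack code): when nothing newer than this packet has been sent, self.seq equals the packet's last byte, so the second condition in the retain closure removes it even though it is still unacknowledged.

fn main() {
    let p_seq: u32 = 1_000;     // packet's seq when it was pushed into the buffer
    let payload_len: u32 = 500; // packet covers bytes 1_001..=1_500
    let last_ack: u32 = 1_000;  // peer has not acknowledged any of it yet
    let seq: u32 = 1_500;       // last byte sent; nothing newer is in flight

    let last_byte = p_seq.wrapping_add(payload_len);
    let keep = last_byte.saturating_sub(last_ack) > 0 && seq.saturating_sub(last_byte) > 0;
    assert!(!keep); // the still-unacknowledged packet is dropped from inflight_packets
}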

RoDmitry commented 6 months ago

If I understood the meaning of self.last_ack and self.seq, then maybe I could verify it šŸ˜. Maybe we need tests for this?

pronebird commented 6 months ago

@SajjadPourali I patched it as per your suggestion. I will run ipstack for the entire day and monitor the logs. https://github.com/narrowlink/ipstack/commit/f64ca2c8d85b55e50c17884797f1945c72aa71f5

SajjadPourali commented 6 months ago

If I understood the meaning of self.last_ack and self.seq, then maybe I could verify it šŸ˜. Maybe we need tests for this?

In the TCB of ipstack, "last_ack" means the last byte acknowledged by the peer (i.e., SND.UNA), and "seq" is the last byte sent and still not acknowledged (i.e., SND.NXT), as in the RFC 793 send sequence space:

                   1         2          3          4
              ----------|----------|----------|----------
                     SND.UNA    SND.NXT    SND.UNA
                                          +SND.WND
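
Mapping the values from the log onto those definitions (a plain illustration using the meanings quoted above, not library code):

fn main() {
    let snd_una: u32 = 1_615_879_314; // ipstack's last_ack: last byte acknowledged by the peer
    let snd_nxt: u32 = 1_615_887_506; // ipstack's seq: last byte sent, not yet acknowledged
    // Bytes in the half-open range (snd_una, snd_nxt] are sent but unacknowledged,
    // so they must still be present in inflight_packets for retransmission.
    let in_flight = snd_nxt.wrapping_sub(snd_una);
    println!("{in_flight} bytes should be sitting in inflight_packets"); // prints 8192
}
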
RoDmitry commented 6 months ago

Then yes, the check self.seq.saturating_sub(last_byte) > 0 is invalid, because self.seq can be equal to the last_byte of the packet, in which case the packet is wrongly dropped.
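
For completeness, a sketch of the retain with that condition removed, as suggested above and tried in the linked patch (simplified and standalone, not the exact source):

struct Packet {
    seq: u32,         // sequence number when the packet was pushed
    payload: Vec<u8>, // payload covering bytes seq+1 ..= seq+payload.len()
}

// Keep only packets whose last byte has not yet been acknowledged;
// the self.seq-based condition discussed above is dropped.
fn retain_unacked(inflight_packets: &mut Vec<Packet>, last_ack: u32) {
    inflight_packets.retain(|p| {
        let last_byte = p.seq.wrapping_add(p.payload.len() as u32);
        last_byte.saturating_sub(last_ack) > 0
    });
}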