quinn-rs / quinn

Async-friendly QUIC implementation in Rust
Apache License 2.0
3.85k stars 394 forks source link

quinn-proto: `Chunks` panic-on-drop behavior is a footgun and makes control flow annoying #1531

Closed abonander closed 1 month ago

abonander commented 1 year ago

I was just bitten by this assert while trying to debug a silent failure of my application during testing: https://github.com/quinn-rs/quinn/blob/main/quinn-proto/src/connection/streams/recv.rs#L289-L292

This was in a spawned Tokio task ~which apparently suppresses the normal printing of panic backtraces to stderr~ and I was running this as a test which buffered the panic backtrace but not letting it complete. And I hadn't yet wired it up to check the task's exit status as it's supposed to be long-running anyway and I'm still working on the principal implementation here.

On StreamEvent::Opened or StreamEvent::Readable I essentially call into code that looks like this:

impl<'a> StreamReadContext<'a> {
    fn read(mut self) -> anyhow::Result<()> {
        // `self.stream: quinn_proto::RecvStream`
        let mut chunks = self.stream.read(true)?;

        loop {
            match self.activity {
                StreamActivity::PendingHeader => {
                    if let Some(header) = self.reader.try_peek_header().context("invalid header")? {
                        Notification::validate_header(header).context("invalid header")?;
                        *self.activity = StreamActivity::ReadingNotification { header };
                    };
                }
                StreamActivity::ReadingNotification { header } => {
                    if self.reader.input_len() >= header.length {
                        let notification = Notification::decode(&mut *self.reader)
                            .context("invalid notification")?;

                        // Handle `Notification`
                    }
                }
            }

            // Lazily read chunks so we don't buffer a ton of data up-front.
            if let Some(chunk) = chunks.next(MAX_CHUNK_SIZE)? { // 8 KiB
                tracing::trace!("read {} bytes", chunk.bytes.len());
                self.reader.push(chunk.bytes)?;
            } else {
                break;
            }
        }

        // If we need to transmit it'll be checked when we return to the event loop.
        let _ = chunks.finalize();

        Ok(())
    }
}

As implied by the code here, Notification messages have a header that gives their length so we know how much data we need to buffer to fully decode them. reader is the type that buffers chunks and implements the right traits for Notification::decode().

Due to some bug in my code, .try_peek_header() is returning an error which normally would be logged when returned from this method. However, because the impl Drop for Chunks executes on the return, and because I hadn't called .finalize() yet, this is turned into a panic!(), which in the current state of my code just causes the task to silently exit.

The thing is, I don't need the ShouldTransmit flag because this immediately returns to an event loop where I call all the appropriate .handle*() and .poll*() methods on the quinn_proto::Connection. So the implicit finalize triggered by the Drop impl would otherwise be sufficient for me.

Because the panic message was being suppressed, it took me an annoyingly long time to figure this out. It wasn't until I ran the code in a debugger (which helpfully automatically breaks on panics) to figure this out. While I understand the point of the debug_assert!() is to teach the user how to use this API correctly, in my case it was much more of a hindrance than anything.

(~Tokio's partly to blame here... why suppress backtraces in debug mode? That doesn't make any sense.~ However, the panic doesn't strictly need to happen either. quinn-proto is by design a low-level API, so why is it assuming the user might not know what they're doing here?)

abonander commented 1 year ago

As a workaround, I'm calling RecvStream::read() outside this method and including &mut Chunks in StreamReadContext instead of the RecvStream, then always calling .finalize() afterward regardless of what it returns.

abonander commented 1 year ago

Belatedly I have realized that panics not being printed out were my fault. My code was technically running as a test, so Cargo was buffering the output as it normally does, but I was not letting the test complete, as I was cancelling it early while watching the log output, which wasn't buffered.

Still, my primary issue stands, that the panic doesn't need to happen. Worse yet, it's a debug_assert!() so it's not strictly required for correct functioning of the API. It's just a weird little random speedbump that doesn't really need to be there, IMHO.

djc commented 1 year ago

Sorry for the trouble you've been having with our code! When we designed this we felt it was important to make sure callers use finalize(). We tried to find some way to make a "must move"-like type (something that can't be dropped), but unfortunately that's not possible in today's Rust. I think the debug_assert!() functioned exactly as intended, to make you aware that you should be calling transmit soon after dropping Chunks. In your case, you were already aware, but it's not obvious. Then there were further issues with your test strategy that made it so that this was much harder to localize.

If you had gotten a full panic (plus backtrace) on your first test failures, would this have been as aggravating to you as it seems to have been now?

abonander commented 1 year ago

If you had gotten a full panic (plus backtrace) on your first test failures, would this have been as aggravating to you as it seems to have been now?

I still had to restructure my code to make it not panic, so that was annoying regardless. If you're married to the debug_assert!() I would suggest to at least make it clear in the docs that it will panic if you don't call .finalize() which it isn't right now: https://docs.rs/quinn-proto/latest/quinn_proto/struct.Chunks.html#method.next

djc commented 1 year ago

I'm not "married" to the assert -- there's clearly a trade-off for which the best solution is not obvious to me, and I was asking for your feedback.

abonander commented 1 year ago

My original point still stands: this is a low-level API, right? So why not assume the user knows what they're doing?

It's certainly worth documenting that one should check for pending transmits after reading, because that might not immediately be obvious. Either way, documentation on the Chunks type could be better.

aochagavia commented 1 year ago

I also was bitten by this when writing tests that use Chunks, because one of the assertions failed and it caused a double panic (the assertion panics, then Chunk is dropped, and the drop code panics too). It was puzzling and difficult to debug (somehow I wasn't able to get a proper backtrace).

Not sure what an alternative solution would be, just wanted to mention that in my eyes it wasn't very ergonomic. I ended up extracting the logic that used Chunks into a separate function that never panics, to avoid problems: https://github.com/quinn-rs/quinn/blob/3eb26361dba85f13b69e0eff6d934b28f70a37f8/quinn-proto/src/tests/mod.rs#L2183-L2194

djc commented 1 year ago

Okay, I think I would be on board with removing the assertion in favor of better docs. Anyone want to submit a PR?

boserohan commented 1 year ago

Can this be assigned to me? I am new to contributing, so if anyone can guide me to the process that would be great.

djc commented 1 year ago

Sure -- what do you want to know? It's mostly just -- send a PR, ideally with small atomic commits each of which should be able to pass CI.

boserohan commented 1 year ago

Thanks, I have submitted PR 1703. Please could you kindly review the changes. I think I will have more questions later on as I get to contributing more to this project.

abonander commented 2 months ago

Just ran into this again. It ends up being really annoying with control flow:

        'inner: loop {
            match chunks.next(MAX_CHUNK_SIZE) {
                Ok(Some(chunk)) => {
                    // use `chunk`
                }
                Ok(None) => {
                    // Note: this needs to be called in every diverging branch
                    // or `chunks` will panic on-drop

                    let _ = chunks.finalize();
                    break 'outer;
                },
                Err(proto::ReadError::Blocked) => { 
                    let _ = chunks.finalize();
                    break 'inner; 
                },
                Err(proto::ReadError::Reset(code)) => {
                    let _ = chunks.finalize();
                    eyre::bail!("peer reset stream {stream_id:?} with code {code}");
                }
            }
        }

Since #1703 appears stalled, I may just open my own PR.