Indicate send blocking in data_moved events

quicwg / qlog

The IETF I-D documents for the qlog format

Other

86 stars 13 forks source link

Indicate send blocking in data_moved events #444

Open LPardue opened 3 weeks ago

LPardue commented 3 weeks ago

Fixes #132 (if merged).

It struck me that its the "data_moved" action that can be blocked, so rather than add a separate event, just add an optional field to an existing event. This helps ensure we can represent both stream and datagram data with all relevant fields.

rmarx commented 2 weeks ago

Conceptually this works, but I don't like the suggestion we'd have to omit logging length to indicate the data was actually not moved... in practice, length could be a strong signal that helps understand why exactly something was blocked for that reason (e.g., trying to write a ridiculous amount of data due to a bug).

If we keep this approach, I'd rather the guidance for this is "if there's a blocked_reason, that means the data was not actually moved, but a move was attempted, which was blocked due to blocked_reason". That imo also works :) (that's also somewhat analogous to Nick's approach in the issue, saying "The lack of any flags means it's unblocked.")

LPardue commented 2 weeks ago

Fair comment. For context, the quiche implementation has a stream write operation that takes an arbitrarily large buffer and can return a "partial write" result. For example, if the cwnd is 15 KB, and a 20 KB buffer is provided, well return an integer length result of 15 KB, indicating the app was blocked from writing the last 5 KB.

I was thinking it would be nice to reflect that in the qlog as a stream_data_moved with length 15 KB and blocked_reason of congestion window, etc.

I didn't think through everything in this proposal and maybe more guidance is useful, but it felt at the time of writing that combing the data in a single event would be more convenient for my use than if I had two separate events.

Always happy to hear other opinions or review alternatives though.

kazuho commented 2 weeks ago

Having indications for blocked events is paramount to investigating performance issues.

Therefore, I think they have to be distinct events, rather than tied to data_moved events. That is because not all QUIC implementations "move" data between the layers at the same moment (consider implementations having different size of buffers, or having no buffer at all).

nibanks commented 2 weeks ago

My personal preference is for separate events. We already have it built this way in msquic.

Also, I agree with Kazuho. We don't have any notion of moving data between layers.

LPardue commented 2 weeks ago

Ack. Would someone be willing to help on the alternative proposal? Even if ypu can just frame how the alternative works, we can design the event structure definition around that.