python-hyper / draft-http2-debug-state

Source information for the HTTP/2 debug state internet draft.

Maintain counts of written bytes and padding for connection #5

Open louiscryan opened 8 years ago

louiscryan commented 8 years ago

If the spec only dumps information for current streams, then diagnosing issues in how the currently available connection-window bytes are assigned to streams for actual writes requires knowing how many bytes and padding bytes were written over the lifetime of the connection.

It seems like we could get rid of the conn*-prefixed entries and instead always have a stream '0' in the map to represent this state.
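A rough sketch of the stream '0' idea, assuming a JSON-like `streams` map keyed by stream ID string. The per-stream counters `dataOut` and `paddingOut` and the helper functions are hypothetical names chosen for illustration, not taken from the draft. Every DATA write updates both the stream's entry and the synthetic entry '0', which keeps lifetime totals after individual streams close.

```python
from typing import Dict

# Hypothetical layout: stream ID (as a string) -> outbound counters.
# The field names "dataOut" and "paddingOut" are illustrative only.
Counters = Dict[str, int]
Streams = Dict[str, Counters]


def new_streams_map() -> Streams:
    """Create a streams map that always holds the connection entry "0"."""
    return {"0": {"dataOut": 0, "paddingOut": 0}}


def record_data_write(streams: Streams, stream_id: int,
                      data_len: int, pad_len: int) -> None:
    """Account for one DATA frame written on `stream_id`.

    Payload and padding both count against HTTP/2 flow control,
    so both are tracked, separately.
    """
    entry = streams.setdefault(str(stream_id), {"dataOut": 0, "paddingOut": 0})
    entry["dataOut"] += data_len
    entry["paddingOut"] += pad_len
    # Entry "0" is never removed, so it keeps lifetime totals even
    # after individual streams close and drop out of the map.
    streams["0"]["dataOut"] += data_len
    streams["0"]["paddingOut"] += pad_len


def close_stream(streams: Streams, stream_id: int) -> None:
    """Drop a closed stream; the connection totals are untouched."""
    streams.pop(str(stream_id), None)


state = new_streams_map()
record_data_write(state, 1, data_len=100, pad_len=10)
record_data_write(state, 3, data_len=50, pad_len=0)
close_stream(state, 1)
print(state["0"])  # {'dataOut': 150, 'paddingOut': 10}
```

Stream 1 has left the map, but its bytes and padding are still visible in entry '0', which is the lifetime information the proposal asks for.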

An example of such an implementation issue: a flow-control implementation grants a slice of the connection window to a stream so that it can write, the stream is reset, the write never occurs, and the granted slice is never returned. This zombie slice of allocated connection window eventually causes starvation.
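The failure mode can be reproduced with a toy allocator. This is a minimal sketch under stated assumptions: `ConnectionWindow` and its methods are hypothetical and do not model any particular library. A buggy and a correct variant differ only in whether a reset stream's unwritten grant is handed back to the connection.

```python
from typing import Dict


class ConnectionWindow:
    """Toy model of a sender handing out slices of its connection window.

    Hypothetical illustration only; not the API of any real library.
    """

    def __init__(self, size: int, return_on_reset: bool) -> None:
        self.available = size                # bytes not yet granted to a stream
        self.granted: Dict[int, int] = {}    # stream ID -> granted, unwritten bytes
        self.return_on_reset = return_on_reset

    def grant(self, stream_id: int, want: int) -> int:
        """Reserve up to `want` bytes of the connection window for a stream."""
        n = min(want, self.available)
        self.available -= n
        self.granted[stream_id] = self.granted.get(stream_id, 0) + n
        return n

    def write(self, stream_id: int, n: int) -> None:
        """Spend `n` previously granted bytes on an actual DATA frame."""
        if n > self.granted.get(stream_id, 0):
            raise ValueError("writing more than was granted")
        self.granted[stream_id] -= n

    def reset(self, stream_id: int) -> None:
        """Handle RST_STREAM: forget the stream and its unwritten grant."""
        leftover = self.granted.pop(stream_id, 0)
        if self.return_on_reset:
            self.available += leftover  # correct: give the slice back
        # Otherwise `leftover` is silently lost: the "zombie slice".


def available_after_resets(return_on_reset: bool) -> int:
    # 65535 is the initial HTTP/2 connection window (RFC 7540).
    win = ConnectionWindow(65535, return_on_reset)
    for stream_id in (1, 3, 5):
        win.grant(stream_id, 30000)
        win.reset(stream_id)  # stream is reset before anything is written
    return win.available


print(available_after_resets(False))  # 0: starved, although nothing was sent
print(available_after_resets(True))   # 65535
```

Because nothing was actually sent, the peer has no reason to send WINDOW_UPDATE. From the protocol's point of view the whole window is still open, while the buggy allocator believes none is left.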

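The dump would show this behavior in two ways:

the total number of bytes written on the connection is less than the aggregate window received

sampling the streams over time would show them as stalled, as the number of bytes written was unchanged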

Lukasa commented 8 years ago

I'd like to understand this use case further, because on my initial reading it seems like an extremely specific problem that is likely to occur only in certain architectures. I suspect that's because I've misunderstood what you're going for here.

In my own words, the problem you're trying to diagnose is one where an implementation believes it has space in the connection flow-control window to send, but has assigned that space to a stream that has stalled. In particular, the implementation has correctly handled the RST_STREAM (transitioning the stream to CLOSED and removing it from the 'streams' object), but has not returned the assigned bytes to the connection window.

I'm not sure how your proposal identifies this issue though. Addressing your two points:

the total number of bytes written on the connection is less than the aggregate window received

This is true by definition: if it weren't, the implementation would be in error. At best, the implementation may have used exactly the number of bytes it was credited, but that's unlikely because it leads to increased latency.
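The invariant can be written down as arithmetic. A minimal sketch, assuming the dump exposed the WINDOW_UPDATE increments received on stream 0 and a lifetime count of flow-controlled bytes written (neither is named in the draft; `conn_window_remaining` is a hypothetical helper). A conforming sender never goes negative, so the check passes for healthy and zombie-slice implementations alike and cannot by itself expose the bug.

```python
from typing import Iterable

# The connection-level window always starts at 65535; SETTINGS_INITIAL_WINDOW_SIZE
# does not change it, only WINDOW_UPDATE frames on stream 0 do (RFC 7540).
INITIAL_CONNECTION_WINDOW = 65535


def conn_window_remaining(updates: Iterable[int], bytes_written: int,
                          initial: int = INITIAL_CONNECTION_WINDOW) -> int:
    """Return the connection send window left after `bytes_written`.

    `updates` are the increments from WINDOW_UPDATE frames on stream 0, and
    `bytes_written` counts whole DATA frame payloads, padding included.
    """
    remaining = initial + sum(updates) - bytes_written
    if remaining < 0:
        # A conforming sender can never get here.
        raise ValueError("flow-control violation: wrote more than was credited")
    return remaining


print(conn_window_remaining([32768, 32768], 100000))  # 31071
```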

sampling the streams over time would show them as stalled, as the number of bytes written was unchanged

Any stream that has been RST or closed will always exhibit this behaviour. That means, in the example you outlined, all closed streams will appear stalled.
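One way to act on this is to compare two samples and flag only streams that are still open and have something to send. This is a minimal sketch; the snapshot shape and the fields `state`, `dataOut` and `queuedData` are assumptions for illustration, not necessarily the draft's names.

```python
from typing import Any, Dict, List

# Hypothetical snapshot shape: stream ID -> fields. "state", "dataOut" and
# "queuedData" are illustrative names, not necessarily the draft's.
Snapshot = Dict[str, Dict[str, Any]]


def stalled_streams(before: Snapshot, after: Snapshot) -> List[str]:
    """Return IDs of open streams that had data queued but wrote nothing."""
    stalled = []
    for sid, now in after.items():
        prev = before.get(sid)
        if prev is None:
            continue  # stream appeared between samples; nothing to compare
        if now["state"] == "CLOSED":
            continue  # closed streams never write again, so always look stalled
        if now["queuedData"] > 0 and now["dataOut"] == prev["dataOut"]:
            stalled.append(sid)
    return stalled


before = {
    "1": {"state": "OPEN", "dataOut": 100, "queuedData": 500},
    "3": {"state": "CLOSED", "dataOut": 40, "queuedData": 0},
    "5": {"state": "OPEN", "dataOut": 10, "queuedData": 200},
}
after = {
    "1": {"state": "OPEN", "dataOut": 100, "queuedData": 500},
    "3": {"state": "CLOSED", "dataOut": 40, "queuedData": 0},
    "5": {"state": "OPEN", "dataOut": 60, "queuedData": 150},
}
print(stalled_streams(before, after))  # ['1']
```

Stream 3 is skipped even though its counter did not move, which is exactly the false positive described above.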

I think the debug output available today is close to sufficient to identify the problem. In the instance you outline, where the server has connection window bytes available to send but has allocated those to streams that are no longer being processed, the response to the HEADERS frame will show that the server believes it has space on the connection to send data, but is still not doing it. That allows you to determine that the server is at fault, and to look more closely into the server to discover the bug.
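The diagnosis described here can be phrased as a heuristic over two successive debug dumps. This sketch assumes a field named `connFlowOut` holding the server's remaining connection-level send window and per-stream `dataOut` and `queuedData` counters; these names and `server_appears_stuck` are assumptions for illustration.

```python
from typing import Any, Dict

# Hypothetical dump shape: "connFlowOut" is assumed to be the server's
# remaining connection send window; per-stream "dataOut" and "queuedData"
# are likewise assumed names.
Dump = Dict[str, Any]


def server_appears_stuck(before: Dump, after: Dump) -> bool:
    """Heuristic: spare connection window and pending data, but no progress."""
    has_window = before["connFlowOut"] > 0 and after["connFlowOut"] > 0
    has_pending = any(s["queuedData"] > 0 for s in after["streams"].values())
    progressed = any(
        after["streams"][sid]["dataOut"] != s["dataOut"]
        for sid, s in before["streams"].items()
        if sid in after["streams"]
    )
    return has_window and has_pending and not progressed


sample = {"connFlowOut": 65535, "streams": {"1": {"dataOut": 0, "queuedData": 1000}}}
later = {"connFlowOut": 64535, "streams": {"1": {"dataOut": 1000, "queuedData": 0}}}
print(server_appears_stuck(sample, sample))  # True
print(server_appears_stuck(sample, later))   # False
```

A `True` result points at the server rather than the peer, since the peer has already granted the window that the server is not using.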

louiscryan commented 8 years ago

@Lukasa I think you're right that the scenario I outlined is detectable by observing that individual streams have made no progress.

I have a nagging concern about the opacity of the allocation mechanism between the connection and the stream, and we've definitely seen bugs along the lines of the one I outlined. I'll think about it some more.

Lukasa commented 8 years ago

Yeah, agreed. Let's keep this under consideration.