Don't close connection if channel without finished publish is closed

carlhoerberg commented 5 months ago

Describe the bug

RabbitMQ isn't respecting section 4.2.6 in the spec: https://www.rabbitmq.com/resources/specs/amqp0-9-1.pdf

Note that any non-content frame explicitly marks the end of the content. Although the size of the content is well-known from the content header (and thus also the number of content frames), this allows for a sender to abort the sending of content without the need to close the channel.

Instead any non Body frame after a Header frame closes the whole connection.

Reproduction steps

Send Basic.Publish
Send Basic.Header
Send Channel.Close

Expected behavior

Should close the channel without closing the whole connection.

Additional context

In AMQProxy we pool channels from many different downstream client on a upstream single connection, but if any of those downstream client disconnect while not have finished a full publish the proxy tries to close the upstream channel but then RabbitMQ closes the whole connection and thus all other channels from all other downstream clients.

michaelklishin commented 5 months ago

This is a highly hypothetical scenario that only a proxy can run into. Most RabbitMQ users do not use proxies of this kind, so we will only accept this if it introduces zero regressions for throughput (as reported by several long-ish PerfTest runs).

michaelklishin commented 5 months ago

https://github.com/rabbitmq/rabbitmq-server/pull/11210#issuecomment-2104979422

michaelklishin commented 5 months ago

Sorry, the more I think about how this can be done (#11210), the more I find section 4.2.6 to be ridiculous and this change as inevitably risky. We have deviated from the AMQP 0-9-1 spec in the past for practical implementation reasons and questionable or ambiguous decisions in that fronze-in-time spec, and I have no problem with that.

A Hypothetical Algorithm for Protocol-Aware Proxies

What a proxy could do to avoid an exception is something like this:

Knowing the size of the expected content (zero or more body frames), assume that the any frame arriving in the middle of a publish should be delayed
After the publish is done, publish all those pending frames

This will not allow clients to "terminate a publish" early but it will be a safe thing to do.

To handle potentially missing body frames, introduce a "flushing timeout" where all pending frames are sent. This would not be a safe thing to do, though, so such timeout would have to match the heartbeat or TCP keepalive timeout used.

It's not the most straighforward algorithm to implement but I honestly don't see why https://github.com/cloudamqp/amqproxy/issues/162 should be solved in RabbitMQ if the user very clearly states that directly connected clients are not affected.

rabbitmq / rabbitmq-server