moq-wg / moq-transport

draft-ietf-moq-transport

How should a relay queue datagrams? #408

Open kixelated opened 7 months ago

kixelated commented 7 months ago

Does it ever make sense to queue a datagram before transmission, and if so for how long? How does this tie into object TTLs?

afrind commented 7 months ago

Individual Comment:

I'm inclined to decouple the mechanism by which an object arrived and how long it can be cached. Whatever we decide for TTL / expiry would also apply to datagrams, possibly with some wire format magic to reduce overhead.

kixelated commented 7 months ago

In my opinion, a relay SHOULD NOT cache or queue datagrams. Of course there can be some send buffer, but we're talking milliseconds.

The application wants to use datagrams because any form of transmission latency, for example retransmissions, is unacceptable. Timeliness is the property we want, not unreliability. A relay breaks that if it queues datagrams, creating an issue not dissimilar from bufferbloat.

What are example use cases of a datagram with a TTL greater than an RTT? If a datagram is allowed to be late, why not use a stream?

afrind commented 7 months ago

Individual Comment:

I don't have a particular use case in mind. Maybe the publisher chose datagrams because it didn't want to mess with retransmission, but I would still like the objects that do arrive at the relay to be available to a subsequent subscribe. Generally, I don't see a reason to normatively prohibit caching an object because it arrived on a stream rather than a datagram. I'd rather the application have clear mechanisms to indicate how long something can be cached.

We haven't yet explained the normative rules for how relays make drop decisions. Since datagrams are inherently unreliable it might be tempting to say they are always dropped first, but again, I think I'd rather use send order/priority and expiry to drive that process primarily.

suhasHere commented 7 months ago

+1 on @afrind's suggestions. Objects are the cacheable entities in MOQ, not the transport mechanism.

kixelated commented 7 months ago

The problem with caching/queuing datagrams is that there's an inherent mismatch between the IP/QUIC layer (TTL=0) and the MoQ layer (TTL=N). Trying to ask for a higher TTL than datagrams can inherently provide is going to result in strange behavior depending on the network/relay configuration.

It also doesn't make sense to have TTL>RTT, since retransmissions via streams are strictly better than using datagrams. I don't understand why you would ever want something transient at the network layer but persistent at the relay layer.

I'm fine with adding a TTL to datagrams but it needs to be measured in tens of milliseconds. Anything higher causes bufferbloat (relaybloat?) or should be handled with streams instead. But I would rather remove TTL from the datagram header and assume TTL=0, reducing the overhead in the process.

afrind commented 7 months ago

Individual Comment:

I think we should apply our energy figuring out what TTL/expiry means in general in moq first, then figure out how that might apply to datagrams.

> The problem with caching/queuing datagrams is that there's an inherent mismatch between the IP/QUIC layer (TTL=0) and the MoQ layer (TTL=N)

I don't view this problem in the same way. The transport reliability (is it retransmitted?) and the durability of an object (how long is it valid) seem relatively orthogonal to me.

> I'm fine with adding a TTL to datagrams but it needs to be measured in tens of milliseconds.

If TTL/expiry ends up being an object property, I wouldn't see any reason to normatively limit an application's ability to set the value to whatever they want. We can have a best-practices section that explains how to get good performance from an moq application.

suhasHere commented 7 months ago

> The problem with caching/queuing datagrams is that there's an inherent mismatch between the IP/QUIC layer (TTL=0) and the MoQ layer (TTL=N). Trying to ask for a higher TTL than datagrams can inherently provide is going to result in strange behavior depending on the network/relay configuration.

I think this is the wrong way to think about it. TTL in MoQ applies to objects. Objects arrive over a datagram or over a stream. The MoQ TTL has the same meaning for an MoQ object regardless of how it is delivered.

kixelated commented 7 months ago

> I think we should apply our energy figuring out what TTL/expiry means in general in moq first, then figure out how that might apply to datagrams.

I'm having trouble adding datagram support and it seems even worse with TTL in the mix. The problem is that the relay has a single opportunity to send each datagram. How aggressive should it be?

For example, suppose a relay receives a datagram with no TTL. For each active subscriber, does it:

  1. Try to send it immediately, dropping it if the congestion window is full.
  2. Put the datagram in a priority queue, popping in send order until the congestion window is full.

The second approach means I drop fewer datagrams (especially for bursts), but it requires the QUIC library to have a callback-based API. But more importantly, it means that the relay will queue datagrams for an unbounded amount of time, aka bufferbloat. The entire point of using datagrams instead of streams is for timely delivery (below RTT) so I strongly disagree with this approach.

Now what if I receive a datagram with a TTL of 100ms? I'm basically required to implement the second approach, with the TTL being the maximum allowed amount of bufferbloat. I'm allowed to ignore the TTL and transmit/drop immediately, but then the field is useless and I'm ignoring application intent. And what is the application intent anyway? Why let the relay queue for up to 100ms but disallow retransmissions?
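To make the tradeoff concrete, here is a small sketch of the two approaches, assuming a congestion window measured in whole datagrams (the function names are hypothetical, not any real QUIC API):

```python
import heapq

def send_or_drop(datagrams, cwnd):
    """Approach 1: transmit immediately, drop whatever exceeds the window."""
    return datagrams[:cwnd], datagrams[cwnd:]

def queue_in_send_order(datagrams, cwnd, now_ms, ttl_ms=None):
    """Approach 2: pop a priority queue in send order until the window fills.
    Each entry is (send_order, arrival_ms, payload); lower send_order wins.
    Without a TTL the queue grows without bound (relay-side bufferbloat)."""
    heap = list(datagrams)
    heapq.heapify(heap)
    sent = []
    while heap and len(sent) < cwnd:
        send_order, arrival_ms, payload = heapq.heappop(heap)
        if ttl_ms is not None and now_ms - arrival_ms > ttl_ms:
            continue  # expired while queued: drop instead of sending late
        sent.append(payload)
    return sent, [entry[2] for entry in heap]
```

With a burst of three datagrams and a window of two, approach 1 drops the last arrival, while approach 2 sends the two highest-priority ones and leaves the rest queued indefinitely unless a TTL expires them, which is exactly the bufferbloat concern.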

I just can't envision a use-case where datagram TTL would be useful and it's not trivial to implement either.

> I think this is the wrong way to think about it. TTL in MoQ applies to objects. Objects arrive over a datagram or over a stream. The MoQ TTL has the same meaning for an MoQ object regardless of how it is delivered.

I understand the desire to label everything as an "object", but the reality is that streams and datagrams have different properties. A TTL for a stream is completely different than a "TTL" for a datagram...

afrind commented 7 months ago

Commenting as an implementer:

> Put the datagram in a priority queue, popping in send order until the congestion window is full.
>
> ...requires the QUIC library to have a callback-based API

The mvfst QUIC library maintains this priority queue without the need for callbacks. The application writes data into the transport library (on streams or datagrams) and the transport drains those queues in priority order as CWND is available.

A little bit of priority queuing is good for you -- e.g. if you receive a burst of datagrams but the later-received ones have higher priority, you don't want to fill up your CWND with low-priority ones.

I also expect moq will give us reason to extend the capabilities of mvfst.

kixelated commented 7 months ago

> Commenting as an implementer:
>
> > Put the datagram in a priority queue, popping in send order until the congestion window is full.
> >
> > ...requires the QUIC library to have a callback-based API
>
> The mvfst QUIC library maintains this priority queue without the need for callbacks. The application writes data into the transport library (on streams or datagrams) and the transport drains those queues in priority order as CWND is available.
>
> A little bit of priority queuing is good for you -- e.g. if you receive a burst of datagrams but the later-received ones have higher priority, you don't want to fill up your CWND with low-priority ones.
>
> I also expect moq will give us reason to extend the capabilities of mvfst.

Absolutely, you should have a small send buffer for datagrams especially for pacing.

The problem is that expiry needs to work on a per-datagram basis to implement datagram TTLs. For example, suppose there are two objects, one with TTL 10ms and the other with TTL 90ms. They both go into the same priority queue, but the library needs an API and a mechanism to expire each datagram at a different time. Most send buffers don't work that way.

Additionally, the QUIC libraries I've used don't have a way to prioritize individual datagrams. There's generally one queue with a small buffer because datagrams are intended to be sent immediately or dropped. A QUIC library needs to support both datagram prioritization and expiration for datagram TTLs to work; most libraries I've seen have neither.
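As a sketch of what a QUIC library would need to support, here is a send buffer with both per-datagram send order and per-datagram expiry (a hypothetical API; none of the libraries discussed here offer this):

```python
import heapq

class DatagramSendBuffer:
    """Orders datagrams by send_order and drops each one individually
    once its own deadline passes."""

    def __init__(self):
        self._heap = []

    def push(self, send_order, expires_at_ms, payload):
        # Lower send_order is transmitted first.
        heapq.heappush(self._heap, (send_order, expires_at_ms, payload))

    def pop_sendable(self, now_ms):
        """Return the next unexpired payload in send order, or None."""
        while self._heap:
            send_order, expires_at_ms, payload = heapq.heappop(self._heap)
            if now_ms < expires_at_ms:
                return payload
            # Expired while queued: silently dropped, never transmitted.
        return None
```

With a 10ms-TTL datagram and a 100ms-TTL datagram in the same buffer, a pop at t=50ms skips the expired one and returns the other, which is the per-datagram behavior most send buffers lack.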

MoQ could pave the way for new QUIC APIs but that's optimistic. In reality, I imagine most implementations will ignore the datagram TTL and it should be an extension at best.

afrind commented 7 months ago

Individual Comment:

I strongly believe we shouldn't constrain MoQ based on circa 2023 library APIs and designs. They are in user space and most are open source and have active developers. WebTransport isn't even constrained to RFC 9000 - it's going to depend on a QUIC extension which hasn't even been completed yet.

fluffy commented 6 months ago

Let me ask a question that is not about datagrams. Say a relay receives an object from a stream on QUIC connection A and needs to send it on a stream on QUIC connection B and also on a stream on a different connection C. It sends it on B, but on C the QUIC congestion controller does not allow it to be sent. At this point, it needs to buffer the object until it can be sent. This buffering might happen in the QUIC stack, or via back pressure, with the QUIC stack telling the app to buffer it. How does this work for objects sent over streams?

kixelated commented 6 months ago

> Let me ask a question that is not about datagrams. Say a relay receives an object from a stream on QUIC connection A and needs to send it on a stream on QUIC connection B and also on a stream on a different connection C. It sends it on B, but on C the QUIC congestion controller does not allow it to be sent. At this point, it needs to buffer the object until it can be sent. This buffering might happen in the QUIC stack, or via back pressure, with the QUIC stack telling the app to buffer it. How does this work for objects sent over streams?

Yeah, the QUIC stack could queue some stream data (like a TCP send buffer) until it eventually hits a limit (like a TCP write returning EAGAIN).

The important part is how the QUIC stack drains the queue or otherwise unblocks streams. That's the entire purpose of prioritization, with send_order being the signal to the QUIC stack on which streams to unblock first (in strict order).

So if you write two objects via streams, A and B, they might get queued during congestion. If B has a higher priority (lower send_order) then it will get sent first after recovery, even if A was written first. And if newly created C enters the picture with a higher priority, then it will pre-empt both A and B.

All of the objects/streams stay queued in the application/QUIC stack awaiting their turn via send_order. The draft currently doesn't specify how streams/groups/objects are dropped, causing wasted bandwidth. I've got a few ideas, including using SUBSCRIBE_UPDATE with a new start group/object to signal that an old group is no longer desired.
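The A/B/C scenario above can be sketched as a strict send_order scheduler (illustrative only; real QUIC stacks expose this differently):

```python
import heapq

class StreamScheduler:
    """Unblocks queued streams in strict send_order (lower value first),
    with arrival order breaking ties."""

    def __init__(self):
        self._queued = []
        self._seq = 0  # monotonic counter: ties go to the earlier write

    def write(self, send_order, stream_id):
        heapq.heappush(self._queued, (send_order, self._seq, stream_id))
        self._seq += 1

    def on_window_open(self):
        """Called when congestion recovery frees window space;
        returns the stream to unblock next, or None."""
        if not self._queued:
            return None
        return heapq.heappop(self._queued)[2]
```

Writing A (send_order 20), then B (10), then a newly created C (5) drains as C, B, A: C pre-empts both even though A was written first.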

kixelated commented 6 months ago

A practical example:

Each audio frame is sent via a separate stream with descending send_order. That means new audio frames are transmitted first, (re)transmitting old audio as the congestion window allows. If an audio frame is not received in time for the jitter buffer, it is not rendered.

Note that the sender does NOT know the size of the jitter buffer. Using datagrams assumes the jitter buffer is too small for retransmissions (<1.5x RTT), but that's often not true especially as relays are introduced.

The old audio frames are still queued for transmission, although they are lower priority than newly encoded frames and will not compete for bandwidth. The subscriber is responsible for signaling that it no longer wants this data based on its (current) jitter buffer. Higher-latency playback will allow seconds, while lower-latency playback will allow milliseconds.
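The "descending send_order" pattern amounts to assigning each new frame a lower value than the last, so the QUIC stack always prefers the newest frame (MAX_ORDER here is an arbitrary illustrative constant):

```python
MAX_ORDER = 1 << 32

def audio_send_order(frame_index: int) -> int:
    """Newer frames get lower send_order values, so they are
    transmitted first; older frames fill leftover bandwidth."""
    return MAX_ORDER - frame_index
```

Under the strict-order scheduling described above, frame 2 is (re)transmitted before frames 1 and 0, regardless of write order.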

kixelated commented 6 months ago

One final addition to the wall of text:

We need to add some sort of TTL too, which is the maximum amount of time the publisher should have a stream cached/queued. The subscriber should have a way to lower that based on their jitter buffer size.

vasilvv commented 6 months ago

Note that we've tried no-queueing for datagrams before, and it doesn't really work due to pacing. WebTransport supports time limit on how long a datagram can be in the queue, and I believe W3C WG is open to alternative suggestions (with the constraint that we can't do callbacks due to IPC nature of the sandbox).

fluffy commented 6 months ago

I think objects should queue the same way regardless of whether they are on datagrams or streams. In both cases there needs to be some sort of time-to-live to stop infinite growth, and there needs to be a priority/send order algorithm that determines which object gets sent next when the congestion window opens up.

huitema commented 3 months ago

Stacks can send multiple QUIC datagrams in a single QUIC packet. Take the case of an application sending 20-byte audio datagrams every 20 ms. If the link is congested, several datagrams will be queued, and they will probably all be shipped in a single QUIC packet unless the stack is specifically programmed not to send more than one datagram per packet.

QUIC never states the order in which QUIC frames received in the same packet shall be processed. Many stacks go left to right, but right to left or any other order is fine. If datagrams are sent in the same packet, their delivery order is unspecified, which makes application of "send order" a bit problematic...
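The coalescing behavior is easy to model with a greedy packer (the 3-byte frame overhead is illustrative, standing in for a type byte plus a length varint; real overheads vary):

```python
def pack_datagrams(queued, mtu=1200, frame_overhead=3):
    """Greedily coalesce queued datagrams into a single QUIC packet,
    returning (packet_contents, still_queued)."""
    packet, used = [], 0
    remaining = list(queued)
    while remaining and used + frame_overhead + len(remaining[0]) <= mtu:
        datagram = remaining.pop(0)
        packet.append(datagram)
        used += frame_overhead + len(datagram)
    return packet, remaining
```

Five queued 20-byte audio datagrams all fit in one 1200-byte packet, and since the receiver may process the resulting DATAGRAM frames in any order, any "send order" among them is lost at that point.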