ietf-tapswg / api-drafts

Architecture, interface, and implementation drafts for the definition of an abstract API for IETF TAPS

7.3.7. Reliable Data Transfer (Message) redundant? #273

Closed mwelzl closed 5 years ago

mwelzl commented 5 years ago

Because we also have 7.3.1 Lifetime, with the statement "When a Message’s Lifetime is infinite, it must be transmitted reliably."

britram commented 5 years ago

This is IMO an open question -- "infinite lifetime" meaning reliable came from Post Sockets (and I'm, of course, partial to this way to do things), but I think this redundancy came from someone thinking that was a little too implicit.

mwelzl commented 5 years ago

It's redundant nevertheless. My suggestion is to leave the issue open for a bit, and at some later point, if nobody objects, remove "Reliable Data Transfer (Message)".

philsbln commented 5 years ago

It has a slightly different meaning (putting the 90% of the packets corner-cases aside):

The former is useful for optimising retransmissions in real-time scenarios, the latter is useful for application-level FEC/network coding.

mwelzl commented 5 years ago

Even after your explanation I don't understand how the latter of your cases differs from "Lifetime" set to infinite.

britram commented 5 years ago

On review, I don't think "Reliable Data Transfer" (Message) is, on its own, unambiguously implementable.

Post Sockets' per-message PR parameters (Lifetime and Niceness) were drawn from how SCTP allows apps to set PR preferences.

RFC 3758 defines a "timed reliability" service, which is exactly what Message Lifetime is meant to expose. This makes sense, since PR is most useful in media applications where time limits are easy to determine.

RFC 7496 adds a "Limited Retransmissions" service, which allows the sender to set the max (but not min or average or required) RTX count. A trivial implementation of "Reliable Data Transfer (Message)" := False could be equivalent to setting this counter to 0. IIRC, this is in SCTP because I asked Michael Tüxen for it in 2006 on behalf of the IPFIX working group, but I don't think anyone ever implemented it for IPFIX. Nevertheless if we believe in it we could add it as a message property (which wouldn't be "reliable data transfer", but rather "max RTX count").

(7496 also specifies priorities, which is covered by Niceness).
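To make the two SCTP partial-reliability policies concrete, here is a minimal sketch of each as an "abandon this message?" predicate. The class and function names are hypothetical and model only the behaviour described in this thread, not the SCTP socket API.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class PendingMessage:
    queued_at: float        # time (seconds) the app handed the message over
    transmissions: int = 0  # how many times it has been put on the wire


def abandon_timed(msg: PendingMessage, now: float,
                  lifetime: Optional[float]) -> bool:
    """Timed reliability (RFC 3758 style): give up once the lifetime elapsed.

    lifetime=None stands for an infinite lifetime, i.e. fully reliable.
    """
    if lifetime is None:
        return False
    return now - msg.queued_at >= lifetime


def abandon_limited_rtx(msg: PendingMessage, max_rtx: int) -> bool:
    """Limited retransmissions (RFC 7496 style): cap the retransmission count.

    The first transmission is never suppressed; only retransmissions count.
    """
    if msg.transmissions == 0:
        return False
    return msg.transmissions - 1 >= max_rtx
```

With `max_rtx=0`, a message is sent once and then abandoned, which is the "trivial implementation" of unreliable message transfer mentioned above.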

mwelzl commented 5 years ago

Indeed, RFC 7496 has some rather obscure features; when I talked to Michael Tuexen about it, he said that it was implemented upon request from people who needed it - but these would be people with very specific SCTP needs, caring about the number of packets being sent under certain circumstances for telephony signalling. If you care at that level, you care about the specific protocol in use too.

At least the RTX count only makes sense for an application if it also knows some things about the environment (the RTT, at least) that it might not need to care about... and the app-measured RTT may not be quite the same as the transport-measured one. The latter is unavailable to the app but matters greatly for the RTX count. I think that a lifetime is enough for partial reliability that's meaningful to the application, and perhaps priorities, which indeed map to our per-message Niceness.

britram commented 5 years ago

What I'm saying is I'm the one who asked for RTX count, and I'm not sure it's useful. So I'm +1 to sticking with Lifetime, defaulting to infinite = fully reliable

mwelzl commented 5 years ago

Ah, ok, got it.

On 8 Jan 2019, at 17:36, Brian Trammell notifications@github.com wrote:

What I'm saying is I'm the one who asked for RTX count, and I'm not sure it's useful. So I'm +1 to sticking with Lifetime, defaulting to infinite = fully reliable

philsbln commented 5 years ago

@mwelzl: I see "do not retransmit" as a nice optimisation to prevent retransmission of stale data. Let's consider an application that does state replication and wants to use reliability for its handshake, but will take care of the individual message retransmissions itself to prevent retransmission of stale state. This is, e.g., possible with the proposed QUIC datagram extension. Lifetime is not the most useful option for the application, as it does not know in advance when a message will become outdated. It just wants no retransmits. Besides that, I see no use case for "retransmit only 7 times".

mwelzl commented 5 years ago

A-ha! You're asking to support the semantics of unreliable message transfer. Good catch!

I was about to answer "just set lifetime to 0", but the Lifetime description (reasonably, IMO) says: "Lifetime specifies how long a particular Message can wait to be sent to the remote endpoint before it is irrelevant and no longer needs to be (re-)transmitted." Because it's not only about retransmission, we need a separate property to be able to say "don't drop it from your own send buffer, and don't retransmit it, but do send it once." It's this functionality you're after, right?

britram commented 5 years ago

yeah it'd be neat if "lifetime" could be three-state: "never RTX, RTX forever, TX/RTX before deadline" but I'm not sure I want to invent a new, easy-to-implement-wrong datatype for this.

So this is two properties, but I don't think the boolean is "reliable data transfer (message)". something like "retransmission desired"?

Actually, the semantics of that combine nicely:

The document should probably treat them as specially related though.
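One plausible reading of how a boolean "retransmission desired" property and Lifetime could combine is sketched below. This is an assumption for illustration: the function name, parameter names, and exact rules are hypothetical and are not taken from the draft.

```python
import math


def should_transmit(retransmission_desired: bool, lifetime: float,
                    age: float, transmissions: int, acked: bool) -> bool:
    """Decide whether a message may be put on the wire (again).

    lifetime is in seconds; math.inf means the message never expires.
    age is how long the message has been waiting since the app sent it.
    """
    if acked:
        return False                # delivered, nothing left to do
    if age >= lifetime:
        return False                # expired, even if never transmitted
    if transmissions == 0:
        return True                 # first transmission is always allowed
    return retransmission_desired   # later attempts only if asked for


# The four combinations under this reading:
#   desired,     infinite lifetime: retransmit until ACKed
#   desired,     finite lifetime:   retransmit until ACKed or expired
#   not desired, infinite lifetime: transmit exactly once
#   not desired, finite lifetime:   transmit once, if still within deadline
```

Note that under this reading a Lifetime of 0 expires the message before its first transmission, which is why "just set lifetime to 0" does not express "send once, never retransmit".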

mwelzl commented 5 years ago

I like this proposal a lot. Indeed I think the document should then also list your four cases.

(just "until ACKed until expired" doesn't sound very good, but that's an easy fix)

abrunstrom commented 5 years ago

At the moment unreliable message transfer is covered by the Reliable Data Transfer (Message) property. So I think the question is if we need to distinguish

Is there a use case for that? The other two cases (reliable/unreliable) are covered.

gorryfair commented 5 years ago

Please do not create a "don't drop a datagram from your own send buffer" ... that's a hideous invention. A transport that is reliable will know what to do when a drop occurs and likely have a good idea of how to avoid self-congesting its lower layers. A datagram app may or may not have this ability, but anyway has to be able to deal with loss and congestion.

I am really not in favour of adding more control of this - it's hard to implement and hard to get right. And... there is duplication in some networks, and there is retransmission in some networks as design artefacts or on purpose. So the app/transport still has to deal with this.

From what I recall, the sigtran work with SCTP added stuff to that transport to try and make it better for an already semi-reliable upper layer protocol - these tweaks are now in the spec, although not commonly used outside that bespoke app. I'm not saying it was wrong, but maybe in retrospect it was a mistake to put these extra switches in the general transport spec? (Either way, I do not agree that existence in the SCTP spec is a sign that this is implemented or useful or even unambiguous.)

I suggest telling the transport/network to do "retx" or not is fundamentally wrong. The API should state something about the usage's general requirements ("Lifetime" is close to what I expected), and let the transport figure out what to do. I'd press for keeping the "property" simple and not trying to understand what the path and transport and OS stacks could do.

mwelzl commented 5 years ago

Hm. To summarize this discussion:

My suggestion is to conclude this discussion by leaving things as they are (but massage the text a tiny bit to stress the importance of having a separate "reliable data transfer" property - so that the confusion that I had doesn't happen for future implementers of a transport system).

philsbln commented 5 years ago

I would also like to add the warning that setting "reliable transfer (message)" to "false" on a message is just an optimisation – it has the semantics of "the transport system may disable retransmissions or other reliability mechanisms for this particular message" – so if we have TCP beneath, it's a NO-OP.
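The "may disable" semantics can be sketched as follows. The `Protocol` type and its capability flag are hypothetical, chosen only to illustrate that the property is a hint whose effect depends on the protocol underneath.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Protocol:
    name: str
    per_message_opt_out: bool  # can retransmission be disabled per message?


# Illustrative capability flags (assumed for this sketch): TCP always
# retransmits; SCTP with partial reliability can abandon single messages.
TCP = Protocol("TCP", per_message_opt_out=False)
SCTP_PR = Protocol("SCTP with PR-SCTP", per_message_opt_out=True)


def will_retransmit(protocol: Protocol, reliable_message: bool) -> bool:
    """Whether a lost message would be retransmitted by the transport.

    reliable_message=False only permits the transport to skip
    retransmissions; it does not oblige it to.
    """
    return reliable_message or not protocol.per_message_opt_out
```

Here `will_retransmit(TCP, False)` is `True`: the request is silently ignored, so an application cannot rely on the property to suppress retransmissions.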