panrg / path-properties

A Vocabulary of Path Properties

Formalize definition of a path #1

Closed renghardt closed 4 years ago

renghardt commented 5 years ago

We need to provide (at least) one formal definition of what a path is. Is it defined on a flow level, i.e., the hops that a set of packets traverses? Or only the hops that one individual packet traverses?

For this, we should look at the definitions in:

renghardt commented 5 years ago

IPPM, in RFC 2330 (Framework for IP Performance Metrics), defines a path as a sequence of links and routers, with two hosts at the end. Each pair of (link, host) is a 'hop'. So it is a unidirectional concept which does not depend on a packet, but a packet can take one specific path. This is probably a good starting point for our definition...

cyrill-k commented 5 years ago

I wrote this definition of a path at IETF 103 without looking at existing definitions in other IETF documents, but it is similar to the RFC 2330 definition. This definition is at the flow level, but it might be clearer to define a path for individual packets.

A "path" is defined as the set of the devices and links between two endpoints that can be traversed by a set of packets at specific points in time. Depending on which layer the "path" is looked at, devices, links, and their properties might differ. Devices may be hidden, and a set of links may be abstracted into a single link. The lifetime of a "path" overlaps with the lifetime of the corresponding connection but is not restricted to it.

A "path" can be classified by timescale into a "measured path", based on concrete previous and current measurements, and a "potential path", which is a path with predicted characteristics, possibly including the reliability of such predictions.

renghardt commented 5 years ago

Yes, I think it makes sense to first define the "path" on a per-packet level. Then, as a second step, we can define multiple packets belonging to one flow (not just a connection, but also multiple UDP packets with the same source and destination IP address and port). Then we can have packets of one flow taking the same path, and path properties can change or stay the same for multiple packets within one flow. So one packet can have a specific one-way latency along the path, and multiple packets of the same flow on the same path can have, e.g., a median one-way latency, or a Round Trip Time.
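To make the two steps concrete, here is a minimal Python sketch, not a proposal for the draft. The names `Packet`, `one_way_latency_ms`, and `flow_median_latency_ms` and the timestamps are hypothetical, and it assumes sender and receiver clocks are synchronized. It treats one-way latency as a per-packet property and the median as a per-flow property that only makes sense when all packets took the same path.

```python
from dataclasses import dataclass
from statistics import median


@dataclass(frozen=True)
class Packet:
    """One packet observed on a path (illustrative fields, not from the draft)."""
    path: tuple          # ordered hops the packet traversed, e.g. ("A", "R1", "B")
    sent_ms: float       # send timestamp at the source
    received_ms: float   # receive timestamp at the destination


def one_way_latency_ms(packet):
    """Per-packet property: one-way latency along the packet's path."""
    return packet.received_ms - packet.sent_ms


def flow_median_latency_ms(packets):
    """Per-flow property: median one-way latency, only if all packets share one path."""
    paths = {p.path for p in packets}
    if len(paths) != 1:
        raise ValueError("packets took different paths; aggregate per path instead")
    return median(one_way_latency_ms(p) for p in packets)


path = ("A", "R1", "B")
flow = [Packet(path, 0.0, 10.0), Packet(path, 5.0, 17.0), Packet(path, 9.0, 29.0)]
print(flow_median_latency_ms(flow))  # 12.0
```

Raising an error for mixed paths is just one possible choice; grouping the packets per path first would work as well.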

I'm not yet sure if we need a distinction into "measured" and "potential" path -- aren't these more, like, measured properties or predicted properties? I mean, the path is always there, no matter if packets are sent over it, right? But some properties only exist in relation to packets sent over a path?

cyrill-k commented 5 years ago

What do you think about this terminology section? Should we add it before the introduction? (I guess if we add it after, we need to remove any reference to path/property in the introduction and write a higher-level introduction.) I tried to structure it accordingly (path -> flow -> property). Maybe it is better not to define Connection, as it may cause confusion between connections and flows. I think we should clearly specify that a path can be defined over different layers and that we only consider end-to-end connections/flows/paths, but I could also add this to the path definition and remove the connection definition.

Terminology

This section defines a set of terms used throughout this document. In some cases these terms have been used in other contexts with different meanings so this section attempts to describe each term's meaning with respect to the PANRG activities.

Connection: Connections, as used in this draft, incorporate application, transport, network, data link, and physical layer connections between a set of endpoints. Hereinafter, connections refer to bidirectional or unidirectional connections between two endpoints. How concepts of paths and path properties can be extended to other connection models such as one-to-many or many-to-many connections is not described in this draft.

Path: A path element is a device (including the endpoints) or link used to connect two endpoints and transmit information. A path is defined as an ordered set of path elements that can be traversed by a packet at specific points in time. For the sake of simplicity, we use the term packet, typically used at the network layer, to describe bit strings transmitted on any layer. Depending on which layer the path is looked at, devices, links, and their properties might differ. Devices may be hidden, and a set of links may be abstracted into a single link. The lifetime of a path overlaps with the lifetime of the corresponding connection but is not restricted to it.

Flow: Several packets traversing the same path elements at specific points in time can be combined into a flow (e.g., all packets sent within a UDP session). As a special case, a flow can consist of just one packet.

Property: A property describes a trait of a set of path elements (e.g., the capacity of a link, whether device X is a firewall, or the one-way bandwidth, which is the minimum of all link bandwidths), or a trait of a flow being sent on a set of path elements (e.g., RTT, one-way delay). A property is thus described by a tuple containing the ordered set of path elements, the set of packets traversing the path (the flow) or an empty set if no packets are relevant for the property, the type of trait (e.g., bandwidth), and the value of the trait (e.g., 100 Mbps).

Aggregated Property: A property can be aggregated over a set of path elements (e.g., loss rate in network backbone calculated using the individual link loss rates), or over a set of packets (e.g., median one-way latency of all packets during the last second), or over both (e.g., average time spent in buffers in the network backbone). Aggregation can be numerical (average, sum, min, ...), logical (true if all are true, true if at least X are true, ...), or an arbitrary function which maps a set of input properties to an output property.

Measured & Potential Property: A property can be classified by timescale into a measured property, based on concrete previous and current measurements, and a potential property, which is a property with predicted characteristics, possibly including the reliability of such predictions.
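As a rough illustration of the Property tuple and of aggregation as described above, a Python sketch could look like the following. The class and function names, the field names, and the example link capacities are all assumptions made for this sketch; none of them come from the draft.

```python
from dataclasses import dataclass
from typing import Any, Callable, FrozenSet, Sequence, Tuple


@dataclass(frozen=True)
class Property:
    """The four-part tuple from the definition (field names are hypothetical)."""
    path_elements: Tuple[str, ...]   # ordered path elements the trait refers to
    flow: FrozenSet[int]             # packet identifiers; empty if no packets are relevant
    trait: str                       # e.g. "capacity_mbps"
    value: Any                       # e.g. 100


def aggregate(properties: Sequence[Property], trait: str,
              function: Callable[[list], Any]) -> Property:
    """Map a set of input properties to one output property via `function`."""
    elements = tuple(e for p in properties for e in p.path_elements)
    flow = frozenset().union(*(p.flow for p in properties))
    values = [p.value for p in properties]
    return Property(elements, flow, trait, function(values))


# One-way bandwidth as the minimum of all link capacities (numerical aggregation).
link1 = Property(("link1",), frozenset(), "capacity_mbps", 100)
link2 = Property(("link2",), frozenset(), "capacity_mbps", 40)
bottleneck = aggregate([link1, link2], "one_way_bandwidth_mbps", min)
print(bottleneck.value)  # 40
```

Passing `all` or `any` as `function` over boolean values would give the logical aggregations mentioned in the definition.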

renghardt commented 5 years ago

Thanks for proposing this.

Could you make a Pull Request out of this, please? That would make it easier to discuss individual parts of it by referring to specific lines etc., and then it's easier to merge it.

First, about where to put it: I think it's fine to put it after the Introduction and still use the words in the introduction. RFC 2330 does it, for example, so we are allowed, too. ;)

About the definitions:

I agree that perhaps it's better to not define Connection. Here it is basically defined by itself ("a Connection is a connection"), but I mean, what even is a physical layer Connection for example? Is it a single frame sent from one network card to another on a single physical link? This confuses me.

I think we should start with Path, or maybe with path element, and then we can define Path with it (i.e., instead of "Devices may be hidden", write "Path elements may be hidden", and instead of "links can be aggregated", "Path elements can be aggregated"). And then I thought that the lifetime of a path is just the lifetime of a packet? Or do you mean that the "physical layer connection" is just one single packet? Again, I'm confused.

In fact I'm still wondering why we need the different layers at all as they are used here. Why not just define path as a set of path elements at the network layer, to be traversed from one endpoint to the other? Then physical layer properties can apply to an individual path element, and transport layer properties can apply to the entire path. And then we can have aggregate properties that apply to a Flow.

Does the definition of flow imply that all packets have the same 5-tuple? This would be the case for a single UDP session, but then the packets of this session can easily take different paths through the network. Maybe it's better to define flow with the 5-tuple, and then say, if multiple packets of the same flow take the same path, we can aggregate their properties? Or do we here completely redefine flow and the example is just incomplete, i.e., the example should be "all packets sent within a UDP session which traverse the same path elements"?
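The first option (define the flow by its 5-tuple, then aggregate only over packets that share a path) could be sketched in Python as below. This is only an illustration: `FiveTuple`, `group_by_flow_and_path`, the documentation-range IP addresses, and the hop names are all made up for the example.

```python
from collections import defaultdict, namedtuple

# Hypothetical flow key; field names are assumptions for this sketch.
FiveTuple = namedtuple("FiveTuple", "src_ip src_port dst_ip dst_port protocol")


def group_by_flow_and_path(packets):
    """Group packet indices by (5-tuple, path).

    `packets` is an iterable of (FiveTuple, path) pairs, where `path` is a
    tuple of hop names. Packets of one 5-tuple flow that took different paths
    end up in different groups, so properties are only aggregated per path.
    """
    groups = defaultdict(list)
    for index, (five_tuple, path) in enumerate(packets):
        groups[(five_tuple, path)].append(index)
    return dict(groups)


udp_session = FiveTuple("192.0.2.1", 5000, "198.51.100.1", 53, "UDP")
packets = [
    (udp_session, ("A", "R1", "B")),
    (udp_session, ("A", "R2", "B")),   # same session, different path
    (udp_session, ("A", "R1", "B")),
]
groups = group_by_flow_and_path(packets)
print(len(groups))  # 2
```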

I like the concept of Property and Aggregated Property. I'm not sure if we really want to call it the "type" of the property, because this sounds a lot like "data type" to me. Maybe it's just the name of the property?

Minor point: I think the sentence at the start of the section, "In some cases these terms have been used in other contexts with different meanings...", is redundant, because it's obvious. Or, if we must have it, let's not say that we define this for "the PANRG activities", but only for this draft.

cyrill-k commented 5 years ago

I created a pull request #5

I removed connection and added a separate definition for path element, which can be a device or link on ANY layer. Then the path is a definition of path elements on the network layer.

I don't think it's useful to define a flow on a 5-tuple since, as you said, we cannot say anything about the properties of such a flow. I adjusted the example.

I also think using name instead of type makes it more clear what we mean.

What do you think about measured & potential property? Is it necessary to define it?

renghardt commented 5 years ago

Closed via #5.

renghardt commented 5 years ago

Some thoughts on how to revise our definitions:

renghardt commented 5 years ago

Now that we've converged on the terms for this revision, I did a first search-and-replace on the old terms in the other sections, see #12. Feel free to add more.

renghardt commented 5 years ago

Closed pre-IETF 105, let's see what PANRG thinks about our updated definitions.

renghardt commented 5 years ago

Next round including comments from IETF 105 PANRG session:

renghardt commented 5 years ago

Additionally, replying to Med's comments (which I will put into an e-mail once I'm through):

On "Host":

[Med19]: I personally prefer the definition in RFC1122 ["Host: ultimate consumer of communication services. A host generally executes application programs on behalf of user(s), employing network and/or Internet communication services in support of this function."]

Thanks for the suggestion, I really like this definition as well. The distinction based on "processes packets that are addressed/not addressed to [a node]" has its problems: it does not say on which layer we consider the addresses, it gets more complicated with encapsulation, etc. So I'm in favor of changing our definition of a host to something along the lines of RFC 1122.

[Med20]: This is about receiving. You may also cover sending.

We intend "processes packets" to include both sending and receiving, but I suggest we switch to a different definition as discussed above.

On "Path element: Either a node or a link."

[Med21]: This definition does not cover, for example, the case of a path identified by an ordered list of AS Number. Check for example the definition of ERO in RFC3209, or even in recent documents such as RFC7570.

True. ERO can include groups of nodes, and possibly this can mean entire ASes. We may want to allow "aggregated" path elements that are entire networks, captured by the network's AS number. Is it sufficient to express these as subpaths, e.g., consisting of all nodes and links within this AS? Maybe not, because we want this "aggregated" path element (group of nodes and links? cloud?) to be amorphous. As we may not have full visibility of all path elements in practice, most paths probably include at least one such "aggregated" path element anyway.
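One way to picture such an "aggregated" path element is the following Python sketch, which collapses consecutive hops that belong to the same AS into a single amorphous element identified only by its AS number. The function name, the hop names, and the AS numbers (taken from the range reserved for documentation) are assumptions for illustration; this is not how ERO processing works in RFC3209.

```python
from itertools import groupby


def collapse_by_as(hops):
    """Turn a hop-level path into an AS-level path.

    `hops` is a list of (node_name, as_number) pairs in path order. Each run
    of consecutive hops with the same AS number becomes one aggregated path
    element, represented here only by that AS number.
    """
    return [as_number for as_number, _ in groupby(hops, key=lambda hop: hop[1])]


hop_level_path = [
    ("host-a", 64496),
    ("r1", 64496),
    ("r2", 64497),
    ("r3", 64497),
    ("host-b", 64498),
]
print(collapse_by_as(hop_level_path))  # [64496, 64497, 64498]
```

Note that the internal nodes and links are dropped entirely, which matches the "amorphous" reading above rather than the subpath reading.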

On "Path: […] alternating between nodes and links":

[Med22]: This definition excludes recent notion such as SFP defined in RFC7665.

Just to make sure I got this right: Defining an individual Service Function (SF) or Service Function Chain (SFC) to be a node (which may be virtual) between which links may exist (either physical or virtual, maybe allowing virtual links to not just be between virtual network interfaces but even within a single host) does not solve the problem? And then the Service Function Path (SFP) can be a sequence of nodes and links? Do you think this is necessarily just a sequence of nodes, no links, because we would have to stretch the definition of a virtual link too far, or is there a different reason?

On "A path can be viewed as an abstraction on a specific layer, omitting lower layer path elements."

[Med23]: Contradicts with the sentence about alternating nodes/links.

Why? If we define a path on, e.g., layer 3, we can treat the link between two layer 3 devices as a virtual link, so we abstract from all physical links and nodes on layer 2. Do you have an alternative proposal how to phrase this?

On "Property: A trait of one or a sequence of path elements […]"

[Med24]: I’d like to double check if the definition also covers the concept of SFC/SFP. The path will be built to accommodate a property (result of SFC).

I think this depends on how we define the sequence of path elements, see above, to make sure we cover these concepts.

On "Aggregated property: A collection of multiple values of a property into a single value, according to a function."

[Med25]: Note this cannot be applied to all properties of a connectivity service.

Correct. For all properties of a connectivity service, you would need a set of all aggregated properties.

On "Measured property: A property that is observed for a specific path element or path, e.g., using measurements."

[Med26]: I prefer « Observed » rather than measured because properties may not only be about traffic performance metrics.

True, we define lots of other properties for which we can observe a value. So we may change this.

On "Estimated property: An approximate calculation [CROSSED: or judgment] of the value of a property."

[Med27]: As above, this is too measurement-centric. This may not be applicable to all properties.

The intention of the "or judgment" is to express that this may not be a numeric and/or performance-related property. I understand "estimate" to be broader than numerical but rather to include "an educated guess". However, it seems to me like most words in this space imply numeric values. So I'm looking for the right word here - does "An approximate calculation or assessment of a numeric or non-numeric value of a property" work? Any other suggestions?

renghardt commented 4 years ago

Replies regarding the comments:

Med really wants us to avoid the "alternating between nodes and links": "[Med] I disagree these are "nonsensical paths". Defining a path as a sequence of nodes/functions may be sufficient for an endhost. For example, a collaborative network may decide to announce a set of functions to an endhost; those functions can be invoked individually or in a sequence. The endhost may impose a "path" that is defined (from its local standpoint) as a singleton function or a loose path defined as an ordered list of functions. Also, a MIF device can define a path with reference to its local interfaces (without even caring about further hops upstream).

[TE] Agreed - as long as it is possible to pass packets across the path, it's not "nonsensical". However, even if a node only has a partial view of the path, such as a set of functions or a reference to a local interface, there is still at least one other node (the remote endpoint) that the node most likely cares about. Then, if the node does not know or care about the path elements in between, we might want to model them as an "aggregate" path element or something like this."

Further, regarding service functions: "[Med] as I mentioned above, the same path may be viewed differently by distinct nodes. An endhost, for example, does not have to manipulate the same path granularity as a router or a Service Function Forwarder (RFC7665). Also, defining a path as a sequence of nodes/links may not be sufficient in some cases, as the overall path must also invoke a set of functions that may be collocated within a given node.

[TE] But functions can also be nodes, right, and then a sequence of functions (nodes) can be located within a host (also a node)... I'm wondering if this will get confusing."

Regarding views of the path: "[Med] In short, a path can be zoomed in/out as a function of the context.

[TE] I think what you call "zoom in/out" is similar to what we say in our current path definition, e.g., "If [a] router does not implement transport layer functionality, it is hidden when a higher layer, such as the transport or application layer, is considered." - a host might not have to care about every individual router on the path. We'll see if we can make this clearer and/or more explicit."
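The hiding rule quoted from the current path definition can be read as a filter over the path, as in this Python sketch. The `Node` class, the `top_layer` field, and the OSI-style layer numbers (3 for network, 4 for transport, 7 for application) are assumptions chosen for illustration; the draft does not define such a representation.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Node:
    name: str
    top_layer: int  # highest layer this node implements (assumed: 3, 4, or 7)


def view_at_layer(path, layer):
    """Return the node names visible when the path is considered at `layer`.

    A node that does not implement functionality at `layer` or above is
    hidden, which corresponds to "zooming out" of the path.
    """
    return [node.name for node in path if node.top_layer >= layer]


path = [Node("host-a", 7), Node("router", 3), Node("proxy", 4), Node("host-b", 7)]
print(view_at_layer(path, 3))  # ['host-a', 'router', 'proxy', 'host-b']
print(view_at_layer(path, 4))  # ['host-a', 'proxy', 'host-b']
print(view_at_layer(path, 7))  # ['host-a', 'host-b']
```

This only models hiding by layer. Views that differ per observer, as Med describes for an endhost versus a Service Function Forwarder, would need more than a single layer number.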

Regarding strict/loose paths: "[Med] Moreover, a path does not need to be strictly defined; loose mode should also be supported.

[TE] What is the difference between strict and loose here - full view vs. partial view of the physical topology?"

Regarding defining paths between two hosts/nodes: (comment from abstract, but I think we should also keep this in mind for our definitions) "[Med] This will depend on the nature of a communication. I'm not sure this will apply for SSM, ASM or when anycast is used.

[TE] Neither am I. I'll think about this more."

To [Med19] and [Med20], he agrees with my points.

To [Med23]: "[Med] I would simply remove that sentence as I don't see how (assuming it is true) it can be useful.

[TE] Perhaps this is related to our discussion about views of the path. We'll think about this more."

To [Med26] he says "OK".

To [Med27]: "[Med] "approximate assessment" would be OK."

renghardt commented 4 years ago

Closed via #21