ietf-wg-idr / draft-ietf-idr-5g-edge-service-metadata

Editing for the 5G Service Metadata

Jeff Haas original comments on Oct 27, 2023 #3

Open lindadunbar opened 9 months ago

lindadunbar commented 9 months ago

-----Original Message-----
From: Jeffrey Haas jhaas@pfrc.org
Sent: Friday, October 27, 2023 4:01 PM
To: draft-ietf-idr-5g-edge-service-metadata@ietf.org; idr@ietf.org
Subject: Comments on draft-ietf-idr-5g-edge-service-metadata

Authors,

My apologies for not providing earlier review of your document, especially in light of request for early allocation. Life happens, and life has been messy the last few months.

I have several concerns covering the PDU formats and BGP procedures for this document, but most of them are not severe and likely to be able to be addressed in an update. Depending on how some are addressed, the implementors of the draft may want to defer their code point request until they can do further discussion and testing of the result of these conversations.

High level assessment for the chairs and AD:

Asking for early assignment for a path attribute code point has a requirement for a certain level of stability for the feature. This is primarily motivated by the fact that inconsistent parsing of the attribute, or inconsistent feature behavior, can result in anything from BGP session resets to traffic blackholes.

While the intent of this feature appears to be "walled garden" scenarios, BGP is used in these environments in circumstances where routes can leak by accident, so these issues should be discussed.

The majority of the PDU considerations are covered, but not all. It'd be good to get some clarity on the items flagged below as they are likely to impact the ability to have interoperable implementations.

I find the site-id procedures to be unclear and some additional text might be helpful.

The potential inconsistencies in route selection and the partial deployment model MUST be addressed. I suspect the intended deployment scenario is such that this may not be a problem when the feature is kept contained in a consistently deployed walled garden.

The feature does note that filtering of the attribute may be necessary. However, we're getting very publicly reminded that path attributes often go further than we like, and cause outages and security issues. I'd strongly recommend the attribute escape scenario be dealt with prior to early assignment.

Boring BGP details:

Transitivity:

The transitivity of this new path attribute isn't currently clear in the document. Here are some conflicting sections:

: 4.1. Metadata Path Attribute
:
: The Metadata Path Attribute is an optional transitive BGP Path attribute to
: carry metrics and metadata about the edge services attached to the egress
: router.

Here, we're saying the attribute is optional, transitive.

: 4.1.1. Metadata Path Attribute Handling Procedure
: [...]
:
: When a BGP Speaker does not recognize some of the Sub-TLVs within one
: Metadata Path Attribute in a BGP UPDATE message, the BGP Speaker should
: forward the received BGP UPDATE message without any change if the BGP UPDATE
: message is marked as transitive.

BGP UPDATEs themselves aren't "transitive". What's the intention here? Simply that if the BGP route is propagated that unrecognized TLVs should be propagated?

Perhaps consider using the RFC 4271 normative definition of Route:

: Route
: A unit of information that pairs a set of destinations with the
: attributes of a path to those destinations. The set of
: destinations are systems whose IP addresses are contained in one
: IP address prefix carried in the Network Layer Reachability
: Information (NLRI) field of an UPDATE message.

This avoids trying to discuss the UPDATE. UPDATEs are just the PDU packing used to carry Routes, covering the destinations and their Path Attributes.

: 4.1.2. TLV Format
: [...]
:
: The second high-order bit (bit 1): set to 0 to indicate that the
: service-metadata is not transitive. Only intended for the receiving router.

Here, it's marked as non-transitive.

Other PDU details:

: 4.1.2. TLV Format
: [...]
:
: - The third high-order bit (bit 2): same as specified by RFC4721.
: - The fourth high-order bit (bit 3): set to 1 to indicate there are two
: octets for the Length field.

Don't try to describe the details here. It's base BGP protocol behavior. In particular, the text covering the extended length behavior is erroneous and had been mentioned previously in another thread. The Length will be one or two octets based on the length of the contained attribute value.
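To illustrate the point about the Length field, here is a minimal Python sketch of the base RFC 4271 path attribute header, in which the sender picks a one- or two-octet Length from the size of the value. The type code `0xFF` and the function name are placeholders for illustration, not anything defined by the draft. The sketch sets Extended Length only when the value exceeds 255 octets; a receiver has to read whichever Length width the bit indicates.

```python
import struct

# Attribute Flags bits from RFC 4271, Section 4.3 (bit 0 is the high-order bit).
FLAG_OPTIONAL = 0x80    # bit 0
FLAG_TRANSITIVE = 0x40  # bit 1
FLAG_PARTIAL = 0x20     # bit 2: set when an unrecognized optional transitive attribute is passed along
FLAG_EXT_LENGTH = 0x10  # bit 3: Length field is two octets instead of one

PLACEHOLDER_TYPE = 0xFF  # hypothetical type code, not the requested code point


def encode_path_attribute(type_code: int, value: bytes,
                          optional: bool = True, transitive: bool = True) -> bytes:
    """Build flags, type, and length for one path attribute, then append the value."""
    if len(value) > 0xFFFF:
        raise ValueError("attribute value too long for a 2-octet Length")
    flags = (FLAG_OPTIONAL if optional else 0) | (FLAG_TRANSITIVE if transitive else 0)
    if len(value) > 0xFF:
        # Only values longer than 255 octets need the Extended Length bit here.
        flags |= FLAG_EXT_LENGTH
        return struct.pack("!BBH", flags, type_code, len(value)) + value
    return struct.pack("!BBB", flags, type_code, len(value)) + value


short_attr = encode_path_attribute(PLACEHOLDER_TYPE, bytes(8))
long_attr = encode_path_attribute(PLACEHOLDER_TYPE, bytes(300))
print(hex(short_attr[0]), len(short_attr))  # 0xc0 11
print(hex(long_attr[0]), len(long_attr))    # 0xd0 304
```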

: 4.3. Capacity Availability Index Metadata
:
: The Capacity Availability Index Sub-TLV:
:
:  0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1
: +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
: |     CapAvailIdx Sub-Type      |           Reserved            |
: +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
: |      Site-ID (2 octets)       | Site Availability Percentage  |
: +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+

Where's the Length? Is Reserved a typo?

Note that the site-id and site availability percentage break your "all sub-tlv values are 32 bit integers". (Noted below commenting on §4.1.2)
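A sketch of the Capacity Availability Index sub-TLV as drawn above, assuming (from the diagram only) a 2-octet Sub-Type, a 2-octet Reserved field, a 2-octet Site-ID, and a 2-octet percentage. Without a Length field, a receiver can only parse it by knowing the fixed size in advance, and the 4-octet value is two 16-bit fields rather than one 32-bit integer. The Sub-Type value 3 is hypothetical.

```python
import struct


def encode_cap_avail_idx(sub_type: int, site_id: int, percentage: int) -> bytes:
    """Pack Sub-Type, Reserved (sent as zero), Site-ID, and percentage.
    Field widths are read off the diagram; the draft does not state them in text."""
    return struct.pack("!HHHH", sub_type, 0, site_id, percentage)


def decode_cap_avail_idx(data: bytes) -> tuple[int, int, int]:
    """Return (sub_type, site_id, percentage); the size must be known up front."""
    if len(data) != 8:
        raise ValueError(f"expected 8 octets, got {len(data)}")
    sub_type, _reserved, site_id, percentage = struct.unpack("!HHHH", data)
    return sub_type, site_id, percentage


wire = encode_cap_avail_idx(3, site_id=42, percentage=50)  # Sub-Type 3 is a placeholder
print(wire.hex())                  # 00030000002a0032
print(decode_cap_avail_idx(wire))  # (3, 42, 50)
```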

: 4.4.3. Raw Load Measurement Sub-TLV
: [...]
: Raw Load Measurement Sub-TLV has the following format:
:
:  0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1
: +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
: | Raw-Load-Measurement Sub-Type |            Length             |
: +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
: |                      Measurement Period                       |
: +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
: |          total number of packets to the Edge Service          |
: +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
: |         total number of packets from the Edge Service         |
: +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
: |           total number of bytes to the Edge Service           |
: +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
: |          total number of bytes from the Edge Service          |
: +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+

This also breaks the "all sub-tlv values are 32 bit integers" rule.

Additionally, these values likely want to be 64 bits wide. Given network capacities and speeds these days, 32 bits is likely too small.
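Some rough arithmetic behind the 64-bit suggestion: at 100 Gb/s, an unsigned 32-bit byte counter wraps in about 0.34 seconds, far shorter than any plausible BGP-driven measurement period, so a receiver could not tell how many times it wrapped between samples. The link speeds below are example values, not figures from the draft.

```python
SECONDS_PER_YEAR = 365.25 * 24 * 3600  # Julian year


def wrap_time_seconds(counter_bits: int, link_bps: float) -> float:
    """Seconds until an unsigned byte counter of the given width wraps at full line rate."""
    bytes_per_second = link_bps / 8
    return (2 ** counter_bits) / bytes_per_second


for rate_gbps in (10, 100, 400):  # example link speeds, not from the draft
    t32 = wrap_time_seconds(32, rate_gbps * 1e9)
    t64 = wrap_time_seconds(64, rate_gbps * 1e9)
    print(f"{rate_gbps:>3} Gb/s: 32-bit wraps in {t32:.3f} s; "
          f"64-bit wraps in about {t64 / SECONDS_PER_YEAR:,.0f} years")
```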

: - Measurement Period: BGP Update period in Seconds or user-specified period.

Bad idea. Either specify seconds or provide an additional field so that the measurement quantum can be determined by the applications or management platforms.

Error Handling:

: 4.1.3. Error Handling
:
: A BGP speaker SHOULD NOT include more than one Metadata Path Attribute in
: one BGP Update message.

Largely shouldn't be mentioned. It's a session fatal error per RFC 4271, §6.3.

: A BGP UPDATE message that includes the Metadata Path Attribute doesn't
: change the BGP Error Handling procedure specified in the [RFC7606].

Similarly, don't mention. You don't get to break these rules. :-)

: If one of the Sub-TLVs has an invalid value, e.g., out of its specified
: ranges, the Sub-TLV with the invalid value is ignored by the BGP receiver.

Ignored is fine. Should the TLV be stripped prior to propagation? Unlike unknown TLVs, a known TLV with an invalid value could be cleaned up.

: By default, no notification is required unless configured to send a
: notification to its management system.

I don't think you mean to use "notification" here since that word tends to be used for the NOTIFICATION PDU as part of a session reset. Is your intent that logging of the error locally or to a management system is optional rather than a SHOULD?

: 8. Validation and Error Handling

I'd suggest condensing the 4.1.3 and 8. error handling sections together.

General sub-TLV questions/issues:

: 4.1.2. TLV Format
:
: All values in the Sub-TLVs are unsigned 32 bits integers.

Are you sure you really want to force these values to be constrained to 32 bits? This may preclude adding other types with more flexible values in the future, even if something boring like uint64 or decimal64.

Minimally, you should say if these are signed or unsigned integers. Based on usage throughout the rest of the document, I suspect they're intended to be unsigned.

Having decided what the required (fixed-size) length is, the error handling considerations should be addressed. The section 4.1.3 text covers invalid (semantic) values, but doesn't address syntactic issues with this field. If the length is not correct, is only the sub-TLV itself considered malformed, or is the entire Path Attribute malformed? If the latter, the usual RFC 7606 treat-as-withdraw behaviors should be applied.

Speaking of Length, the document should describe the semantics of the sub-TLV length field and whether it is only the length of the value portion of the field or inclusive of the type and length field itself. This occasionally varies in IETF protocols and can lead to interop issues.
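To show why the Length semantics matter, here is a sketch that walks a sequence of sub-TLVs under either convention, assuming a 2-octet Sub-Type and a 2-octet Length as in the Raw Load Measurement diagram. When a sender uses value-only Length and a receiver assumes an inclusive Length, the two disagree about where each sub-TLV ends, and the receiver ends up declaring the attribute malformed. `walk_sub_tlvs` is a hypothetical helper.

```python
import struct


def walk_sub_tlvs(data: bytes, length_includes_header: bool) -> list[tuple[int, bytes]]:
    """Split sub-TLVs with 2-octet Type and 2-octet Length.
    If length_includes_header is True, Length also counts the 4 header octets."""
    out = []
    offset = 0
    while offset < len(data):
        if offset + 4 > len(data):
            raise ValueError("truncated sub-TLV header")
        sub_type, length = struct.unpack_from("!HH", data, offset)
        value_len = length - 4 if length_includes_header else length
        if value_len < 0 or offset + 4 + value_len > len(data):
            raise ValueError(f"sub-TLV {sub_type} overruns the attribute")
        out.append((sub_type, data[offset + 4:offset + 4 + value_len]))
        offset += 4 + value_len
    return out


# Sender encodes two sub-TLVs, each with a 4-octet value, using value-only Length.
payload = struct.pack("!HHI", 1, 4, 100) + struct.pack("!HHI", 2, 4, 200)
print(walk_sub_tlvs(payload, length_includes_header=False))
# A receiver assuming the inclusive convention misreads the boundaries:
try:
    walk_sub_tlvs(payload, length_includes_header=True)
except ValueError as err:
    print("malformed:", err)
```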

Route Churn Considerations:

Several fields contain values that are intended to be some level of dynamic metric. (Zero surprise given that's the purpose of the feature!) This includes the Capacity Availability Index, Site Delay Prediction Index, Service Delay Prediction Index, Raw Load Measurement, etc.

Section 7 at least addresses that metric change can impact path selection, and attempts to provide a default lower bound for such churn. Good!

However, since this mechanism is intended to be usable on routes that are used for BGP nexthop resolution (e.g., labeled unicast), churn in these metrics can cause churn not only of the prefixes carrying the data but also of the routes that depend on them.

This churn is highly analogous to the impacts of features such as RSVP auto-bandwidth which is known to have significant negative network impacts. It's minimally responsible to mention this broader impact.
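One way an egress could bound this churn is to re-advertise only when a metric has moved by a meaningful amount and a minimum interval has passed; every suppressed re-advertisement also spares the routes that resolve over the affected next hop. The sketch below is hypothetical: `min_interval` and `min_delta` are illustrative knobs, not the default lower bound from Section 7, and a held change is simply dropped here, whereas a real implementation would re-evaluate it when the interval expires.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class MetricAdvertiser:
    """Hypothetical egress-side gate for re-advertising a route when a metric changes."""
    min_interval: float  # seconds between re-advertisements
    min_delta: int       # smallest change worth advertising
    last_sent_time: Optional[float] = None
    last_sent_value: Optional[int] = None

    def should_advertise(self, now: float, value: int) -> bool:
        if self.last_sent_value is None:
            return self._record(now, value)  # first value is always advertised
        if abs(value - self.last_sent_value) < self.min_delta:
            return False  # change too small to justify churn
        if now - self.last_sent_time < self.min_interval:
            return False  # too soon since the last advertisement
        return self._record(now, value)

    def _record(self, now: float, value: int) -> bool:
        self.last_sent_time, self.last_sent_value = now, value
        return True


gate = MetricAdvertiser(min_interval=30.0, min_delta=10)
samples = [(0, 100), (5, 50), (40, 95), (45, 60), (80, 60)]  # (time, metric)
print([gate.should_advertise(t, v) for t, v in samples])  # [True, False, False, True, False]
```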

4.3. Capacity Availability Index Metadata Issues:

Section 4.3.2 discusses that this TLV is used to help decorate routes that have a nexthop where the routes share a site-id. However, it's unclear: is this TLV intended to be used on BOTH the routes used as nexthops and the routes that resolve over said nexthops, as the correlator?

Can routes used as nexthops have more than one site-id bound to them? Can the routes resolving over them?

If the same TLV is used on the routes resolving over the nexthops, how is the site availability percentage filled in?

The following text is also unclear:
: However, it is unnecessary to include the Site Capacity Availability Index
: for every BGP Update message if there is no change to the site-reference
: identifier or the Capacity Availability value for the service instances.

BGP uses implicit withdraws when UPDATEs advertise a given NLRI with a set of Path Attributes. If the Path Attributes contain metadata that excludes the capacity TLV, I'd presume that the TLV goes away from the route in question. What is the intent of the text above?
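A small model of the implicit-withdraw behavior described above: a new advertisement for the same NLRI replaces the stored path attributes as a whole (RFC 4271, Section 9), so a sub-TLV left out of the newer Metadata Path Attribute is gone rather than carried over. The attribute and sub-TLV keys are hypothetical labels.

```python
# Minimal Adj-RIB-In keyed by NLRI. Each new advertisement replaces the stored
# attributes wholesale; nothing is merged from the previous advertisement.
adj_rib_in: dict[str, dict[str, object]] = {}


def receive_update(nlri: str, path_attributes: dict[str, object]) -> None:
    adj_rib_in[nlri] = dict(path_attributes)


receive_update("192.0.2.0/24", {
    "NEXT_HOP": "198.51.100.1",
    "METADATA": {"CapAvailIdx": {"site_id": 7, "percent": 50}},
})
# A later advertisement for the same NLRI omits the capacity sub-TLV.
receive_update("192.0.2.0/24", {
    "NEXT_HOP": "198.51.100.1",
    "METADATA": {"SiteDelay": 12},
})
print(adj_rib_in["192.0.2.0/24"]["METADATA"])  # {'SiteDelay': 12}
```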

Route Selection Considerations:

: 5.2. Integrating with BGP decision process

: For the selected services configured to be influenced by the Edge Service
: Metadata, the ingress router BGP Decision process [IDR-CUSTOM-DECISION]

While the custom decision process draft is adopted work, it's not widely implemented.

If the proposal is using the cost community, what are the recommended contents of that community for these procedures?

Inconsistent Route Selection Considerations:

Devices that recognize TLVs inconsistently may have inconsistent route selection. This should be flagged as an issue.

Features that impact BGP's route selection need significant additional scrutiny, and IDR review hasn't always been great about catching such things. The primary issue is that when such features are inconsistently deployed, or their inputs are inconsistently made use of (see comment above), BGP Speakers in the network can come to different conclusions about what the active route should be.

This can result in forwarding loops.

Features that use optional non-transitive path attributes can be more safely deployed in a network, but in many cases still require the feature to be consistently deployed within an IGP domain. This document is currently unclear about the scope of deployment, but with the focus on the ingresses, it reads as though partial deployment is intended to be a consideration.

The text in section 6 of your document seems to confirm that partial deployment is under consideration.

See general considerations in draft-haas-idr-bgp-attribute-escape, §3 for some discussion on these points.

Note that tunneling to the nexthop can mitigate some of the forwarding loop considerations in some cases, but not all.
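A toy illustration of the forwarding loop, under assumed link costs R1-E1 = 10, R1-R2 = 10, and R2-E2 = 30, with plain hop-by-hop forwarding and no tunneling. R1 uses the capacity metadata and picks egress E2 through R2; R2 ignores the metadata, prefers the IGP-closer E1, and sends the packet back through R1. The topology, router names, and capacity values are hypothetical.

```python
# Hypothetical topology (link costs): R1-E1 = 10, R1-R2 = 10, R2-E2 = 30.
IGP_NEXT_HOP = {("R1", "E1"): "E1", ("R1", "E2"): "R2",
                ("R2", "E1"): "R1", ("R2", "E2"): "E2"}
IGP_COST = {("R1", "E1"): 10, ("R1", "E2"): 40,
            ("R2", "E1"): 20, ("R2", "E2"): 30}
CAPACITY = {"E1": 0, "E2": 100}  # e.g., Capacity Availability Index per egress
EXITS = ("E1", "E2")


def best_exit(router: str, uses_metadata: dict[str, bool]) -> str:
    if uses_metadata[router]:
        # Metadata-aware: prefer higher capacity, break ties on IGP cost.
        return min(EXITS, key=lambda e: (-CAPACITY[e], IGP_COST[(router, e)]))
    # Not metadata-aware: choose purely on IGP cost.
    return min(EXITS, key=lambda e: IGP_COST[(router, e)])


def trace(start: str, uses_metadata: dict[str, bool], max_hops: int = 6) -> list[str]:
    """Follow hop-by-hop forwarding from start until an egress or the hop limit."""
    path, node = [start], start
    while node in uses_metadata and len(path) <= max_hops:
        node = IGP_NEXT_HOP[(node, best_exit(node, uses_metadata))]
        path.append(node)
    return path


print(trace("R1", {"R1": True, "R2": False}))  # R1 and R2 bounce the packet
print(trace("R1", {"R1": True, "R2": True}))   # ['R1', 'R2', 'E2']
```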

Scoping Considerations:

RTC procedures:

: 6. Service Metadata Propagation Scope
:
: For each registered low-latency Service, BGP RT Constrained Distribution
: [RFC4684] can be used to form the Group interested in the Service. The
: "Service ID", an IP address prefix, is the Route Target.

Is the intention here that a general-purpose IP-formatted route-target should contain the service ID?

This seems to be limiting the feature's use to only VPN service routes using this specific format. If that's not the case and this is intended as a way to mark subsets of routes using this feature for constrained distribution, you probably want to use a new extended community type/subtype.
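For reference, the IPv4 Address Specific Route Target of RFC 4360 (type 0x01, sub-type 0x02) carries a 4-octet IPv4 address and a 2-octet Local Administrator field, with no field defined for a prefix length. The sketch below, using only the Python standard library, shows that a Service ID written as 203.0.113.0/24 and one written as 203.0.113.0/25 produce identical Route Targets; an IPv6 Service ID would not fit this 8-octet format at all. The Service ID values are hypothetical.

```python
import ipaddress
import struct

RT_TYPE_IPV4 = 0x01       # RFC 4360: transitive IPv4 Address Specific
RT_SUBTYPE_TARGET = 0x02  # RFC 4360: Route Target


def ipv4_route_target(address: ipaddress.IPv4Address, local_admin: int) -> bytes:
    """8 octets: type, sub-type, 4-octet global administrator, 2-octet local administrator."""
    return struct.pack("!BB4sH", RT_TYPE_IPV4, RT_SUBTYPE_TARGET, address.packed, local_admin)


# Hypothetical Service IDs written as prefixes.
svc_24 = ipaddress.IPv4Network("203.0.113.0/24")
svc_25 = ipaddress.IPv4Network("203.0.113.0/25")
rt_24 = ipv4_route_target(svc_24.network_address, local_admin=0)
rt_25 = ipv4_route_target(svc_25.network_address, local_admin=0)
print(rt_24.hex())     # 0102cb0071000000
print(rt_24 == rt_25)  # True: the prefix length does not survive the encoding
```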

Note that rt-constrain on non-route-target extended communities has been discussed previously on IDR, and some drafts have tried to codify it, but to the best of my knowledge support for it is not currently deployed.

The procedures covering exactly what happens for advertising the RTC route and attracting the group routes is unclear. See my other questions about site-id in my comments above.

An example here would be quite helpful.

Attribute Escape Considerations:

This draft specifies a new path attribute that may be transitive, or not. (The need for clarification is in prior comments.)

This new attribute may be attached to Internet scoped routes.

Section 9 and Section 10 attempt to limit the deployment of this feature within "trusted domains" "between Ingress and egress routers of one single BGP domain".

Section 4.1.1 attempts to address the scoping consideration further by:
: In order to prevent distribution of the BGP Metadata Path Attribute beyond
: its intended scope of applicability, attribute filtering SHOULD be deployed
: to remove the BGP Metadata Path attribute at the administrative boundary.

As addressed in draft-haas-idr-bgp-attribute-escape, such filtering desires and expectations of limited domains have tended to be wishful thinking and we keep ending up with operational accidents.

I would STRONGLY suggest that the Path Attribute definition be updated to provide additional scoping information wherein a remote BGP domain receiving an escaped metadata Path Attribute can determine that it should NOT be locally used for the procedures discussed in this document.

An example of such a change would be to add an Autonomous System number to the Path Attribute providing the context of the AS that should be using the contained information.
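A sketch of the AS-scoping idea, assuming a hypothetical 4-octet scope AS placed in front of the existing sub-TLVs. Neither the field nor its position is defined by the draft; the point is only that a receiver outside the intended AS can recognize the attribute as escaped and decline to use it. AS numbers 64500 and 64511 are from the documentation range.

```python
import struct
from typing import Optional

# Hypothetical layout: [scope AS (4 octets)][existing sub-TLVs ...]


def wrap_with_scope(scope_as: int, sub_tlvs: bytes) -> bytes:
    return struct.pack("!I", scope_as) + sub_tlvs


def usable_sub_tlvs(attr_value: bytes, local_as: int) -> Optional[bytes]:
    """Return the sub-TLVs if this AS is the intended scope, else None."""
    if len(attr_value) < 4:
        return None  # too short to carry a scope; malformed handling is out of scope here
    (scope_as,) = struct.unpack_from("!I", attr_value, 0)
    return attr_value[4:] if scope_as == local_as else None


value = wrap_with_scope(64500, b"\x00\x01\x00\x04\x00\x00\x00\x32")
print(usable_sub_tlvs(value, local_as=64500) is not None)  # True: same domain
print(usable_sub_tlvs(value, local_as=64511))              # None: escaped, not used
```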

-- Jeff

lindadunbar commented 9 months ago

From: Linda Dunbar
Sent: Sunday, October 29, 2023 4:45 PM
To: Jeffrey Haas jhaas@pfrc.org; draft-ietf-idr-5g-edge-service-metadata@ietf.org; idr@ietf.org
Subject: RE: Comments on draft-ietf-idr-5g-edge-service-metadata

Jeff,

Thank you very much for the detailed review and the comments. Please see below for the detailed resolutions.
For a couple of them, I would need to chat with you during IETF 118 to settle the exact changes.

Thank you very much!

Linda

-----Original Message-----
From: Jeffrey Haas jhaas@pfrc.org
Sent: Friday, October 27, 2023 4:01 PM
To: draft-ietf-idr-5g-edge-service-metadata@ietf.org; idr@ietf.org
Subject: Comments on draft-ietf-idr-5g-edge-service-metadata

Authors,

My apologies for not providing earlier review of your document, especially in light of request for early allocation. Life happens, and life has been messy the last few months.

I have several concerns covering the PDU formats and BGP procedures for this document, but most of them are not severe and likely to be able to be addressed in an update. Depending on how some are addressed, the implementors of the draft may want to defer their code point request until they can do further discussion and testing of the result of these conversations.

High level assessment for the chairs and AD:

Asking for early assignment for a path attribute code point has a requirement for a certain level of stability for the feature. This is primarily motivated by the fact that inconsistent parsing of the attribute, or inconsistent feature behavior, can result in anything from BGP session resets to traffic blackholes.

While the intent of this feature appears to be "walled garden" scenarios, BGP is used in these environments in circumstances where routes can leak by accident, so these issues should be discussed.

[Linda] The Security Considerations section of version -11 has the following statement: “The ingress routers should not propagate the Edge Service Metadata to any nodes that are not within the trusted domain.”

Should we expand this statement with the following? BGP Route Filtering or BGP Route Policies [RFC 7454] can be used to ensure that BGP update messages with the Metadata Path Attribute attached do not get forwarded out of the administrative domain. BGP route filtering [RFC 7454] allows network administrators to control the advertisement and acceptance of BGP routes, ensuring that certain routes do not leak outside of the intended administrative domain. Here are the steps to achieve this:

Use Route Filtering: Implement route filtering policies on the ingress routers to restrict the propagation of BGP update messages for the registered 5G edge services beyond the administrative domain. You can use access control lists (ACLs), prefix lists, or route maps to filter the BGP routes classified as the 5G edge services, which need the Metadata Path Attributes to be distributed from egress routers to ingress routers.

Filter by Prefix: Use prefix filtering to specify which IP prefixes should be advertised to peers and which should be suppressed. This step ensures that only authorized routes are sent to external peers.

Use Route Maps: Route maps provide a flexible way to filter and manipulate BGP route advertisements. You can create route maps to match specific conditions and then apply them to the BGP configuration.
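The filtering described above can also target the attribute rather than the route. This sketch of a hypothetical export policy keeps advertising the route to an external peer but removes the Metadata Path Attribute first, in line with the attribute filtering that Section 4.1.1 recommends at the administrative boundary. It models routes as plain Python objects; it is not any router vendor's configuration syntax.

```python
from dataclasses import dataclass, field, replace

METADATA = "METADATA"  # hypothetical key for the Metadata Path Attribute


@dataclass(frozen=True)
class Route:
    prefix: str
    attributes: dict = field(default_factory=dict)


def export_policy(route: Route, peer_is_internal: bool) -> Route:
    """Strip the Metadata Path Attribute before advertising outside the domain.
    The route itself is still advertised; only the attribute is removed."""
    if peer_is_internal or METADATA not in route.attributes:
        return route
    attrs = {k: v for k, v in route.attributes.items() if k != METADATA}
    return replace(route, attributes=attrs)


r = Route("192.0.2.0/24", {"NEXT_HOP": "198.51.100.1", METADATA: b"\x00\x01"})
print(METADATA in export_policy(r, peer_is_internal=True).attributes)   # True
print(METADATA in export_policy(r, peer_is_internal=False).attributes)  # False
```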

The majority of the PDU considerations are covered, but not all. It'd be good to get some clarity on the items flagged below as they are likely to impact the ability to have interoperable implementations.

I find the site-id procedures to be unclear and some additional text might be helpful.

[Linda] the Site-ID in this document is an identifier for a group of routes associated with a common physical characteristic, for example a pod, a row of server racks, a floor, or an entire DC. The purpose is to use one UPDATE message to indicate a group of routes being impacted by a physical event. Those routes might be from different address families or NLRI.

Version -11 specifies the following: “Identifier for a site, which can be a pod, a row of server racks, a floor, or an entire DC. There could be multiple sites connected to the egress router (a.k.a. Edge DC GW).”

Is it clearer to change to the following? “Site ID is an identifier for a group of routes associated with a common physical characteristic, for example, a pod, a row of server racks, a floor, or an entire DC. The purpose is to use one UPDATE message to indicate a group of routes impacted by a physical event. Those routes might be from different address families or NLRIs. There could be multiple sites connected to one egress router (a.k.a. Edge DC GW).”

The potential inconsistencies in route selection and the partial deployment model MUST be addressed. I suspect the intended deployment scenario is such that this may not be a problem when the feature is kept contained in a consistently deployed walled garden.

[Linda] Do you mean the case where some nodes don’t support the Metadata Path Attribute? Is it appropriate to add the following statement to Section 3? “The goal of this Edge Service Metadata Path Attribute is for egress routers to propagate the metrics about their running environment for a subset of Edge Services to ingress routers so that the ingress routers can make path selections based not only on the routing cost but also on the running environment for those edge services. The BGP speakers that don’t support the Metadata Path Attribute can ignore the Metadata Path Attribute in a BGP UPDATE Message. All intermediate nodes can forward the entire BGP UPDATE as it is.”

The feature does note that filtering of the attribute may be necessary. However, we're getting very publicly reminded that path attributes often go further than we like, and cause outages and security issues. I'd strongly recommend the attribute escape scenario be dealt with prior to early assignment.

[Linda] What do you mean by “the attribute escape scenario”?

Boring BGP details:

Transitivity:

The transitivity of this new path attribute isn't currently clear in the document. Here are some conflicting sections:

: 4.1. Metadata Path Attribute
:
: The Metadata Path Attribute is an optional transitive BGP Path attribute to
: carry metrics and metadata about the edge services attached to the egress
: router.

Here, we're saying the attribute is optional, transitive.

[Linda] In our current implementation, the Metadata Path Attribute is “Transitive” only on the RR, but NOT forwarded by the ingress nodes. The majority of routes’ UPDATEs don’t carry the Metadata Path Attribute; it is included only for a small set of prefixes. Can you suggest a better term to describe this?

: 4.1.1. Metadata Path Attribute Handling Procedure
: [...]
:
: When a BGP Speaker does not recognize some of the Sub-TLVs within one
: Metadata Path Attribute in a BGP UPDATE message, the BGP Speaker should
: forward the received BGP UPDATE message without any change if the BGP UPDATE
: message is marked as transitive.

BGP UPDATEs themselves aren't "transitive". What's the intention here? Simply that if the BGP route is propagated that unrecognized TLVs should be propagated?

[Linda] RFC 4271 defines the Transitive bit as the second high-order bit (bit 1) of the Attribute Flags octet: if it is set to 1, the attribute is transitive. This means that the attribute must be passed along to other BGP routers in the path. I meant to say that if the Transitive bit is set to 1, the BGP speaker should forward the received BGP UPDATE unless the BGP Speaker is an AS boundary node.

Perhaps consider using the RFC 4271 normative definition of Route:

: Route
: A unit of information that pairs a set of destinations with the
: attributes of a path to those destinations. The set of
: destinations are systems whose IP addresses are contained in one
: IP address prefix carried in the Network Layer Reachability
: Information (NLRI) field of an UPDATE message.

This avoids trying to discuss the UPDATE. UPDATEs are just the PDU packing used to carry Routes, covering the destinations and their Path Attributes.

[Linda] That sentence is very confusing to me. The Metadata Path Attribute is listed in parallel to NLRI. How about changing the text to the following: “When a BGP Speaker does not recognize some of the Sub-TLVs within one Metadata Path Attribute in a BGP UPDATE message, the BGP Speaker should forward the received BGP UPDATE message without any change if the transitive bit is set to 1 [RFC4271].”

: 4.1.2. TLV Format
: [...]
:
: The second high-order bit (bit 1): set to 0 to indicate that the
: service-metadata is not transitive. Only intended for the receiving router.

Here, it's marked as non-transitive.

[Linda] Is this okay, or is any change needed?

Other PDU details:

: 4.1.2. TLV Format
: [...]
:
: - The third high-order bit (bit 2): same as specified by RFC4721.
: - The fourth high-order bit (bit 3): set to 1 to indicate there are two
: octets for the Length field.

Don't try to describe the details here. It's base BGP protocol behavior. In particular, the text covering the extended length behavior is erroneous and had been mentioned previously in another thread. The Length will be one or two octets based on the length of the contained attribute value.

[Linda] Should those two bullets be deleted? Would it be possible to sit down with you at IETF 118 to discuss this?

: 4.3. Capacity Availability Index Metadata
:
: The Capacity Availability Index Sub-TLV:
:
:  0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1
: +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
: |     CapAvailIdx Sub-Type      |           Reserved            |
: +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
: |      Site-ID (2 octets)       | Site Availability Percentage  |
: +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+

Where's the Length? Is Reserved a typo?

[Linda] There is no Length field, only 4 octets. Change the text to the following: “The Capacity Availability Index Sub-TLV has a fixed length of 4 octets. Therefore, there is no Length field.”

Note that the site-id and site availability percentage break your "all sub-tlv values are 32 bit integers". (Noted below commenting on §4.1.2)

[Linda] Two octets are enough to represent a percentage, e.g., 100%, 50%, or 0%.

: 4.4.3. Raw Load Measurement Sub-TLV
: [...]
: Raw Load Measurement Sub-TLV has the following format:
:
:  0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1
: +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
: | Raw-Load-Measurement Sub-Type |            Length             |
: +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
: |                      Measurement Period                       |
: +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
: |          total number of packets to the Edge Service          |
: +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
: |         total number of packets from the Edge Service         |
: +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
: |           total number of bytes to the Edge Service           |
: +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
: |          total number of bytes from the Edge Service          |
: +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+

This also breaks the "all sub-tlv values are 32 bit integers" rule.

[Linda] Each value is represented by 32 bits. How does it break the rule?

Additionally, these values likely want to be 64 bits wide. Given network capacities and speeds these days, 32 bits is likely too small.

[Linda] That is a good point. Does 64 bits break the 32-bit rule?

: - Measurement Period: BGP Update period in Seconds or user-specified period.

Bad idea. Either specify seconds or provide an additional field so that the measurement quantum can be determined by the applications or management platforms.

[Linda] The 32-bit Measurement Period field is intended to indicate the period. Can you suggest another format?

Error Handling:

: 4.1.3. Error Handling
:
: A BGP speaker SHOULD NOT include more than one Metadata Path Attribute in
: one BGP Update message.

Largely shouldn't be mentioned. It's a session fatal error per RFC 4271, §6.3.

[Linda] Is it better to change the text to the following? “A BGP speaker MUST NOT include more than one Metadata Path Attribute in one BGP Update message. Per [RFC4271] Section 6.3, a BGP speaker must not include multiple instances of the same Sub-TLV type specified in this document in one Metadata Path Attribute.”

I thought it would be better to emphasize this.

: A BGP UPDATE message that includes the Metadata Path Attribute doesn't
: change the BGP Error Handling procedure specified in the [RFC7606].

Similarly, don't mention. You don't get to break these rules. :-)

[Linda] Okay. Per your comment below, the content of 4.1.3 is merged into Section 8:

: If one of the Sub-TLVs has an invalid value, e.g., out of its specified
: ranges, the Sub-TLV with the invalid value is ignored by the BGP receiver.

Ignored is fine. Should the TLV be stripped prior to propagation? Unlike unknown TLVs, a known TLV with an invalid value could be cleaned up.

[Linda] Is it OK to just forward it as is? Other nodes might be able to handle the value.

: By default, no notification is required unless configured to send a
: notification to its management system.

I don't think you mean to use "notification" here since that word tends to be used for the NOTIFICATION PDU as part of a session reset. Is your intent that logging of the error locally or to a management system is optional rather than a SHOULD?

[Linda] Yes. Logging the error locally or to a management system is optional, and it is not done by default.

: 8. Validation and Error Handling

I'd suggest condensing the 4.1.3 and 8. error handling sections together.

[Linda] Okay. Move the content of Section 4.1.3 to Section 8. The Section 8 is now changed to the following:

In addition to the Error Handling procedure described in [RFC7606], a BGP speaker should ignore the Metadata Path Attribute if more than one Metadata Path Attribute is present in one BGP Update message. The Metadata Path Attribute contains a sequence of Sub-TLVs. The Metadata Path Attribute's length determines the total number of octets for all the Sub-TLVs under the Metadata Path Attribute. The sum of the lengths of all the Sub-TLVs under the Metadata Path Attribute should equal the length of the Metadata Path Attribute. If this is not the case, the TLV should be considered malformed, and the "Treat-as-withdraw" procedure of [RFC7606] is applied.

When more than one Sub-TLV is present in a Metadata Path Attribute, they are processed independently. If a Metadata Path Attribute can be parsed correctly but contains a Sub-TLV whose type is not recognized by a particular BGP speaker, that BGP speaker MUST NOT consider the attribute malformed. Instead, it MUST interpret the attribute as if that Sub-TLV had not been present. Logging the error locally or to a management system is optional. If the route carrying the Metadata Path Attribute is propagated with the attribute, the unrecognized Sub-TLV remains in the attribute.
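A minimal sketch of the proposed Section 8 procedure, assuming a 2-octet Sub-Type and a 2-octet value-only Length (neither is settled in the draft) and hypothetical known type codes 1 to 3. It walks the sub-TLVs, applies treat-as-withdraw when their lengths do not add up to the attribute length, and keeps unrecognized sub-TLVs for propagation without interpreting them. One deviation: for repeated attributes, the proposed text says to ignore the attribute, while RFC 7606, Section 3(g) discards every occurrence after the first and continues processing; this sketch follows RFC 7606.

```python
import struct
from enum import Enum

KNOWN_SUB_TYPES = {1, 2, 3}  # hypothetical sub-TLV type codes


class Outcome(Enum):
    USE = "use"
    TREAT_AS_WITHDRAW = "treat-as-withdraw"


def process_metadata(occurrences: list[bytes]):
    """Validate the Metadata Path Attribute values found in one UPDATE.

    Returns (outcome, interpreted, retained): interpreted holds (type, value)
    for recognized sub-TLVs; retained holds every raw sub-TLV, including
    unrecognized ones, for propagation with the route.
    """
    if not occurrences:
        return Outcome.USE, [], []
    data = occurrences[0]  # RFC 7606, Section 3(g): later duplicates are discarded
    interpreted, retained, offset = [], [], 0
    while offset < len(data):
        if offset + 4 > len(data):
            return Outcome.TREAT_AS_WITHDRAW, [], []  # truncated sub-TLV header
        sub_type, length = struct.unpack_from("!HH", data, offset)
        end = offset + 4 + length
        if end > len(data):
            return Outcome.TREAT_AS_WITHDRAW, [], []  # sub-TLVs overrun the attribute
        retained.append(data[offset:end])
        if sub_type in KNOWN_SUB_TYPES:
            interpreted.append((sub_type, data[offset + 4:end]))
        # Unrecognized types are not interpreted but stay in `retained`.
        offset = end
    return Outcome.USE, interpreted, retained


good = struct.pack("!HHI", 1, 4, 50) + struct.pack("!HHH", 99, 2, 7)
bad = struct.pack("!HHI", 1, 8, 50)  # Length claims 8 octets, only 4 follow
print(process_metadata([good])[0])   # Outcome.USE
print(process_metadata([bad])[0])    # Outcome.TREAT_AS_WITHDRAW
```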

General sub-TLV questions/issues:

: 4.1.2. TLV Format
:
: All values in the Sub-TLVs are unsigned 32 bits integers.

Are you sure you really want to force these values to be constrained to 32 bits? This may preclude adding other types with more flexible values in the future, even if something boring like uint64 or decimal64.

Minimally, you should say if these are signed or unsigned integers. Based on usage throughout the rest of the document, I suspect they're intended to be unsigned.

Having decided what the required (fixed-size) length is, the error handling considerations should be addressed. The section 4.1.3 text covers invalid (semantic) values, but doesn't address syntactic issues with this field. If the length is not correct, is only the sub-TLV itself considered malformed, or is the entire Path Attribute malformed? If the latter, the usual RFC 7606 treat-as-withdraw behaviors should be applied.

Speaking of Length, the document should describe the semantics of the sub-TLV length field and whether it is only the length of the value portion of the field or inclusive of the type and length field itself. This occasionally varies in IETF protocols and can lead to interop issues.

Route Churn Considerations:

Several fields contain values that are intended to be some level of dynamic metric. (Zero surprise given that's the purpose of the feature!) This includes the Capacity Availability Index, Site Delay Prediction Index, Service Delay Prediction Index, Raw Load Measurement, etc.

Section 7 at least addresses that metric change can impact path selection, and attempts to provide a default lower bound for such churn. Good!

However, since this mechanism is intended to be usable on routes that are used for BGP nexthop resolution (e.g., labeled unicast), churn in these metrics can result not only in churn of the prefixes carrying the data but also of the routes that depend on them.

This churn is highly analogous to the impact of features such as RSVP auto-bandwidth, which are known to have significant negative network impacts. At a minimum, this broader impact should be mentioned.
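
A minimal sketch of the kind of lower bound on metric-driven churn discussed above: re-advertise a metric only if enough time has passed and the value has moved enough. `MetricThrottle`, `min_interval_s`, and `min_delta` are hypothetical names and knobs, not the defaults from Section 7 of the draft.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class MetricThrottle:
    """Suppress re-advertisement of a metric that changes too often or too little."""
    min_interval_s: float   # hypothetical minimum time between advertisements
    min_delta: int          # hypothetical minimum change worth advertising
    last_sent_value: Optional[int] = None
    last_sent_time: Optional[float] = None

    def should_advertise(self, value: int, now: float) -> bool:
        if self.last_sent_value is None:
            send = True  # first value is always advertised
        else:
            too_soon = now - self.last_sent_time < self.min_interval_s
            too_small = abs(value - self.last_sent_value) < self.min_delta
            send = not too_soon and not too_small
        if send:
            self.last_sent_value, self.last_sent_time = value, now
        return send
```

Every update that passes such a throttle can still trigger re-resolution of every route resolving over the affected nexthop, which is the amplification described above.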

4.3. Capacity Availability Index Metadata Issues:

Section 4.3.2 discusses that this TLV is used to help decorate routes that have a nexthop where the routes share a site-id. However, it's unclear whether this TLV is intended to be used as the correlator BOTH on the routes used as nexthops and on the routes that resolve over said nexthops. [Linda] How about changing the first paragraph to the following?

The Capacity Availability Index indicates, as a percentage, the available capacity of a group of routes associated with a common physical characteristic, for example, a pod, a row of server racks, a floor, or an entire DC. The purpose is to use one UPDATE message to indicate that a group of routes with different NLRIs is impacted by a physical event. For example, a power outage to a pod can cause the Capacity Availability Index to be 0% for all the routes in the pod. A partial fiber cut to a row of shelves can cause the Capacity Availability Index to be 50% for all the routes in those shelves. The value ranges from 0 to 100, with 100% indicating the site is fully functional, 0% indicating the site is entirely out of service, and 50% indicating the site is 50% degraded. It is recommended to assign each route a single Site-ID. Depending on deployment, one DC can use the POD number as the Site-ID, while another DC can use the row of shelves as the Site-ID.
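
The grouping behavior in the proposed text can be sketched as a table keyed by Site-ID: one update for a Site-ID changes the effective index of every route bound to it. The class and method names are illustrative; one Site-ID per route follows the recommendation above, and treating a site with no reported index as 100% available is an assumption.

```python
from dataclasses import dataclass, field


@dataclass
class SiteCapacityTable:
    """Capacity Availability Index (0-100) shared by all routes of a Site-ID."""
    route_site: dict = field(default_factory=dict)  # prefix -> Site-ID (one per route)
    site_index: dict = field(default_factory=dict)  # Site-ID -> percent

    def bind(self, prefix: str, site_id: int) -> None:
        self.route_site[prefix] = site_id

    def update_site(self, site_id: int, percent: int) -> list:
        """Apply one update and return the prefixes it affects."""
        if not 0 <= percent <= 100:
            raise ValueError("Capacity Availability Index must be 0-100")
        self.site_index[site_id] = percent
        return sorted(p for p, s in self.route_site.items() if s == site_id)

    def index_for(self, prefix: str) -> int:
        # Assumption: a site with no reported index is treated as fully available.
        return self.site_index.get(self.route_site[prefix], 100)
```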

Can routes used as nexthops have more than one site-id bound to them? Can the routes resolving over them? [Linda] It is recommended to assign each route a single Site-ID. Having multiple Site-IDs, even though more flexible, is too complicated. Depending on deployment, one DC can use the POD number as the Site-ID, while another DC can use the row of shelves as the Site-ID. See the updated text for Section 4.3.2 above.

If the same TLV is used on the routes resolving over the nexthops, how is the site availability percentage filled in? [Linda] A BGP UPDATE with a standalone Site Availability Index is NOT intended for resolving nexthops.

The following text is also unclear:
: However, it is unnecessary to include the Site Capacity Availability Index
: for every BGP Update message if there is no change to the site-reference
: identifier or the Capacity Availability value for the service instances.

BGP uses implicit withdraws when UPDATEs advertise a given NLRI with a set of Path Attributes. If the Path Attributes contain metadata that excludes the capacity TLV, I'd presume that it goes away from the route in question. What is the intent of the text above? [Linda] The intent is to say that the Metadata Path Attribute values stay unchanged if not included in the UPDATE. Is that OK?
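
The difference between standard BGP behavior and the reading in the reply can be sketched as two update functions over a toy Adj-RIB-In. `apply_update` follows RFC 4271 implicit withdraw, where the new attribute set replaces the old one. `apply_update_sticky` is a hypothetical model of "values stay unchanged if not included"; BGP does not behave this way today, so it would need to be specified explicitly.

```python
import copy


def apply_update(adj_rib_in: dict, nlri: str, attrs: dict) -> None:
    # RFC 4271 implicit withdraw: the new route replaces the old one entirely.
    adj_rib_in[nlri] = copy.deepcopy(attrs)


def apply_update_sticky(adj_rib_in: dict, nlri: str, attrs: dict) -> None:
    # Hypothetical alternative: previously received metadata sub-TLVs survive
    # unless the new UPDATE overwrites them. NOT standard BGP behavior.
    old_meta = adj_rib_in.get(nlri, {}).get("metadata", {})
    new = copy.deepcopy(attrs)
    new["metadata"] = {**old_meta, **attrs.get("metadata", {})}
    adj_rib_in[nlri] = new
```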

Route Selection Considerations:

: 5.2. Integrating with BGP decision process

: For the selected services configured to be influenced by the Edge Service
: Metadata, the ingress router BGP Decision process [IDR-CUSTOM-DECISION]

While the custom decision process draft is adopted work, it's not widely implemented.

If the proposal is using the cost community, what is the recommended contents of that community for these procedures?

[Linda] Robert asked us to use the BGP Decision process [IDR-CUSTOM-DECISION]. Can we chat about how to write this at IETF 118?

Inconsistent Route Selection Considerations:

Devices that recognize TLVs inconsistently may have inconsistent route selection. This should be flagged as an issue. [Linda] having inconsistent route selection should be OK as any of the destinations can serve the request.

Features that impact BGP's route selection need significant additional scrutiny, and IDR review hasn't always been great about catching such things. The primary issue is that when such features are inconsistently deployed, or their inputs are inconsistently made use of (see comment above), BGP Speakers in the network can come to different conclusions about what the active route should be.
[Linda] We are talking about a very small set of selected edge service routes that are instantiated in the edge DC directly attached to the egress routers. The majority of routes don't have the Metadata Path Attribute attached.

This can result in forwarding loops.
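
A toy model of how such a loop forms with hop-by-hop forwarding, assuming the topology E1 -(1)- R1 -(1)- R2 -(5)- E2. R1 understands the metadata and avoids a degraded E1, so it selects E2 (reached via R2); R2 ignores the metadata and selects E1 on IGP cost (reached via R1). Router names, link costs, and the `forward` helper are hypothetical.

```python
def forward(selected_egress: dict, igp_next_hop: dict, start: str, egresses: set) -> list:
    """Hop-by-hop forwarding where every router uses its OWN best route.

    selected_egress[r]: the egress that router r selected for the destination.
    igp_next_hop[(r, e)]: r's IGP next hop toward egress e.
    """
    path, router = [start], start
    while router not in egresses:
        router = igp_next_hop[(router, selected_egress[router])]
        if router in path:
            path.append(router)
            raise RuntimeError("forwarding loop: " + " -> ".join(path))
        path.append(router)
    return path


igp = {("R1", "E1"): "E1", ("R1", "E2"): "R2",
       ("R2", "E1"): "R1", ("R2", "E2"): "E2"}
```

With consistent selection (both routers pick E2) traffic goes R1 -> R2 -> E2; with the inconsistent selection above, a packet entering at R1 bounces between R1 and R2.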

Features that use optional non-transitive path attributes can be more safely deployed in a network, but in many cases still require the feature to be consistently deployed within an IGP domain. This document is currently unclear about the scope of deployment, but its focus on the ingress routers suggests that partial deployment is intended to be supported. [Linda] Do you think making the Metadata Path Attribute non-transitive is a MUST?

The text in section 6 of your document seems to confirm that partial deployment is under consideration.

See general considerations in draft-haas-idr-bgp-attribute-escape, §3 for some discussion on these points.

Note that tunneling to the nexthop can mitigate some of the forwarding loop considerations in some cases, but not all.

Scoping Considerations:

RTC procedures:

: 6. Service Metadata Propagation Scope
:
: For each registered low-latency Service, BGP RT Constrained Distribution
: [RFC4684] can be used to form the Group interested in the Service. The
: "Service ID", an IP address prefix, is the Route Target.

Is the intention here that a general-purpose IP-formatted route-target should contain the service ID? [Linda] Yes.
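
For reference, an IPv4-address-specific Route Target extended community (RFC 4360: type 0x01, sub-type 0x02, 4-octet Global Administrator, 2-octet Local Administrator) carrying a Service ID could be encoded as below. The sketch assumes the Service ID is a single IPv4 address, since this format has room for a 4-octet address but no prefix length; the default `local_admin` value of 0 is an assumption.

```python
import ipaddress
import struct


def ipv4_route_target(service_id: str, local_admin: int = 0) -> bytes:
    """Encode an IPv4-address-specific Route Target extended community."""
    addr = ipaddress.IPv4Address(service_id)
    # type 0x01 (transitive IPv4-address-specific), sub-type 0x02 (Route Target)
    return struct.pack("!BB4sH", 0x01, 0x02, addr.packed, local_admin)
```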

This seems to limit the feature's use to VPN service routes using this specific format. If that's not the case and this is intended as a way to mark subsets of routes using this feature for constrained distribution, you probably want to use a new extended community type/subtype. [Linda] The 5G edge service Metadata Path Attribute is not intended for VPNs; rather, it is used with IP prefixes.

Note that rt-constrain on non-route-target extended communities has been discussed previously on IDR, along with some drafts trying to codify it, but to the best of my knowledge support for it is not currently deployed.

[Linda] RFC4684 says that IP address prefixes (instead of Route Target) can be used to form the interested Groups.

The procedures covering exactly what happens for advertising the RTC route and attracting the group routes is unclear. See my other questions about site-id in my comments above.

An example here would be quite helpful.
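
One possible shape for such an example, as a minimal sketch using the RFC 4684 encoding (1-octet prefix length in bits, 4-octet Origin AS, 8-octet Route Target). An ingress interested in a service advertises a full-length (96-bit) RT membership NLRI, and a speaker sends service routes carrying that RT only to peers that advertised it. The AS number and Service ID are documentation values, and `peer_wants` is a simplification that uses exact matching only (no shorter prefixes or default RT membership).

```python
import ipaddress
import struct


def ipv4_route_target(service_id: str, local_admin: int = 0) -> bytes:
    # RFC 4360 IPv4-address-specific Route Target: type 0x01, sub-type 0x02.
    return struct.pack("!BB4sH", 0x01, 0x02,
                       ipaddress.IPv4Address(service_id).packed, local_admin)


def rt_membership_nlri(origin_as: int, route_target: bytes) -> bytes:
    """RFC 4684 RT membership NLRI: prefix length (bits), Origin AS, Route Target."""
    if len(route_target) != 8:
        raise ValueError("a Route Target extended community is 8 octets")
    return bytes([96]) + struct.pack("!I", origin_as) + route_target


def peer_wants(route_rts: set, peer_membership: set) -> bool:
    # Simplified: exact match only (ignores shorter prefixes and the default RT).
    return bool(route_rts & peer_membership)


# Hypothetical example: an ingress in AS 64500 asks for service 192.0.2.10.
service_rt = ipv4_route_target("192.0.2.10")
membership = {service_rt}
nlri = rt_membership_nlri(64500, service_rt)
```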

Attribute Escape Considerations:

This draft specifies a new path attribute, that may be optionally transitive, or not. (The need for clarification is in prior comments.)

[Linda] Changed the abstract to the following in v-12, and removed all transitive requirements throughout the document. “This draft describes a new Metadata Path Attribute that can be optionally transitive or not and some Sub-TLVs for egress routers to advertise the Metadata about the attached edge services (ES). The Edge Service Metadata can be used by the ingress routers in the 5G Local Data Network to make path selections not only based on the routing cost but also the running environment of the edge services. The goal is to improve latency and performance for 5G edge services”.

This new attribute may be attached to Internet-scoped routes. [Linda] The Metadata Path Attribute is NOT intended for Internet-scoped routes. Rather, it is intended for the limited set of routes instantiated in 5G edge DCs that are connected to the 5G ingress routers via 5G Local Data Networks.

Section 9 and Section 10 attempt to limit the deployment of this feature within "trusted domains" "between Ingress and egress routers of one single BGP domain". [Linda] Yes.

Section 4.1.1 attempts to address the scoping consideration further with the following text:
: In order to prevent distribution of the BGP Metadata Path Attribute beyond
: its intended scope of applicability, attribute filtering SHOULD be deployed
: to remove the BGP Metadata Path attribute at the administrative boundary.
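
The filtering described in the quoted text amounts to an export policy like the one below. The attribute type code is a placeholder (the real code point comes from the IANA allocation), and the administrative boundary is modelled simply as an external peer. Such a filter only protects the sessions on which it is actually configured.

```python
# Placeholder type code for the Metadata Path Attribute; NOT the allocated value.
METADATA_ATTR_TYPE = 255


def export_attributes(attrs: dict, peer_is_external: bool) -> dict:
    """Return the path attributes (keyed by type code) to send to a peer.

    Across the administrative boundary the Metadata Path Attribute is
    removed; inside the domain it is passed through unchanged.
    """
    if peer_is_external:
        return {code: value for code, value in attrs.items() if code != METADATA_ATTR_TYPE}
    return dict(attrs)
```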

As addressed in draft-haas-idr-bgp-attribute-escape, such filtering desires and expectations of limited domains have tended to be wishful thinking, and we keep ending up with operational accidents. [Linda] Any suggestions for the wording? The Metadata Path Attribute is for the limited set of routes instantiated in 5G edge DCs that are connected to the 5G ingress routers via 5G Local Data Networks.

I would STRONGLY suggest that the Path Attribute definition be updated to provide additional scoping information wherein a remote BGP domain receiving an escaped metadata Path Attribute can determine that it should NOT be locally used for the procedures discussed in this document. [Linda] The Introduction section already has the following statement: This document describes a new Metadata Path Attribute added to a BGP UPDATE message [RFC4271] for egress routers to advertise the Metadata about a subset of edge services directly attached to the egress routers.

Is it enough?

An example of such a change would be to add an Autonomous System number to the Path Attribute providing the context of the AS that should be using the contained information. [Linda] can we chat about this change in IETF118?
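
A minimal sketch of the suggested AS-scoping check, assuming a hypothetical `scope_as` field added to the attribute. A receiving speaker keeps the attribute but ignores its contents for route selection unless the scope matches its own AS. The field names, the `capacity` key, and the fallback of 100 are all assumptions; the AS numbers are from the documentation range.

```python
from dataclasses import dataclass


@dataclass
class ScopedMetadata:
    scope_as: int   # hypothetical field: AS whose speakers may use the metadata
    sub_tlvs: dict  # decoded sub-TLV values, keyed by illustrative names


def usable_for_selection(meta: ScopedMetadata, local_as: int) -> bool:
    # An escaped attribute (scope AS != local AS) is carried but not used.
    return meta.scope_as == local_as


def effective_capacity(meta: ScopedMetadata, local_as: int, default: int = 100) -> int:
    # Fall back to "fully available" when the metadata is out of scope.
    if not usable_for_selection(meta, local_as):
        return default
    return meta.sub_tlvs.get("capacity", default)
```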

-- Jeff

suehares commented 9 months ago

Awaiting Jeff Haas acknowledgement

lindadunbar commented 4 months ago

Revisions 19 and 20 have fixed all the issues raised by Jeff.

suehares commented 4 months ago

Re-opening to seek resolution check from Jeff at IETF-120