ietf-wg-idr / draft-ietf-idr-5g-edge-service-metadata-14

Editing for the 5G Service Metadata

Route churn #15

Closed by suehares 2 months ago

suehares commented 6 months ago

Route Churn Considerations: Several fields contain values that are intended to be some level of dynamic metric. (Zero surprise given that's the purpose of the feature!) This includes the Capacity Availability Index, Site Delay Prediction Index, Service Delay Prediction Index, Raw Load Measurement, etc.

Section 7 at least acknowledges that metric changes can affect path selection, and attempts to provide a default lower bound for such churn. Good!

However, since this mechanism is intended to be usable on routes that are used for BGP nexthop resolution (e.g., labeled unicast), churn in these metrics can cause churn not only in the prefixes carrying the data, but also in the routes that depend on them.

This churn is highly analogous to the impact of features such as RSVP auto-bandwidth, which is known to have significant negative network impacts. At a minimum, the document should mention this broader impact.

lindadunbar commented 2 months ago

How about adding the following to Section 7?

While the Minimum Interval for Metrics Change Advertisement is configurable, operators should be aware that frequent updates to the metrics carried in the Metadata Path Attribute can lead to route instability and churn. This is particularly important when the routes carrying this attribute are used for BGP next-hop resolution, as changes in the metrics could trigger updates in dependent routes. The potential for churn in these metrics is similar to the effects seen with features like RSVP auto-bandwidth, which are known to have negative impacts on network stability. Operators should carefully consider the trade-offs between the benefits of dynamic metric updates and the potential for increased churn when configuring the Minimum Interval for Metrics Change Advertisement.

lindadunbar commented 2 months ago

Actual text added to v21:

Route Churn Considerations

While the mechanism detailed in this document aims to provide dynamic metrics like Capacity Availability Index, Site Delay Prediction Index, Service Delay Prediction Index, and Raw Measurement to optimize path selection, it is essential to consider the broader implications of metric-induced churn. Particularly, in the context of routes used for BGP nexthop resolution (e.g., labeled unicast), frequent changes in these metrics can lead to significant churn not only for the prefixes carrying the data but also for dependent routes.

This behavior is analogous to the impacts observed with RSVP auto-bandwidth, which can introduce considerable instability within a network. Such route churn can propagate through the network, causing a cascade of updates and potential route flaps, thereby affecting overall network stability and performance.

To mitigate these effects, network operators should carefully manage the advertisement intervals of these dynamic metrics, ensuring they are set to avoid unnecessary churn. The default minimum interval for metrics change advertisement, set at 30 seconds, is designed to balance responsiveness with stability. However, in scenarios with higher sensitivity to route stability, operators may consider increasing this interval further to reduce the frequency of updates.

Furthermore, operators should implement robust route damping and filtering policies to control the propagation of changes and minimize the impact on dependent routes. By acknowledging and planning for these broader impacts, the mechanism can be deployed more effectively, ensuring optimal performance without compromising network stability.
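For readers of this thread, the minimum-interval behavior described in the text above can be sketched as a small throttle. This is only an illustration under assumptions, not the draft's specified procedure: the class `MetricThrottle`, its method names, and the choice to coalesce suppressed changes into the latest pending value are all hypothetical. Only the 30-second default comes from the text.

```python
from dataclasses import dataclass
from typing import Optional

# Default taken from the quoted text; everything else here is hypothetical.
DEFAULT_MIN_INTERVAL_S = 30.0


@dataclass
class MetricThrottle:
    """Suppress metric advertisements sent sooner than min_interval_s apart."""

    min_interval_s: float = DEFAULT_MIN_INTERVAL_S
    last_advertised_at: Optional[float] = None
    last_advertised_value: Optional[int] = None
    pending_value: Optional[int] = None

    def _too_soon(self, now: float) -> bool:
        # True while the minimum interval since the last advertisement is running.
        return (
            self.last_advertised_at is not None
            and now - self.last_advertised_at < self.min_interval_s
        )

    def _advertise(self, value: int, now: float) -> int:
        self.last_advertised_at = now
        self.last_advertised_value = value
        self.pending_value = None
        return value

    def on_metric_change(self, value: int, now: float) -> Optional[int]:
        """Return the value to advertise now, or None if it is suppressed."""
        if value == self.last_advertised_value:
            # Metric returned to what peers already have: nothing to send.
            self.pending_value = None
            return None
        if self._too_soon(now):
            # Assumption: keep only the most recent suppressed value.
            self.pending_value = value
            return None
        return self._advertise(value, now)

    def on_timer(self, now: float) -> Optional[int]:
        """Flush a pending value once the minimum interval has elapsed."""
        if self.pending_value is None or self._too_soon(now):
            return None
        return self._advertise(self.pending_value, now)
```

With the default interval, a change at t=0 is advertised immediately, changes at t=5 and t=20 are held back, and a timer tick at t=30 sends only the latest of them, so dependent routes see one update instead of three. Raising `min_interval_s` trades responsiveness for fewer updates, which is the trade-off the text asks operators to weigh.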

suehares commented 2 months ago

This comment resolves my earlier comment.