italobusi / draft-poidt-teas-actn-poi-assurance

Scenarios from TIM #1

Open FabioPeruzzini opened 1 year ago

FabioPeruzzini commented 1 year ago

2023.03.draft-poidt-teas-poi-assurance.pptx

ggalimba56 commented 1 year ago

I like the use cases in the presentation and I think we can add more. A general comment is about the SDN controller definition: in my opinion it would be clearer if the text stated explicitly which SDN controller takes a specific action, e.g., the IP SDN controller, the DWDM SDN controller or the hierarchical SDN controller.

italobusi commented 1 year ago

See comment from @prmanna : https://github.com/italobusi/draft-poidt-teas-actn-poi-assurance/pull/2#issuecomment-1462165159

In general, assurance involves link failure detection and re-routing. Fabio's slides depict most of those use cases in the optical domain. There are PSMs (protection switching modules) that can provide 50ms switchover in case of LOS. In the rest of the cases, the optical PCE comes into the picture: it will take more time to compute and configure the alternate path. Again, there will be two flavours: (1) router port + ZR and (2) others (i.e., draft-ietf-teas-actn-poi-applicability). Let's deep dive into Fabio's slides in the next call. Also, I would like to join the discussions going forward and contribute to this standardization.

italobusi commented 1 year ago

ACTN POI weekly call - Ad-hoc (May 3, 2023)

Reviewed slides from Fabio: 2023.03.draft-poidt-teas-poi-assurance.pptx

slide 3 (Failure within the optical network)

An alternative option is to rely on optical protection switching which can take less than 50ms to recover from the failure. In this case, we need to slow down the FRR trigger in the router (e.g., setting the hold-off timer for IP FRR).

This is a protection/restoration coordination use case where the link goes down because the optical path is unprotected

The other case is protection coordination where IP FRR shall be delayed to avoid protection switching at both IP and optical layers

Need to distinguish two different sub-cases:

The IP router needs to be informed about this type of configuration

Although the mechanism used to protect within the optical domain is vendor-specific, the behavior of the optical protection switching at the MPI should be the same across multiple vendors
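
The hold-off coordination discussed for slide 3 can be illustrated with a minimal sketch. It assumes a simplified model in which IP FRR starts only once the hold-off timer expires; the function name, the `Outcome` type and the 10ms FRR switching time are hypothetical values chosen for illustration and are not taken from the slides or from any router implementation.

```python
from dataclasses import dataclass
from typing import Optional, Tuple


@dataclass(frozen=True)
class Outcome:
    layers_switched: Tuple[str, ...]  # which layers performed a switch
    traffic_restored_ms: float        # time from failure to first recovery


def simulate_failure(
    hold_off_ms: float,
    optical_recovery_ms: Optional[float] = None,
    frr_switch_ms: float = 10.0,  # assumed IP FRR switching time
) -> Outcome:
    """Model one failure inside the optical network.

    optical_recovery_ms is None when the optical path is unprotected.
    """
    frr_done_ms = hold_off_ms + frr_switch_ms
    if optical_recovery_ms is None:
        # Unprotected optical path: only IP FRR can recover the traffic.
        return Outcome(("ip",), frr_done_ms)
    if optical_recovery_ms <= hold_off_ms:
        # Optical protection completes before the hold-off expires,
        # so IP FRR is never triggered.
        return Outcome(("optical",), optical_recovery_ms)
    # Hold-off too short: both layers switch for the same failure.
    return Outcome(("ip", "optical"), min(frr_done_ms, optical_recovery_ms))


print(simulate_failure(hold_off_ms=100.0, optical_recovery_ms=50.0))
print(simulate_failure(hold_off_ms=0.0, optical_recovery_ms=50.0))
print(simulate_failure(hold_off_ms=0.0))
```

The second call shows the situation the hold-off timer is meant to prevent: with no hold-off, both the IP and the optical layer react to the same failure.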

Slide 4 (Maintenance within the optical network)

Julien noted that the graceful shutdown option can be used to lock the IP traffic onto the protection path

The description of how the IP traffic is locked can be generalized by investigating whether there is a common way to inform the P-PNC of the need to lock the IP traffic on the protection path

Oscar: we should just say that the link is under maintenance and let the P-PNC take the proper action

Aihua: RFC 8795 already has a mechanism to provide this information
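
Oscar's suggestion (report that the link is under maintenance and let the P-PNC decide what to do) can be sketched as follows. This is an illustrative model only: the class, the state names and the returned action strings are hypothetical and are not taken from RFC 8795 or from any controller API.

```python
from enum import Enum


class LinkState(Enum):
    UP = "up"
    MAINTENANCE = "maintenance"


class PacketController:
    """Hypothetical stand-in for a P-PNC reacting to link state reports."""

    def __init__(self) -> None:
        self.locked_links = set()

    def on_link_state(self, link_id: str, state: LinkState) -> str:
        if state is LinkState.MAINTENANCE:
            # The P-PNC picks its own mechanism (e.g., graceful shutdown)
            # to keep IP traffic on the protection path.
            self.locked_links.add(link_id)
            return "lock traffic on protection path"
        # Maintenance finished: the lock can be released.
        self.locked_links.discard(link_id)
        return "release lock"


p_pnc = PacketController()
print(p_pnc.on_link_state("R1-R2", LinkState.MAINTENANCE))
print(p_pnc.on_link_state("R1-R2", LinkState.UP))
```

The point of this design is that the reporting side only states the condition of the link, while the choice of how to lock the traffic stays inside the P-PNC.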

Paolo-Volpato commented 1 year ago

Maintenance/fault scenarios discussed with Fabio Peruzzini (TIM)

2023.06.draft-poi-assurance-fp-ib-pv_v3.pptx

italobusi commented 1 year ago

ACTN POI Assurance bi-weekly call (September 12, 2023)

The contribution prepared by Paolo, Fabio and Italo 2023.06.draft-poi-assurance-fp-ib-pv_v3 has been reviewed

The contribution analyzes in detail the scenarios provided by Fabio, with some workflows describing the interactions between MDSC, O-PNC, P-PNC, routers and ROADM nodes

Failure within the optical network

Need to clarify that in case of optical dynamic restoration the optical backup path is not pre-configured as in the 1+1 protection case, but dynamically set up after the failure. Therefore, performing protection switching in the IP layer allows the IP traffic to be recovered in less than 50ms

Need to clarify that the reversion of the optical path is similar to a maintenance operation and the MDSC takes an active role to coordinate the switching in the IP layer and the reversion in the optical layer to avoid a traffic hit

The slides can be reviewed offline and comments exchanged on github or via e-mail before the next call

italobusi commented 1 year ago

ACTN POI Assurance bi-weekly call (October 10, 2023)

The contribution updated by Paolo 2023.09.draft-poi-assurance-fp-ib-pv_v4.pptx has been reviewed

There has been some discussion on the validity of the scenario with optical protection and a hold-off timer for IP FRR. In this case, a failure on the link between the router and the edge optical switch will take longer (e.g., 150ms) to be recovered.

It has been noted that there are some deployments in the network using optical protection with FRR and a hold-off timer, and that the draft is intended to describe a set of tools/options without being prescriptive. It is up to the operator to choose which tool to deploy in its network based on its policy and resilience requirements.
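
The trade-off discussed in this call can be made concrete with a small worked example. It assumes that a failure on the router-to-optical-switch link is not covered by optical protection, so IP FRR has to wait for the hold-off timer. The split of the 150ms example into a 100ms hold-off plus 50ms of FRR switching is an assumption made for illustration, as are the function and location names.

```python
def recovery_time_ms(
    failure_location: str,
    hold_off_ms: float,
    frr_switch_ms: float,
    optical_protection_ms: float,
) -> float:
    """Return the time to restore traffic for a given failure location."""
    ip_recovery_ms = hold_off_ms + frr_switch_ms
    if failure_location == "optical-network":
        # Optical protection and delayed IP FRR both apply; the faster wins.
        return min(optical_protection_ms, ip_recovery_ms)
    if failure_location == "router-to-optical-link":
        # Outside the optically protected segment: only IP FRR applies,
        # and it starts after the hold-off timer expires.
        return ip_recovery_ms
    raise ValueError(f"unknown failure location: {failure_location}")


for location in ("optical-network", "router-to-optical-link"):
    print(location, recovery_time_ms(location, 100.0, 50.0, 50.0))
```

With these assumed values, a failure inside the optical network is recovered in 50ms, while a failure on the access link pays the full hold-off before IP FRR reacts.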

italobusi commented 10 months ago

ACTN POI Assurance bi-weekly call (December 19, 2023)

Reviewed the latest update to the slides from Fabio/Italo/Paolo:

2023.10.draft-poi-assurance-fp-ib-pv_v4.pptx

Dual-homing and node failure scenario

The suggestion is to set up the green back-up path as disjoint from the yellow working path

This is a generic issue to be considered for all the multi-layer recovery scenarios, so that a single failure does not affect both the primary and the reverse IP FRR paths

A gap has been identified in this issue since there is a need to request the setup of disjoint paths that belong to different tunnels

Steps 11 and 12 are optional since they depend on the IP FRR configuration: if IP FRR is revertive, R1 can switch the traffic to the dashed green path automatically as soon as the dashed green path becomes available

In the description we can mention that the dashed green path can go to R3, in case of dual-homing, or to another router (e.g., R4), in case of multi-homing.
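
The disjointness requirement for the working and back-up paths can be checked with a minimal sketch like the one below. It assumes each path is available as an ordered list of node names; the node names (`O1` to `O4`) and the helper functions are hypothetical and do not correspond to the topology in the slides or to any controller interface.

```python
from typing import FrozenSet, List, Set


def links(path: List[str]) -> Set[FrozenSet[str]]:
    """Return the undirected links traversed by a path."""
    return {frozenset(pair) for pair in zip(path, path[1:])}


def shared_links(a: List[str], b: List[str]) -> Set[FrozenSet[str]]:
    return links(a) & links(b)


def shared_transit_nodes(a: List[str], b: List[str]) -> Set[str]:
    # End points are excluded: both paths may start at the same router.
    return set(a[1:-1]) & set(b[1:-1])


working = ["R1", "O1", "O2", "R2"]     # yellow working path (assumed hops)
backup = ["R1", "O3", "O4", "R3"]      # green back-up path (assumed hops)
bad_backup = ["R1", "O1", "O4", "R3"]  # reuses O1 and the R1-O1 link

print(shared_links(working, backup), shared_transit_nodes(working, backup))
print(shared_links(working, bad_backup), shared_transit_nodes(working, bad_backup))
```

A check of this kind only verifies two paths that are already known. The gap noted in the call is about requesting such disjoint paths when they belong to different tunnels.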

Alert in case of degradation of DWDM link performance

The scenario for a degraded WDM link requires further discussion with Fabio. As written in the original slides, the scenario is neither a recovery scenario nor a multi-layer coordination scenario, since it requires the ROADM to report whether the pre-FEC BER is above a given configured threshold.

One possibility is that this alert is used to trigger IP FRR protection for preventive traffic recovery.

However, with the new FEC algorithms, it seems hard to predict when the FEC will stop correcting errors in time to perform preventive traffic recovery. This is an issue to discuss with FEC experts to see if there are other parameters to monitor for preventive traffic recovery.
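
The threshold-based alert described in this scenario can be sketched as below. The requirement of several consecutive samples above the threshold is an assumption added here to avoid alerting on a single noisy reading, and the BER values and the threshold are made-up numbers. As noted above, crossing such a threshold may not leave enough time for preventive recovery with newer FEC algorithms.

```python
from typing import Optional, Sequence


def first_alert_index(
    pre_fec_ber: Sequence[float],
    threshold: float,
    consecutive: int = 3,  # assumed debounce, not from the slides
) -> Optional[int]:
    """Return the sample index at which the alert fires, or None."""
    run = 0
    for index, ber in enumerate(pre_fec_ber):
        # Count consecutive samples above the configured threshold.
        run = run + 1 if ber > threshold else 0
        if run >= consecutive:
            return index
    return None


samples = [1e-5, 2e-4, 3e-4, 1e-5, 2e-4, 3e-4, 4e-4]
print(first_alert_index(samples, threshold=1e-4))
```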