KazuyaAnazawa commented 1 year ago

Summary of this issue.

This issue discusses how to support coherent QSFP-DD (OpenZR+) module control on Goldstone. Specifically, the way to configure network-interface-mode and host-interface-mode of OpenZR+ module is discussed.

NOTE: Here, network-interface-mode refers to a combination of modulation format, line rate, and FEC on the line side, while host-interface-mode refers to the port breakout and SerDes configuration on the switch ASIC side (modulation, rate, lane count, speed, etc.). The host-interface-mode values are defined in SFF-8024.

The target platforms are AGR400 (Qumran2C) / CSR440 (Qumran2A) from Edgecore, and S9510-28DC (Qumran2A) from Ufispace. As the first step, we would like to focus on AGR400 (Q2C) platform at this point.

Components for OpenZR+ module control

We need to introduce HAL and south-layer which control OpenZR+ modules.
- The xcvrd from SONiC is introduced for HAL while south-xcvr is introduced for south-layer (see figure).
- Since south-xcvr handles OpenZR+ module controls, it subscribes to the goldstone-transponder model.
- NTT / NEC have already supported this feature. The source code will be upstreamed in Sep./Oct. 2023.
When configuring OpenZR+ modules, we consider not only HAL and south-layer for OpenZR+ modules but also for switch ASIC. So we need to introduce HAL and south-layer which control Q2C/Q2A switch ASIC.
- containerized OcNOS is introduced for HAL while south-ocnos is introduced for south-layer (see figure).
- Since south-ocnos handles switch ASIC controls, it subscribes to the goldstone-interface model.
- NTT / IPI have already supported this feature. The containerized OcNOS should be purchased from IPI while south-ocnos will be enabled for these platforms.

fig1

Issues

Main issue for this work

Unlike CFP2 case, network-interface-mode and host-interface-mode should be configured at the same time.
- In the CMIS/C-CMIS case, datapath initialization on the QSFP-DD module side should start after valid Tx signals from the switch ASIC side are sent to the module's optics side.
- Therefore, the switch ASIC side (port breakout/SerDes) should first be configured via OcNOS according to the specified host-interface-mode. Then, the OpenZR+ module should be configured via xcvrd according to the specified host-interface-mode and network-interface-mode.
This indicates that south-xcvr and south-ocnos should work cooperatively for OpenZR+ module configuration.
To achieve this, we need to consider the following.
1. Goldstone native models for configuring network-interface-mode and host-interface-mode, and for getting the supported mode list of the modules.
2. New design/architecture that south-xcvr and south-ocnos can cooperatively work.

Another issue

QSFP-DD modules are classified into two types: coherent and non-coherent. Both types can be inserted into the same ports.
Although this issue is out of scope for this work, we need to consider it as needed.

Requirements & Implementation design (suggestion).

First, several requirements for implementation are described. Then, implementation design that satisfies the requirements is proposed.

Requirements

The running datastore should be updated by the north daemons only.
- south-xcvr should not update goldstone-interface for south-ocnos.
- south-ocnos should not update goldstone-transponder for south-xcvr.
Each Goldstone primitive model should be subscribed and handled by only one south daemon.
- south-xcvr should not subscribe both goldstone-transponder and goldstone-interface or goldstone-ocnos.
- ditto for south-ocnos.
There may be several port breakout or SerDes configurations for different ASIC or gearbox. We should not depend on specification of a specific platform (AGR400 (Q2C) in this case).
The state transition of QSFP-DD modules should be taken into account. According to the CMIS/C-CMIS specification, the module must receive a valid input signal while performing initialization activities in the subsequent DPInit state. Therefore, OcNOS and xcvrd should be configured in an appropriate order.

Implementation direction (suggestion)

I would like to propose an implementation design. The whole picture is shown below. fig2

To specify network-interface-mode and host-interface-mode from north layer, we need a new model for it (refer to the model as goldstone-mode).
To satisfy the above requirements, an intermediate-layer is introduced. This component performs the following.
- It learns the specification of AGR400/Q2C, i.e., the supported port breakout / SerDes configuration variations, from platform.json. The supported mode list should not be hardcoded.
- It subscribes to goldstone-mode. According to changes on the model, it configures south-xcvr and south-ocnos using handlers implemented inside them. The configuration contents for south-xcvr and south-ocnos are generated by referring to platform.json.
- It performs validation when network-interface-mode and host-interface-mode are specified. Validation includes the following.
  - Check whether the specified network-interface-mode and host-interface-mode are supported by the platform.
  - Check whether configurations related to network-interface-mode and host-interface-mode have already been applied from the north-layer. If such configurations already exist, the operators (Goldstone users) should be notified and the request aborted.
- It takes care of the state transition of QSFP-DD modules. The datapath initialization process in the QSFP-DD module should start after valid Tx signals from the switch ASIC side are sent to the module's optics side. Therefore, it first configures port breakout / SerDes on the switch ASIC side via OcNOS, and then configures the QSFP-DD module side via xcvrd.
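
The validation and ordering steps above can be sketched roughly as follows. This is only an illustration under stated assumptions: `SUPPORTED_MODES`, `apply_mode`, and the two callbacks are hypothetical names that do not exist in Goldstone, OcNOS, or xcvrd, and the network mode strings are placeholders.

```python
from typing import Callable, List, Set, Tuple

# Hypothetical (host-interface-mode, network-interface-mode) pairs that would
# be loaded from platform.json instead of being hardcoded.
SUPPORTED_MODES: Set[Tuple[str, str]] = {
    ("400GAUI-8 C2M", "network-mode-a"),
    ("100GAUI-2 C2M", "network-mode-b"),
}


def apply_mode(
    host_mode: str,
    network_mode: str,
    configure_asic: Callable[[str], None],
    configure_module: Callable[[str, str], None],
    already_configured: bool = False,
) -> None:
    """Validate the requested pair, then configure the ASIC before the module."""
    if (host_mode, network_mode) not in SUPPORTED_MODES:
        raise ValueError("mode pair is not supported by this platform")
    if already_configured:
        raise RuntimeError("conflicting configuration already exists; aborting")
    # The ASIC side goes first so that the module sees a valid Tx signal
    # before its datapath initialization starts.
    configure_asic(host_mode)
    configure_module(host_mode, network_mode)


calls: List[str] = []
apply_mode(
    "400GAUI-8 C2M",
    "network-mode-a",
    configure_asic=lambda h: calls.append("asic"),
    configure_module=lambda h, n: calls.append("module"),
)
```

Passing the two configuration steps as callbacks keeps the sketch independent of how south-ocnos and south-xcvr would actually expose their handlers.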

Tasks

Discuss proposed architecture and fix implementation direction.
Define models for configuring network-interface-mode and host-interface-mode, and getting supported mode list.
- It may be better to define the models based on OpenConfig.
Decide the format of platform.json.

Additional information

Supported mode list for Q2C. Details are also available here.

| Port speed | Lane count | Mode | SerDes lane rate (Gb/s) |
| --- | --- | --- | --- |
| 400GbE | 8 | 400GAUI-8 C2C, 400GAUI-8 C2M, 400GBASE-KR8, 400GBASE-CR8 | 53.125 |
| 200GbE | 4 | 200GAUI-4 C2C, 200GAUI-4 C2M, 200GBASE-KR4, 200GBASE-CR4 | 53.125 |
| 100GbE | 2 | 100GAUI-2 C2C, 100GAUI-2 C2M, 100GBASE-KR2, 100GBASE-CR2 | 53.125 |
| 100GbE | 2 | CAUI-2 C2C, CAUI-2 C2M | 51.5625 |
| 100GbE | 4 | CAUI-4 C2C, CAUI-4 C2M, 100GBASE-KR4, 100GBASE-CR4 | 25.78125 |
| 100GbE | 4 | 100CAUI-4 C2C, 100CAUI-4 C2M | 26.5625 |
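
One way the supported mode list could be expressed in platform.json is sketched below. The format has not been decided, so the key names (`supported-modes`, `port-speed`, `lane-count`, `modes`) and the helper `lane_count_for` are hypothetical; only two rows of the table are included.

```python
import json

# Hypothetical platform.json fragment; only two table rows are included.
PLATFORM_JSON = """
{
  "supported-modes": [
    {"port-speed": "400GbE", "lane-count": 8,
     "modes": ["400GAUI-8 C2C", "400GAUI-8 C2M", "400GBASE-KR8", "400GBASE-CR8"]},
    {"port-speed": "100GbE", "lane-count": 2,
     "modes": ["CAUI-2 C2C", "CAUI-2 C2M"]}
  ]
}
"""


def lane_count_for(platform: dict, mode: str) -> int:
    """Return the lane count of the entry that lists the given mode."""
    for entry in platform["supported-modes"]:
        if mode in entry["modes"]:
            return entry["lane-count"]
    raise KeyError(f"mode {mode!r} is not supported by this platform")


platform = json.loads(PLATFORM_JSON)
print(lane_count_for(platform, "400GAUI-8 C2M"))
```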

@ishidawataru @noguchiko @ipi-claytonpascoal @matsuo-tat @iAgrawalGaurav @santanukar2000 Could you please check the proposal and can we discuss this here? (FYI @hitoshiirino-hcontt @konomimochizuki @s-homma @youhei-katayama)

ishidawataru commented 1 year ago

This indicates south-xcvr and south-ocnos should cooperatively work for OpenZR+ module configuration.

Do you know how SONiC handles this issue? Who is taking care of the configuration order in SONiC?

In CMIS/C-CMIS case, datapath initialization of QSFP-DD module side should start after valid Tx signals from the switch ASIC side is sent to module's optics side.

How about rebooting QSFP-DD periodically until it gets the valid Tx signal from the ASIC instead of waiting for it?

KazuyaAnazawa commented 1 year ago

@ishidawataru Thank you so much for your comments. I really appreciate them.

Do you know how SONiC handles this issue? Who is taking care of the configuration order in SONiC?

It seems that this issue is recognized by the SONiC community, and that the configurations of xcvrd and syncd should be synchronized. However, it also seems that there is no way for xcvrd to know the status of the ASIC side, so this issue may not be handled in SONiC. https://github.com/sonic-net/SONiC/blob/master/doc/sfp-cmis/cmis-init.md#outside-the-scope

How about rebooting QSFP-DD periodically until it gets the valid Tx signal from the ASIC instead of waiting for it?

I see. I think it's technically possible and reasonable. In my assumption, one of the important responsibilities of the intermediate-layer is to take care of the module state. So, in that case, the intermediate-layer may not need to be introduced.

@noguchiko @matsuo-tat Do you have any additional comments around this?

ishidawataru commented 1 year ago

It might be reasonable to introduce the rebooting logic in xcvrd so that we can even fix the issue that exists in SONiC.

matsuo-tat commented 1 year ago

@KazuyaAnazawa @ishidawataru Thank you for the suggestion. I agree with the policy of regularly rebooting the QSFP-DD with xcvrd.

By the way, xcvrd from SONiC has already implemented a function to retry DataPath transition if it cannot transition up from DPInit to DPActivated. Currently, it retries up to 3 times, and I think that setting it to unlimited retries would be equivalent to restarting QSFP-DD until a valid Tx signal is received. If unlimited retries result in high load, adjustments such as setting intervals may be necessary.
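
The unlimited-retry idea with an interval could look roughly like the sketch below. It does not use the real xcvrd API: `try_datapath_init` is a hypothetical callback that returns True once the module reaches DPActivated, and the interval values are arbitrary.

```python
import time
from typing import Callable, Optional


def retry_datapath_init(
    try_datapath_init: Callable[[], bool],
    max_retries: Optional[int] = None,  # None means retry without limit
    interval_sec: float = 1.0,
    max_interval_sec: float = 30.0,
    sleep: Callable[[float], None] = time.sleep,
) -> int:
    """Retry until the datapath is activated; return the number of attempts."""
    attempts = 0
    while max_retries is None or attempts < max_retries:
        attempts += 1
        if try_datapath_init():
            return attempts
        # Wait before the next attempt and back off to limit the load.
        sleep(interval_sec)
        interval_sec = min(interval_sec * 2, max_interval_sec)
    raise TimeoutError(f"datapath not activated after {attempts} attempts")


# Example: the ASIC starts sending a valid Tx signal on the third attempt.
results = iter([False, False, True])
waits = []
attempts = retry_datapath_init(lambda: next(results), sleep=waits.append)
```

The capped, doubling interval is just one possible way to keep unlimited retries from causing high load; a fixed interval would work as well.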

noguchiko commented 1 year ago

@KazuyaAnazawa

The running datastore should be updated by the north daemons only.

Translation daemons also update the running datastore. In my understanding, the intermediate layer in the proposal is a translation daemon. It translates the new model into/from goldstone-transponder and goldstone-interfaces.

To discuss the architecture, I think that we need to define what (maybe new) attributes of goldstone-transponder and goldstone-interfaces should be used by the intermediate layer. It may clarify the responsibilities of the south components.

KazuyaAnazawa commented 1 year ago

@matsuo-tat @noguchiko

Thank you so much for your comments.

Currently, it retries up to 3 times, and I think that setting it to unlimited retries would be equivalent to restarting QSFP-DD until a valid Tx signal is received.

Yes, I agree with this direction. We can add patches to xcvrd as we did before, and also contribute to SONiC.

To discuss the architecture, I think that we need to define what (maybe new) attributes of goldstone-transponder and goldstone-interfaces should be used by the intermediate layer. It may clarify the responsibilities of the south components.

Definitely. Thank you for the comments. I will consider which attributes of goldstone-transponder and goldstone-interface should be handled by the intermediate (or xlate) layer. Then I will clarify and write down the responsibilities of each south daemon. Also, we may be able to make use of an OpenConfig model (for the new model) and xlate-oc (for the intermediate layer). I will think about it.

KazuyaAnazawa commented 1 year ago

Tried to clarify what attributes of goldstone-transponder and goldstone-interfaces are handled by intermediate layer. The whole picture is shown below.

First, we need to extend goldstone-transponder to include the following attributes for QSFP-DD configurations.

network-interface-mode
host-interface-mode
num-host-interfaces (this may be unnecessary if it can be derived from network-interface-mode and host-interface-mode.)

Then, I thought we may be able to make use of the following OpenConfig models for new model and xlate-oc for intermediate layer in this work (thanks to noguchi-san's comments).

openconfig-terminal-device
- network-interface-mode on goldstone model corresponds to operational-mode on openconfig-terminal-device.
openconfig-platform
- host-interface-mode on goldstone model corresponds to group-config on openconfig-platform.
- index of each group-config corresponds to host-interface-mode id. ※ The information of used lanes is represented as a bitmask in CMIS 5.1 and xcvrd. How we express this configuration on the models is not considered at this point.
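
For reference, converting a list of used host lanes to a bitmask could be done as in the sketch below. The helper name and the zero-based lane numbering are assumptions made for illustration; this is not the xcvrd implementation.

```python
from typing import Iterable


def lanes_to_bitmask(lanes: Iterable[int], total_lanes: int = 8) -> int:
    """Set bit N for every used lane N (zero-based numbering is assumed)."""
    mask = 0
    for lane in lanes:
        if not 0 <= lane < total_lanes:
            raise ValueError(f"lane {lane} is out of range")
        mask |= 1 << lane
    return mask


# Lower four lanes of an eight-lane module.
lower_half = lanes_to_bitmask([0, 1, 2, 3])
```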

Given these configurations from north-layer, each component performs as follows:

xlate-oc
- Update running configurations of attributes related to SerDes on goldstone-interfaces (the configuration items are shown in figure).
- Update running configurations of network-interface-mode, host-interface-mode, and num-host-interfaces on goldstone-transponder.
south-ocnos
- Performs SerDes configurations (see attributes in the figure below) at switch fabric side via containerized OcNOS.
south-xcvr
- Performs QSFP-DD (see attributes in the figure below) configurations via xcvrd.

arch2

@noguchiko Do you have any comments on the above from both implementation and architectural design point of view? Especially, current xlate-oc supports only a platform with gearbox (not switch ASIC).

noguchiko commented 1 year ago

@KazuyaAnazawa Thank you. I'll comment after confirming the details.

noguchiko commented 1 year ago

I think the structure is fine.

However, it is better not to assume OpenConfig models in the architecture. Doing so could decrease functional cohesion and make it difficult to support the feature in other standardized models. Also, the current xlate-oc has issues with switch ASIC devices, as you mentioned.

Therefore, I think it is better to focus on providing the feature with goldstone primitive models first. It allows users/controllers to use coherent QSFP-DDs in various modes on devices with Q2C/Q2A switch ASIC. After that, we can wrap the feature in standardized models (e.g. OpenConfig). It may improve usability of the feature.

KazuyaAnazawa commented 1 year ago

@noguchiko Thank you so much for your valuable comments. You are right.

I will investigate CMIS/C-CMIS and xcvrd API in detail, and try to clarify features/attributes that should be provided by goldstone primitive models (i.e., goldstone-transponder.yang and goldstone-interfaces.yang) ASAP.

The main focus of NEC side will be QSFP-DD module configuration by goldstone mgmt daemons, so could you please do this in NEC side in parallel? After the models are fixed, we can move on to the next step of how to serve the models by xlate layer (maybe xlate-oc or other components).

@ipi-claytonpascoal @iAgrawalGaurav We also need to clarify the features/attributes provided by goldstone-interfaces. Currently, I think the attributes described in https://github.com/oopt-goldstone/goldstone-mgmt/issues/80#issuecomment-1655507436 should be configured by OcNOS/south-ocnos. If you have any comments on this proposal from the OcNOS point of view, please give us some feedback. Proposals from the IPI side are also welcome.

KazuyaAnazawa commented 1 year ago

@noguchiko @matsuo-tat I investigated the attributes which may be required for goldstone-transponder first. My conclusion is that adding "network-interface-mode" and "host-interface-mode" at transponder-module-config in goldstone-transponder is enough.

The reason is as follows.

At register level, the transmission mode of QSFP-DD (a pair of host-interface-mode and network-interface-mode) is configured by setting vendor-specific AppSelCode to the module.
- AppSelCode can be uniquely specified by host-interface-mode and network-interface-mode.
The client of Xcvrd should specify speed (host-interface-side), lanes, and media_interface_code (i.e., network-interface-mode) to configure transmission mode of QSFP-DD. Then, Xcvrd computes desired AppSelCode (here), and sets it to the module (here).
- The speed (host-interface-side) and lanes can be identified from "host-interface-mode", "host-lane-assignment-options" of the Application, and platform information that describes the lane numbers the module will use.
  - More specifically, the lanes can be identified by referring to the host_lane_count and host_lane_assignment_options fields in the Application Descriptor.
- The media_interface_code can be identified by "network-interface-mode" since they are equivalent.
- Though current Xcvrd has no attribute to configure network-interface-mode, we have patched it so that network-interface-mode can be configured for OpenZR+.
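
The lookup described above can be illustrated as follows. The descriptor fields and the example values are hypothetical and only loosely modeled on the CMIS Application Descriptor; real modules advertise their own vendor-specific tables, and this is not how xcvrd itself is structured.

```python
from dataclasses import dataclass
from typing import List


@dataclass(frozen=True)
class AppDescriptor:
    app_sel_code: int
    host_interface_mode: str
    network_interface_mode: str
    host_lane_count: int


def resolve_app_sel_code(
    descriptors: List[AppDescriptor], host_mode: str, network_mode: str
) -> int:
    """Return the AppSelCode advertised for the requested mode pair."""
    for desc in descriptors:
        if (desc.host_interface_mode, desc.network_interface_mode) == (
            host_mode,
            network_mode,
        ):
            return desc.app_sel_code
    raise LookupError("the module does not advertise this mode pair")


# Made-up table for one module; another vendor may number these differently.
module_apps = [
    AppDescriptor(1, "400GAUI-8 C2M", "network-mode-a", 8),
    AppDescriptor(2, "100GAUI-2 C2M", "network-mode-a", 2),
]
code = resolve_app_sel_code(module_apps, "100GAUI-2 C2M", "network-mode-a")
```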

The sequence by picture is as follows.

issue_gs_xpdr

For these reasons, I think we just need to add "network-interface-mode" and "host-interface-mode" at transponder-module-config in goldstone-transponder. What do you think?

noguchiko commented 1 year ago

@KazuyaAnazawa Thank you for the update. It looks good to me.

I have one question for clarification.

More specifically, the lanes can be identified by referring host_lane_count and host_lane_assignment_options fields in Application Descriptor.

In order to identify the lanes, south-xcvr may also need the physical lane configuration of the network device. My understanding is that it is provided by the platform.json in the picture. Is it right?

KazuyaAnazawa commented 1 year ago

@noguchiko Thank you for checking.

In order to identify the lanes, south-xcvr may also need the physical lane configuration of the network device. My understanding is that it is provided by the platform.json in the picture. Is it right?

Yes, your understanding is correct. The lane numbers for each module are provided in the platform-specific file platform.json.

ipi-claytonpascoal commented 1 year ago

@KazuyaAnazawa Based on the current OcNOS release, south-ocnos/OcNOS will need the following attributes from the goldstone-interface module:

/gs-if:interfaces/interface/ethernet/breakout/config/num-channels
/gs-if:interfaces/interface/ethernet/breakout/config/channel-speed
/gs-if:interfaces/interface/config/pin-mode
/gs-if:interfaces/interface/ethernet/config/fec - for the broken-out ports, if FEC is required

However, on future releases, OcNOS will no longer depend on interface port-breakout and ASIC configuration will be applied based only on AppSelCode.

So, I would suggest south-ocnos also subscribe to goldstone-transponder and get host-interface-mode and network-interface-mode information. In this way, south-ocnos can handle the differences between OcNOS releases without any impact on the other components in the system.

As per requirement 2, "Each Goldstone primitive model should be subscribed and handled by only one south daemon", I think that the intermediate-layer plugin may be necessary to subscribe to goldstone-transponder and replicate the data to both south-ocnos and south-xcvr.

KazuyaAnazawa commented 1 year ago

@ipi-claytonpascoal Thank you for your comments and useful information.

Let me double check for clarification.

When the peripheral control function inside OcNOS is disabled, it correctly configures only switch ASIC side according to the specified AppSelCode. Is this right?
Does the current OcNOS release already support configuring the switch ASIC side based on the specified AppSelCode for transceivers?

ipi-claytonpascoal commented 1 year ago

@KazuyaAnazawa

When the peripheral control function inside OcNOS is disabled, it correctly configures only switch ASIC side according to the specified AppSelCode. Is this right?

That is the plan for future OcNOS releases. A few adaptations may be required on OcNOS.

Does the current OcNOS release already support configuring the switch ASIC side based on the specified AppSelCode for transceivers?

No, it does not. The current OcNOS release (6.4.0) still relies on the port breakout to configure the switch ASIC side.

KazuyaAnazawa commented 1 year ago

@noguchiko @matsuo-tat

I think we need to support both current OcNOS release (port breakout / serdes rate configurations are supported) and its future releases (AppSelCode configuration is supported). Also, we may need to think about the following requirement.

"Each Goldstone primitive model should be subscribed and handled by only one south daemon"

One of the solutions could be the introduction of intermediate-layer plugin so that above requirements are satisfied (Suggestion from @ipi-claytonpascoal).

Do you have any comments or proposal for this?

noguchiko commented 1 year ago

@KazuyaAnazawa @ipi-claytonpascoal

"Each Goldstone primitive model should be subscribed and handled by only one south daemon"

That is preferable, but the Goldstone management framework does not explicitly require that.

There is an issue if multiple software components subscribe to the operational state request for the same model. In that case, the operational state values responded by each of the subscribers may conflict, and the datastore may not be able to resolve it. On the other hand, subscribing to the same configuration change by multiple components should not be an issue. Each component configures a different target, and if one fails, the failure can be propagated (by abort event) to the other component for rollback.
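
The change/abort flow described above can be mimicked in plain Python as below. This sketch does not use the sysrepo API; the `Component` class and the `commit` function are hypothetical stand-ins for subscribers and the datastore.

```python
from typing import Dict, List


class Component:
    """Hypothetical subscriber that configures its own target."""

    def __init__(self, name: str, fail: bool = False) -> None:
        self.name = name
        self.fail = fail
        self.applied: Dict[str, str] = {}

    def on_change(self, key: str, value: str) -> None:
        if self.fail:
            raise RuntimeError(f"{self.name} failed to apply {key}")
        self.applied[key] = value

    def on_abort(self, key: str) -> None:
        # Undo the change if this component had already applied it.
        self.applied.pop(key, None)


def commit(components: List[Component], key: str, value: str) -> bool:
    """Apply a change to every subscriber; abort all of them on any failure."""
    done: List[Component] = []
    try:
        for comp in components:
            comp.on_change(key, value)
            done.append(comp)
    except RuntimeError:
        for comp in done:
            comp.on_abort(key)
        return False
    return True


xcvr = Component("south-xcvr")
ocnos = Component("south-ocnos", fail=True)
ok = commit([xcvr, ocnos], "host-interface-mode", "400GAUI-8 C2M")
```

In this run the second subscriber fails, so the first one is rolled back and the commit reports failure.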

How about the following design?

south-xcvr
- goldstone-transponder
  - subscribes configuration change
  - subscribes operational state request
    - provides the cmis-application-descriptors (and other states)
south-ocnos
- goldstone-interfaces
  - subscribes configuration change
    - for OcNOS release (6.4.0)
  - subscribes operational state request
- goldstone-transponder
  - subscribes configuration change
    - for OcNOS release (future)
    - uses cmis-application-descriptors from goldstone-transponder to resolve the AppSelCode
  - does NOT subscribe operational state request
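
The constraint behind this design, that only one component answers operational state requests for a given model, can be checked with a small sketch like the one below. The tuple layout and the function name are hypothetical; the entries simply restate the list above.

```python
from collections import Counter
from typing import List, Tuple

# (daemon, model, subscription kind) taken from the design above.
SUBSCRIPTIONS: List[Tuple[str, str, str]] = [
    ("south-xcvr", "goldstone-transponder", "config"),
    ("south-xcvr", "goldstone-transponder", "oper"),
    ("south-ocnos", "goldstone-interfaces", "config"),
    ("south-ocnos", "goldstone-interfaces", "oper"),
    ("south-ocnos", "goldstone-transponder", "config"),
]


def oper_conflicts(subscriptions: List[Tuple[str, str, str]]) -> List[str]:
    """Return models whose operational state has more than one provider."""
    providers = Counter(model for _, model, kind in subscriptions if kind == "oper")
    return sorted(model for model, count in providers.items() if count > 1)


conflicts = oper_conflicts(SUBSCRIPTIONS)
```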

KazuyaAnazawa commented 1 year ago

@noguchiko Thank you for your comments.

Each component configures a different target, and if one fails, the failure can be propagated (by abort event) to the other component for rollback.

I see, I overlooked it. Thank you for pointing it out.

The design looks good to me, but let me look into it in detail.

@ipi-claytonpascoal Do you have any comments or doubt?

ipi-claytonpascoal commented 1 year ago

@KazuyaAnazawa The design proposed by @noguchiko looks fine.

Just a suggestion: I think it would be better if changes in OcNOS behavior across releases did not change how the north-bound configures the feature. In this case, the port-breakout configuration would become obsolete in future OcNOS releases. Maybe we can consider having south-ocnos subscribe to goldstone-transponder configuration changes even for release 6.4.0. In this way, south-ocnos would receive the changes on cmis-application-descriptors and decide whether OcNOS needs to be configured via port-breakout or AppSelCode. What do you think about this approach?

Regarding the operational data, south-ocnos does not need to subscribe to goldstone-transponder. Subscription to goldstone-interfaces for the operational state of broken-out ports should be enough.

KazuyaAnazawa commented 1 year ago

@ipi-claytonpascoal Thank you so much for your suggestion. I understood that south-ocnos subscribes to goldstone-transponder regardless of OcNOS version, and it configures port-breakout or AppSelCode according to OcNOS version in your proposal. I think it's reasonable.

By the way, in that case, we may need to consider how to handle non-coherent QSFP-DDs because both coherent/non-coherent QSFP-DDs can be inserted on the same ports (also, goldstone-transponder has been used to model and represent coherent devices).

@noguchiko @matsuo-tat What do you think about Clayton's proposal and handling non-coherent QSFP-DDs?

noguchiko commented 1 year ago

@ipi-claytonpascoal @KazuyaAnazawa

It looks good to me too. It simplifies the operation and will be valuable to users.

handling non-coherent QSFP-DDs

At least, it seems that it is not appropriate to manage non-coherent "transceivers" with a model called "transponder". There are multiple possible designs to solve this. For example, rename goldstone-transponder in order to expand the responsibility of the model, or use different models for coherent and non-coherent QSFP-DD, and so on.

I don't have any solid ideas right now, but I'll think about it.

ishidawataru commented 1 year ago

However, on future releases, OcNOS will no longer depend on interface port-breakout and ASIC configuration will be applied based only on AppSelCode.

@KazuyaAnazawa @ipi-claytonpascoal Could you explain why this decision was made? The Goldstone primitive YANG models basically map to a specific hardware component on the device. (Hence Each Goldstone primitive model should be subscribed and handled by only one south daemon) Originally goldstone-transponder.yang was introduced to cover CFP2DCO, CFP2ACO, or on-board type coherent optics. I think it's better for OcNOS to keep using goldstone-interfaces.yang to control ASIC's SERDES configuration.

You may argue we'll have redundant configurations in the primitive YANG models by following the above design. (AppSelCode in goldstone-transponder.yang and port-breakout setting in goldstone-interfaces.yang ) However, I think that is totally acceptable in the primitive YANG model layer. The purpose of the primitive YANG model is to abstract each hardware component and not to provide a user-friendly and concise configuration model. We could have an additional YANG model (you may call this intermediate model) on top of the primitive models to provide user-friendliness if required.

ishidawataru commented 1 year ago

At least, it seems that it is not appropriate to manage non-coherent "transceivers" with a model called "transponder".

@noguchiko How about introducing goldstone-transceiver.yang to handle both coherent and non-coherent pluggable SFF transceivers?

I'm not sure if we need to keep using goldstone-transponder.yang. It looks like we need to introduce an AppSelCode configuration knob to control coherent QSFP-DD transceivers. In that case, the configuration section of goldstone-transponder.yang, which is based on TAI attributes, is mostly useless.

Do we really need to keep using goldstone-transponder.yang for the QSFP-DD-based platforms?

KazuyaAnazawa commented 1 year ago

@ishidawataru Thank you so much for your comments and advice.

As Clayton mentioned here, there is a difference between OcNOS (v6.4.0) and OcNOS (ver > 6.4.0) in controlling the ASIC's SerDes. OcNOS (v6.4.0) controls the ASIC's SerDes based on the port-breakout and SerDes-rate attributes defined in ipi-port-breakout.yang. However, OcNOS (ver > 6.4.0) abandons this approach and controls the ASIC's SerDes based on AppSelCode in ipi-platform. For better understanding, I attach the image below. ※ In my understanding, this is IPI's policy and it's difficult for OcNOS (ver > 6.4.0) to have the same control interface as OcNOS (v6.4.0) (@ipi-claytonpascoal Please comment if you have anything to add).

So, I tried to design south-ocnos so that both OcNOS v6.4.0 and later releases can be used on Goldstone.

For OcNOS v.6.4.0 case, we can subscribe to goldstone-interfaces.yang and configure ASIC's SerDes according to the changes on that model. For OcNOS (ver > 6.4.0) case, south-ocnos should identify appropriate AppSelCode according to changes on goldstone-interfaces.yang. But I thought that it is difficult because south-ocnos has no way to know QSFP-DD's network-interface side configuration. Appropriate AppSelCode can be identified if we can know both network-interface side configuration (i.e., network-interface-mode) and host-interface side configuration (i.e., host-interface-mode).

That's why we made a decision of subscribing goldstone-transponder.yang by south-ocnos. I would really appreciate it if you could give me some advice or comments.

ishidawataru commented 1 year ago

@KazuyaAnazawa Do you plan to use OcNOS to control the pluggable transceivers as well? Or are you going to use xcvrd for that purpose? Is mapping Goldstone's port-breakout configuration to OcNOS-supported AppSelCode really difficult?

But I thought that it is difficult because south-ocnos has no way to know QSFP-DD's network-interface side configuration.

If OcNOS doesn't control the pluggable transceivers, I think we can use a random network-interface side configuration.

KazuyaAnazawa commented 1 year ago

@ishidawataru Thank you for your comments.

Do you plan to use OcNOS to control the pluggable transceivers as well? Or are you going to use xcvrd for that purpose?

We would like to use xcvrd for controlling the pluggable transceivers, and want OcNOS to focus on ASIC side configuration.

Is mapping Goldstone's port-breakout configuration to OcNOS-supported AppSelCode really difficult?

I think it is difficult because AppSelCode is decided by transceiver vendors, not by OcNOS. (That is, different transceiver vendors could support the same application, but its AppSelCode could differ among vendors.) One possible solution is to keep the cmis-application-descriptors of each vendor inside south-ocnos, but it would be better to avoid this.

If OcNOS doesn't control the pluggable transceivers, I think we can use a random network-interface side configuration.

Thank you for your suggestion. Yes, this can be one possible solution. @ipi-claytonpascoal Just in case, please check whether OcNOS configures only the switch ASIC side when its peripheral control function is disabled.

ishidawataru commented 1 year ago

@KazuyaAnazawa

(That is, different transceiver vendors could have a same application, but its AppSelCode could be different among vendors.)

In that case, how does OcNOS check if the given AppSelCode is valid when its peripheral control function is disabled? I think we want OcNOS not to access the SFF transceivers at all. Is that feasible with the ~~current~~ containerized OcNOS (ver > 6.5.0)?

KazuyaAnazawa commented 1 year ago

@ishidawataru In our assumption, south-ocnos gets cmis-application-descriptors (operational state) via sysrepo. For the primitive model, we plan to use extended goldstone-transponder.yang or newly defined goldstone-transceiver.yang, which have cmis-application-descriptors field.

However, if OcNOS internally checks whether the given AppSelCode is supported on the SFF transceiver, access to the SFF transceivers is mandatory. So some changes will be required in OcNOS, as Clayton mentioned here.

noguchiko commented 1 year ago

The Goldstone primitive YANG models basically map to a specific hardware component on the device. ... The purpose of the primitive YANG model is to abstract each hardware component and not to provide a user-friendly and concise configuration model.

@ishidawataru That's an important point. I understood the concept as follows, is this correct?

Hardware component
- provides core functionalities of a network device
- e.g. SFF transceivers, Switch ASICs
HAL component
- provides APIs of a hardware component (or multiple hardware components that have the same interface)
- e.g. xcvrd for SFF transceivers
- e.g. containerized ocnos or sonic for Switch ASICs
South component
- provides APIs based on primitive models for a HAL component
- subscribes and handles primitive models to realize that
- e.g. south-xcvr for xcvrd
- e.g. south-ocnos for ocnos, south-sonic for sonic
Primitive model
- provides an abstraction for a hardware component (or its functionality)
- does not depend on a HAL component
- e.g. goldstone-transceiver for SFF transceivers
- e.g. goldstone-interfaces, goldstone-vlan, and goldstone-static-route for Switch ASICs

If so,

How about introducing goldstone-transceiver.yang to handle both coherent and non-coherent pluggable SFF transceivers?

I think that's appropriate. We should introduce a new model goldstone-transceiver.yang to provide control and management of both coherent and non-coherent pluggable SFF transceivers, instead of extending responsibility of goldstone-transponder.yang.

ishidawataru commented 1 year ago

@noguchiko Thank you for the summary. It matches the idea behind the architecture. (I think we should put the summary in README.md) By introducing goldstone-transceiver.yang, the translation daemons might need to switch handling goldstone-transceiver.yang and goldstone-transponder.yang, which introduces complexity. However, I think that is an unavoidable cost. AFAIK, OpenConfig and OpenROADM haven't considered coherent SFF transceivers yet. They might need to update their models to support these kinds of devices.

ipi-claytonpascoal commented 1 year ago

@KazuyaAnazawa , your comment regarding the design decision between OcNOS releases is excellent. I do not have anything to add.

Just in case, please check whether OcNOS configures only the switch ASIC side when its peripheral control function is disabled.

This is a work in progress but I will guarantee that this requirement will be met.

In our assumption, south-ocnos gets cmis-application-descriptors (operational state) via sysrepo.

I agree with this approach and I do not think that OcNOS does this kind of validation. Even if it does, the validation could be bypassed in Goldstone since south-ocnos will guarantee that the AppSelCode is valid. I will double-check that and confirm that OcNOS does not need to access the SFF transceiver in any situation.

OpenConfig and OpenROADM haven't considered coherent SFF transceivers yet.

OpenConfig has the leaf /components/component/transceiver/config/module-functional-type that can be used to differentiate digital coherent optics from standard grey optics. If I am not mistaken, grey optics are handled by the openconfig-platform-transceiver model and coherent modules are handled by openconfig-terminal-device. I hope that helps with the goldstone-transceiver and goldstone-transponder definitions.

KazuyaAnazawa commented 1 year ago

@ipi-claytonpascoal Thank you for your comments. I think we can proceed with the following design in south-ocnos side for configuring coherent QSFP-DDs.

south-ocnos always subscribes to goldstone-interfaces.yang.
- For OcNOS v6.4.0, it configures port-breakout and SerDes-rate on ipi-port-breakout.yang.
- SerDes-rate is identified by port-breakout and pin-mode configurations on goldstone-interfaces.yang.
- For OcNOS v >= 6.5.0, it configures AppSelCode on ipi-platform.yang.
- south-ocnos gets cmis-application-descriptors (operational state on goldstone-transceiver.yang) via sysrepo.
- It picks an AppSelCode whose host-interface-mode (host-electrical-interface-id) matches the port-breakout and pin-mode configurations on goldstone-interfaces.yang.
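The AppSelCode selection for OcNOS v >= 6.5.0 could look roughly like the following. This is only a minimal sketch: the `ApplicationDescriptor` class, the `HOST_INTERFACE_BY_PORT_CONFIG` table, and the breakout / pin-mode strings are hypothetical names chosen for illustration, not the actual south-ocnos or sysrepo data structures.

```python
from dataclasses import dataclass
from typing import Iterable, Optional


@dataclass(frozen=True)
class ApplicationDescriptor:
    """One advertised application (hypothetical shape of the operational state)."""
    code: int               # AppSelCode
    host_interface_id: str  # host-electrical-interface-id
    host_lane_count: int


# Hypothetical mapping from (port-breakout, pin-mode) to a host interface ID.
# A real table would have to be derived from SFF-8024.
HOST_INTERFACE_BY_PORT_CONFIG = {
    ("1x400G", "PAM4"): "400GAUI-8",
    ("4x100G", "PAM4"): "100GAUI-2",
}


def pick_app_sel_code(
    breakout: str,
    pin_mode: str,
    descriptors: Iterable[ApplicationDescriptor],
) -> Optional[int]:
    """Return the first AppSelCode whose host interface matches the port config."""
    wanted = HOST_INTERFACE_BY_PORT_CONFIG.get((breakout, pin_mode))
    if wanted is None:
        return None  # unknown port configuration
    for descriptor in descriptors:
        if descriptor.host_interface_id == wanted:
            return descriptor.code
    return None  # the module does not advertise a matching application


advertised = [
    ApplicationDescriptor(code=1, host_interface_id="400GAUI-8", host_lane_count=8),
    ApplicationDescriptor(code=2, host_interface_id="100GAUI-2", host_lane_count=2),
]
selected = pick_app_sel_code("4x100G", "PAM4", advertised)
```

Returning `None` instead of raising leaves it to the caller to decide whether an unmatched configuration should be rejected at validation time.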

Also, south-ocnos should carefully handle interfaces when setting port-breakout for them. Specifically, if we configure port-breakout for an interface (e.g., cd0) to 4x100G, the interface cd0 is removed from containerized OcNOS and we will see new interfaces cd0/1, cd0/2, cd0/3, and cd0/4. Even if the interface cd0 is removed, its configuration will still be in sysrepo. So, south-ocnos should always manage the original interface, cd0, and return configuration/operational-state of the original interface.

@noguchiko @ishidawataru This is F.Y.I, and if you have any comments, please let me know.

ipi-claytonpascoal commented 1 year ago

@KazuyaAnazawa I agree with the proposed design.

Regarding the interfaces involved in the port breakout, the parent interface (cd0) can be handled in south-ocnos even if it is not present in containerized OcNOS, but once it is broken out the only possible configuration would be removing the breakout configuration. Is that the expectation? The broken-out ports (cd0/1, cd0/2,...) must be included in Sysrepo to be independently configured for traffic. Which Goldstone component will be responsible for that?

KazuyaAnazawa commented 1 year ago

@ipi-claytonpascoal Thank you for your reply. I would like to describe my assumption and expectation in more detail.

south-ocnos handles port-breakout configuration for an interface (e.g., cd0).
- port-breakout can be configured only when there is no other configuration on the interface.
south-ocnos returns operational-state including parent interface (cd0).
- it handles deleting port-breakout configuration for parent interface (cd0).
- it handles L2/L3 configurations only for broken-out ones (cd0/1, cd0/2, ...).
south-ocnos deletes port-breakout configuration for parent interface
- this can be done only when there are no configurations on the broken-out ones (cd0/1, cd0/2, ...)
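The rules above can be sketched as a few small checks. This is an illustrative sketch only: the function names and the `configured` set (names of interfaces that currently carry configuration other than port-breakout) are assumptions, not the south-ocnos implementation.

```python
from typing import List, Set


def breakout_children(parent: str, count: int) -> List[str]:
    """Names of the broken-out interfaces, e.g. cd0 -> cd0/1 ... cd0/4."""
    return [f"{parent}/{i}" for i in range(1, count + 1)]


def can_set_breakout(parent: str, configured: Set[str]) -> bool:
    """Breakout may be set only while the parent has no other configuration."""
    return parent not in configured


def can_delete_breakout(parent: str, count: int, configured: Set[str]) -> bool:
    """Breakout may be deleted only while no broken-out child is configured."""
    return not any(child in configured for child in breakout_children(parent, count))


children = breakout_children("cd0", 4)
```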

For better understanding, I attach a figure describing how it works.

The broken-out ports (cd0/1, cd0/2,...) must be included in Sysrepo to be independently configured for traffic. Which Goldstone component will be responsible for that?

As described above, south-ocnos will be responsible for configuring broken-out ports. Does this clarify your doubts?

noguchiko commented 1 year ago

@KazuyaAnazawa I have one comment.

south-ocnos returns running-config / operational-state including parent interface (cd0).

I think that south daemons should not return running-config (data in config containers), because it may conflict with the user configuration stored in the running datastore. Even if we do so, users can recognize the existence and actual configuration of interfaces (cd0, cd0/1 ...) by using operational-state.

KazuyaAnazawa commented 1 year ago

@noguchiko Thank you for pointing it out. So sorry, it was just my mistake. You are right. We will implement south-ocnos so that it can return operational-state.

ipi-claytonpascoal commented 1 year ago

@KazuyaAnazawa Thank you very much for the detailed explanation. It is all clear now!

noguchiko commented 1 year ago

@KazuyaAnazawa @ishidawataru I'm designing the goldstone-transceiver model now. Here are my thoughts, I would appreciate your comments.

The goldstone-transceiver model abstracts SFF transceivers. It would be nice to model it on CMIS/C-CMIS, which inherits the SFF management interface conventions and can support devices with more than 8 lanes in the future.

The model should support the following management features:

Advertisement
- Basic information
- Applications
Configuration
- Data Path
- Alarm mask, threshold
- PM mask, threshold
Monitoring
- Status (for Module, Data Path...)
- Alarm (Flags)
- Performance (VDM)

I think that we have the following two options for how to model it:

1. Convert the CMIS register structure described in "8 Module Management Memory Map" into a tree structure.
2. Implement the CMIS functional models described in "6 Core Management Features" in YANG.

The following (collapsed) sections are examples for each option. I would appreciate any comments on which option or another we should take.

Based on CMIS registers

## Based on CMIS registers

### Tree

```txt
module: goldstone-transceiver
  +--rw modules
     +--rw module* [index]
        +--rw index -> ../config/index
        +--rw config
        |  +--rw index
        |  :
        +--ro state
        |  :
        +--rw host-lanes
        |  +--rw host-lane* [index]
        |     +--rw index -> ../config/index
        |     +--rw config
        |     |  +--rw index
        |     |  +--rw app-sel-code
        |     |  +--rw data-path-id     # DataPathID [0-7] - typically first host lane number minus one
        |     |  +--rw network-path-id  # NPID [0-7] - typically first host lane number minus one
        |     |  :
        |     +--ro state
        |        :
        |        +--ro data-path-state
        |        :
        +--rw media-lanes
        |  +--rw media-lane* [index]
        |     +--rw index -> ../config/index
        |     +--rw config
        |     |  +--rw index
        |     |  +--rw target-output-power
        |     |  +--rw frequency
        |     |  :
        |     +--ro state
        :        :
```

### Configuration example

Configuration example for a mixed (heterogenous) multiplex application (described in "7.6.3 Network Path Applications")

```json
{
  "modules": {
    "module": [
      {
        "index": 0,
        "config": {
          "index": 0,
        },
        "host-lanes": {
          "host-lane": [
            {
              "index": 1,
              "app-sel-code": 1, # e.g. host rate 100G, 2 host lanes, media rate 400G
              "data-path-id": 0,
              "network-path-id": 0,
            },
            {
              "index": 2,
              "app-sel-code": 1, # e.g. host rate 100G, 2 host lanes, media rate 400G
              "data-path-id": 0,
              "network-path-id": 0,
            },
            {
              "index": 3,
              "app-sel-code": 1, # e.g. host rate 100G, 2 host lanes, media rate 400G
              "data-path-id": 2,
              "network-path-id": 0,
            },
            {
              "index": 4,
              "app-sel-code": 1, # e.g. host rate 100G, 2 host lanes, media rate 400G
              "data-path-id": 2,
              "network-path-id": 0,
            },
            {
              "index": 5,
              "app-sel-code": 2, # e.g. host rate 200G, 4 host lanes, media rate 400G
              "data-path-id": 4,
              "network-path-id": 0,
            },
            {
              "index": 6,
              "app-sel-code": 2, # e.g. host rate 200G, 4 host lanes, media rate 400G
              "data-path-id": 4,
              "network-path-id": 0,
            },
            {
              "index": 7,
              "app-sel-code": 2, # e.g. host rate 200G, 4 host lanes, media rate 400G
              "data-path-id": 4,
              "network-path-id": 0,
            },
            {
              "index": 8,
              "app-sel-code": 2, # e.g. host rate 200G, 4 host lanes, media rate 400G
              "data-path-id": 4,
              "network-path-id": 0,
            },
          ]
        },
        "media-lanes": {
          "media-lane": [
            {
              "index": 1,
              "config": {
                "target-output-power": 0.0,
                "frequency": 193100000,
              }
            }
          ]
        }
      }
    ]
  }
}
```

Based on CMIS functional models

## Based on CMIS functional models

### Tree

```txt
module: goldstone-transceiver
  +--rw modules
     +--rw module* [index]
        +--rw index -> ../config/index
        +--rw config
        |  +--rw index
        |  :
        +--ro state
        |  :
        +--rw data-paths
        |  +--rw data-path* [id]
        |     +--rw id -> ../config/id
        |     +--rw config
        |     |  +--rw id  # DataPathID [0-7] - typically first host lane number minus one
        |     |  :
        |     +--ro state
        |     |  :
        |     |  +--ro data-path-state
        |     |  :
        |     +--rw host-paths
        |     |  +--rw host-path* [first-lane]
        |     |     +--rw first-lane -> ../config/first-lane
        |     |     +--rw config
        |     |     |  +--rw first-lane    # The first host lane number of the Host Path
        |     |     |  +--rw app-sel-code  # MediaInterfaceID of the application must be the same for all host paths
        |     |     |  :
        |     |     +--ro state
        |     |        :
        |     +--rw network-path
        |     |  +--rw config
        |     |  |  +--rw media-lanes  # Associated media lanes e.g. [1]
        |     |  +--ro state
        |     |  |  :
        |     |  +--rw media-lanes
        |     |     +-- media-lane* [index]
        |     |        +--rw index -> ../config/index
        |     |        +--rw config
        |     |        |  +--rw index
        |     |        |  +--rw target-output-power
        |     |        |  +--rw frequency
        |     |        |  :
        |     |        +--ro state
        :     :           :
```

### Configuration example

Configuration example for a mixed (heterogenous) multiplex application (described in "7.6.3 Network Path Applications")

```json
{
  "modules": {
    "module": [
      {
        "index": 0,
        "config": {
          "index": 0,
        },
        "data-paths": {
          "data-path": [
            {
              "id": 0,
              "config": {
                "id": 0,
              },
              "host-paths": {
                "host-path": [
                  {
                    "first-lane": 1,
                    "config": {
                      "first-lane": 1,
                      "app-sel-code": 1, # e.g. host rate 100G, 2 host lanes, media rate 400G
                    }
                  },
                  {
                    "first-lane": 3,
                    "config": {
                      "first-lane": 3,
                      "app-sel-code": 1, # e.g. host rate 100G, 2 host lanes, media rate 400G
                    }
                  },
                  {
                    "first-lane": 5,
                    "config": {
                      "first-lane": 5,
                      "app-sel-code": 2, # e.g. host rate 200G, 4 host lanes, media rate 400G
                    }
                  }
                ]
              },
              "network-path": {
                "config": {
                  "media-lanes": [1]
                },
                "media-lanes": {
                  "media-lane": [
                    {
                      "index": 1,
                      "config": {
                        "index": 1,
                        "target-output-power": 0.0,
                        "frequency": 193100000,
                      }
                    }
                  ]
                }
              }
            }
          ]
        }
      }
    ]
  }
}
```

Other features

## Other features

```txt
module: goldstone-transceiver
  +--rw modules
     +--rw module* [index]
        :
        +--ro applications  # Application Descriptors
        |  +--ro application* [code]
        |     +--ro code -> ../state/code
        |     +--ro state
        |        +--ro code
        |        +--ro host-interface-id
        |        +--ro media-interface-id
        |        +--ro host-lane-count
        |        +--ro media-lane-count
        |        +--ro host-lane-assignment-options
        |        +--ro media-lane-assignment-options
        |        +--ro np-application  # ExtAppDescriptor
        +--ro monitors  # VDM items (TODO)
        +--ro alarms    # Alarms and warnings from Flags (TODO)
```

- Use a modified `goldstone-component-connection.yang` to represent associations between components and transceiver modules.

ishidawataru commented 1 year ago

@noguchiko Can the second option represent the default empty configuration? (in the state container)

noguchiko commented 1 year ago

@ishidawataru

Can the second option represent the default empty configuration? (in the state container)

Yes, it can.

All state containers have the same nodes as config containers. The default configuration (Active Control Set values) retrieved from the transceiver module will be represented as operational state in the state containers.

The current configuration of the Data Path structure can be retrieved from the Upper memory page 11h "Lane and Data Path Status" (for data-path and host-path) and the page 16h "Network Path Control and Status" (for network-path). The south layer software can construct the current configuration data by using them.
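As a rough illustration of how the south layer could turn a per-lane register value into model data, the sketch below decodes one per-lane Data Path Configuration byte. The bit layout used here (AppSelCode in bits 7-4, DataPathID in bits 3-1, an explicit-control flag in bit 0) is an assumption that should be checked against the CMIS revision in use, and the function name is hypothetical.

```python
def decode_dp_config_lane(value: int) -> dict:
    """Split one per-lane Data Path Configuration byte into its fields.

    Assumed layout (verify against the CMIS specification):
      bits 7-4: AppSelCode, bits 3-1: DataPathID, bit 0: explicit control.
    """
    if not 0 <= value <= 0xFF:
        raise ValueError("register value must fit in one byte")
    return {
        "app-sel-code": (value >> 4) & 0x0F,
        "data-path-id": (value >> 1) & 0x07,
        "explicit-control": bool(value & 0x01),
    }


# AppSelCode 2 on the data path whose ID is 2 (first host lane 3).
lane3 = decode_dp_config_lane(0x24)
```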

KazuyaAnazawa commented 1 year ago

@noguchiko Thank you for your proposal. I will take a look. Please give me some time.

ishidawataru commented 1 year ago

@noguchiko

The default configuration (Active Control Set values) retrieved from the transceiver module will be represented as operational state in the state containers.

Does CMIS mandate that all host and network lanes be used for data paths? I thought if we took the second option, the user wouldn't be able to see unused host/network lanes.

( BTW I'm not saying that it's bad that we can't see the unused host/network lanes, just want to check the pros/cons of both options)

noguchiko commented 1 year ago

@ishidawataru

Does CMIS mandate that all host and network lanes be used for data paths?

No. Unused lanes are allowed. CMIS "6.2.3.2 Control Set Content" says:

The special AppSel code value 0000b in the Data Path Configuration register of a host lane indicates that the lane (together with its associated resources) is unused and not part of a Data Path.

I thought if we took the second option, the user wouldn't be able to see unused host/network lanes.

That's right.

want to check the pros/cons of both options

I think that the second option is easier for software (controllers and the south daemon) to handle because it has less redundant information.
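To make the difference concrete, here is a small sketch of the grouping that software has to do when it receives the per-lane (first option) representation: lanes sharing a data-path-id are collected into one data path, and lanes with AppSel code 0 are reported as unused. The dictionary keys follow the example tree; the function name and the flat per-lane input shape are assumptions for illustration.

```python
from collections import defaultdict
from typing import Dict, List, Tuple


def group_host_lanes(host_lanes: List[dict]) -> Tuple[Dict[int, List[int]], List[int]]:
    """Group per-lane entries into data paths and collect unused lanes."""
    data_paths: Dict[int, List[int]] = defaultdict(list)
    unused: List[int] = []
    for lane in host_lanes:
        if lane["app-sel-code"] == 0:
            # AppSel code 0 marks a lane that is not part of any Data Path.
            unused.append(lane["index"])
        else:
            data_paths[lane["data-path-id"]].append(lane["index"])
    return {dp: sorted(lanes) for dp, lanes in data_paths.items()}, sorted(unused)


lanes = [
    {"index": 1, "app-sel-code": 2, "data-path-id": 0},
    {"index": 2, "app-sel-code": 2, "data-path-id": 0},
    {"index": 3, "app-sel-code": 2, "data-path-id": 2},
    {"index": 4, "app-sel-code": 2, "data-path-id": 2},
    {"index": 5, "app-sel-code": 0, "data-path-id": 0},
    {"index": 6, "app-sel-code": 0, "data-path-id": 0},
    {"index": 7, "app-sel-code": 0, "data-path-id": 0},
    {"index": 8, "app-sel-code": 0, "data-path-id": 0},
]
paths, unused_lanes = group_host_lanes(lanes)
```

With the second option this grouping is already expressed by the data-path list itself, at the cost of unused lanes not being listed explicitly.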

ishidawataru commented 1 year ago

@noguchiko

I think that the second option is easier for software (controllers and the south daemon) to handle because it has less redundant information.

How does the ~~south~~ north daemon know how many unused lanes the module has? I'm wondering if the second option provides enough capability information to the ~~south~~ north daemon.

Also, could you show a configuration example of the second option where we have 2 data paths? Is it supported with the transceiver you're assuming?

noguchiko commented 1 year ago

@ishidawataru

How does the north daemon know how many unused lanes the module has? I'm wondering if the second option provides enough capability information to the north daemon.

The north daemons can learn the lane capability from the application advertisement, which is the list of applications supported by the module. A north daemon can calculate the available lanes in the module by using host|media-lane-count and host|media-lane-assignment-options of the advertised applications.

module: goldstone-transceiver
  +--rw modules
     +--rw module* [index]
        :
        +--ro applications
        |  +--ro application* [code]
        |     +--ro code -> ../state/code
        |     +--ro state
        |        +--ro code
        |        +--ro host-interface-id
        |        +--ro media-interface-id
        |        +--ro host-lane-count               # Number of host lanes
        |        +--ro media-lane-count              # Number of media lanes
        |        +--ro host-lane-assignment-options  # The lanes where the Application is supported on the module's host interface
        |        +--ro media-lane-assignment-options # The lanes where the Application is supported on the module's media interface
        |        +--ro np-application
        :

For example, host-lane-count == 4 and host-lane-assignment-options == 00010001b (lanes 1 and 5) mean that host lanes 1 to 4 and host lanes 5 to 8 are available for the application, which also means that host lanes 1 to 8 are available in the module. You can also identify the used host lanes from app-sel-code (Application Selection Code, code in the applications) and first-lane (lowest numbered host lane of the data path) in the data-path. From these, you can find the unused host lanes. The same goes for media lanes.
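The calculation described above can be sketched as follows. The function names are hypothetical; the only inputs are the lane count and the lane-assignment-options bitmap from an advertised application, where bit 0 corresponds to lane 1.

```python
from typing import List, Set


def lane_groups(lane_count: int, assignment_options: int, total_lanes: int = 8) -> List[List[int]]:
    """Lane groups on which an application may be instantiated.

    Each set bit n in assignment_options marks lane n + 1 as a permitted
    first lane; the application then occupies lane_count consecutive lanes.
    """
    groups = []
    for bit in range(total_lanes):
        if assignment_options & (1 << bit):
            first = bit + 1
            groups.append(list(range(first, first + lane_count)))
    return groups


def unused_lanes(available: Set[int], used: Set[int]) -> List[int]:
    """Lanes that are available in the module but not part of any data path."""
    return sorted(available - used)


groups = lane_groups(4, 0b00010001)
available = {lane for group in groups for lane in group}
free = unused_lanes(available, used={1, 2, 3, 4})
```

Here `groups` covers host lanes 1 to 4 and 5 to 8, matching the example in the text, and `free` lists the lanes left over once one 4-lane data path starting at lane 1 is in use.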

Also, could you show a configuration example of the second option where we have 2 data paths? Is it supported with the transceiver you're assuming?

No, the transceiver we are assuming doesn't support multiple data paths. However, the CMIS specification (6.2.1 Functional Module Capabilities - Applications) allows multiple data paths in a module. Therefore, I think it would be good for the model to be able to represent multiple data paths.

Configuration example

CMIS (6.2.1.3 Multiple Application Instances and Multiple Applications) says:

_For example, a module may support one 400Gbps Application that is characterized by a 400GAUI-8 host interface and a 400GBASE-DR4 media interface combination, and a second 100Gbps Application that is characterized by a 100GAUI-2 host interface and a 100GBASE-DR media interface combination. The module may be programmable to work as one instance of the first Application or to work as up to four instances of the second Application._

In this context, an application instance means a data path. The following is the configuration example ("up to four instances of the second Application"):

```json
{
  "modules": {
    "module": [
      {
        "index": 0,
        "config": {
          "index": 0,
        },
        "data-paths": {
          "data-path": [
            {
              "id": 0,
              "config": {
                "id": 0,
              },
              "host-paths": {
                "host-path": [
                  {
                    "first-lane": 1,
                    "config": {
                      "first-lane": 1,
                      "app-sel-code": 2, // the second application 100G to 100G
                    }
                  },
                ]
              },
              "network-path": {
                "media-lanes": {
                  "media-lane": [
                    {
                      "index": 1,
                      "config": {
                        "index": 1,
                        "target-output-power": 0.0,
                        "frequency": 193100000,
                      }
                    }
                  ]
                }
              }
            },
            {
              "id": 2,
              "config": {
                "id": 2,
              },
              "host-paths": {
                "host-path": [
                  {
                    "first-lane": 3,
                    "config": {
                      "first-lane": 3,
                      "app-sel-code": 2, // the second application 100G to 100G
                    }
                  },
                ]
              },
              "network-path": {
                "media-lanes": {
                  "media-lane": [
                    {
                      "index": 2,
                      "config": {
                        "index": 2,
                        "target-output-power": 0.0,
                        "frequency": 193100000,
                      }
                    }
                  ]
                }
              }
            },
            {
              "id": 4,
              "config": {
                "id": 4,
              },
              "host-paths": {
                "host-path": [
                  {
                    "first-lane": 5,
                    "config": {
                      "first-lane": 5,
                      "app-sel-code": 2, // the second application 100G to 100G
                    }
                  },
                ]
              },
              "network-path": {
                "media-lanes": {
                  "media-lane": [
                    {
                      "index": 3,
                      "config": {
                        "index": 3,
                        "target-output-power": 0.0,
                        "frequency": 193100000,
                      }
                    }
                  ]
                }
              }
            },
            {
              "id": 6,
              "config": {
                "id": 6,
              },
              "host-paths": {
                "host-path": [
                  {
                    "first-lane": 7,
                    "config": {
                      "first-lane": 7,
                      "app-sel-code": 2, // the second application 100G to 100G
                    }
                  },
                ]
              },
              "network-path": {
                "media-lanes": {
                  "media-lane": [
                    {
                      "index": 4,
                      "config": {
                        "index": 4,
                        "target-output-power": 0.0,
                        "frequency": 193100000,
                      }
                    }
                  ]
                }
              }
            }
          ]
        }
      }
    ]
  }
}
```

toru-mano commented 1 year ago

@noguchiko Minor comment: If I understand correctly, the values in data-path-id and network-path-id should be swapped in the configuration example of CMIS register representation.

KazuyaAnazawa commented 1 year ago

@noguchiko I took a look at your proposals. I think both models are OK because they essentially represent the same information, and xcvrd can be appropriately configured by south daemons subscribing to them. Also, the basic capability information of the transceiver can be represented by both models because AppSelCode is considered in the configuration and advertisement (state).

Then, my opinion is to take the second approach of goldstone-transceiver, based on "6 Core Management Features". This is because it's simpler and easier to understand than the first approach, as you also pointed out.

It might be too trivial, but just one comment. There may be some attributes whose values/units should be the same as in goldstone-transponder. For example, the unit of frequency in goldstone-transceiver seems to be GHz while tx-laser-freq of goldstone-transponder is Hz.

noguchiko commented 1 year ago

@toru-mano Thank you for pointing it out, that is correct. I fixed it.

noguchiko commented 1 year ago

@KazuyaAnazawa Thank you for your comments.

There may be some attributes whose values/units should be the same as in goldstone-transponder. For example, the unit of frequency in goldstone-transceiver seems to be GHz while tx-laser-freq of goldstone-transponder is Hz.

If we do so, users can use goldstone-transceiver and goldstone-transponder in the same way, but they get less information from the goldstone-transceiver model. Changing the units of attributes can introduce misunderstandings (e.g. use Hz but actually GHz) and/or limitations (e.g. use GHz but actually Hz) about configuration resolution.

I think we should design the primitive model with the following point made by @ishidawataru:

The purpose of the primitive YANG model is to abstract each hardware component and not to provide a user-friendly and concise configuration model. We could have an additional YANG model (you may call this intermediate model) on top of the primitive models to provide user-friendliness if required.

Therefore, I think it is better to use CMIS-defined units for the goldstone-transceiver model.

@ishidawataru What do you think about this?

In relation to this, I noticed one thing. The frequency configuration is a combination of the following CMIS attributes:

GridSpacingTx<n> (GHz)
ChannelNumberTx<n>
FineTuningOffsetTx<n> (MHz, if fine-tuning is supported)

However, my first proposal (with a single config leaf node frequency) cannot represent this configuration method and its potential limitations well. I will improve this.
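To illustrate why a single frequency leaf hides information, the sketch below combines the three attributes into one value. It assumes the common relation of a 193.1 THz anchor plus channel number times grid spacing, plus the fine-tuning offset; the anchor, the linear relation, and the function name are assumptions, and some grid spacings may use a different channel indexing, so the exact formula should be taken from CMIS/C-CMIS.

```python
ANCHOR_MHZ = 193_100_000  # assumed anchor frequency of 193.1 THz, in MHz


def tx_frequency_mhz(
    grid_spacing_ghz: float,
    channel_number: int,
    fine_tuning_offset_mhz: int = 0,
) -> int:
    """Combine grid spacing, channel number and fine-tuning offset into MHz."""
    spacing_mhz = round(grid_spacing_ghz * 1000)
    return ANCHOR_MHZ + channel_number * spacing_mhz + fine_tuning_offset_mhz


center = tx_frequency_mhz(100, 0)
shifted = tx_frequency_mhz(50, -2, fine_tuning_offset_mhz=500)
```

Going the other way is not unique: several combinations of grid spacing and channel number map to the same frequency, and not every frequency is reachable on a given grid, which is the limitation a single leaf cannot express.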

oopt-goldstone / goldstone-mgmt

Consider supporting coherent QSFP-DD (OpenZR+) module control #80
