sonic-net / sonic-platform-daemons

Platform module daemons for SONiC
Other
25 stars 159 forks source link

Add third generic media tuning key to search for (medium+lane) #538

Open bobbymcgonigle opened 2 months ago

bobbymcgonigle commented 2 months ago

Description

This change searches for a third generic key that covers medium type + lane speed. This will cover all Arista usecases as at Arista out tuning values are based off medium type + lane speed. Of course this will work for all vendors if they choose to do so. The consequence of this is that on bringup Arista can upload a single media_settings.json which will cover every possible speed and lane combination. Any users of our SKUs will not need to ever ask for updates to cover new optics, speeds etc.

Note if there is no xcvr present we will assume copper; this will cover internal phy to external phy connection, as well as any backplane links.

The new flow will look like: (Addition to current flow in green) medium-speed-key

Motivation and Context

The current implementation of the the dynamic media tuning feature requires that you know either: 1) Vendor Key E.g. AMPHENOL-1234 2) Media Key E.g. QSFP28-40GBASE-CR4-1M

This is too specific. For example consider the following case. I am using QSFP-DDs, but last minute before deployment I change vendors and use QSFPs - now I need the vendor like Arista to upstream a new PR that includes different or even the same tuning values but with a new vendor/media key to match against. Same scenario if we change from DACs to optics last minute.

We can make this more generic by adding a third key that is a combination of physical medium (copper or optical) and lane speed (E.g. 400g-8 is 50000, 100g-4 is 25000).

This key can be easily derived in xcvrd because 1) cmis spec can tell us if it is copper or optical 2) We can easily derive the lane speed from config

This means that we can now have 1 media_settings.json file that can cover all scenarios. It does the check after Vendor+Media key - so you can still have very specific values if you want to take precedence over the generic.

How Has This Been Tested?

Added a new media_settings.json for Arista QuicksilverP with the following entries per port:

"PORT_MEDIA_SETTINGS": {
    "1": {
        "COPPER25000": {
            "main": {
                "lane0": "0x6b",
                "lane1": "0x6b",
                "lane2": "0x6b",
                "lane3": "0x6b",
                "lane4": "0x6b",
                "lane5": "0x6b",
                "lane6": "0x6b",
                "lane7": "0x6b"
               ...
             },
             "COPPER50000": {
                ...
             }
              "OPTICAL100000": {
                ...
             }
              "OPTICAL25000": {
               ...
             }

Changing speeds and doing xcvr swap I can see that the different and correct values are applied in APPL_DB.

I also added test cases and made sure other testcases pass tests/test_xcvrd.py unit tests.

Additional Information (Optional)

prgeor commented 2 months ago

@bobbymcgonigle can you review this PR if it meets the requirement? https://github.com/sonic-net/sonic-platform-daemons/pull/533/files

prgeor commented 2 months ago

@bobbymcgonigle there is a regex match for media key here:- https://github.com/sonic-net/sonic-platform-daemons/blob/master/sonic-xcvrd/xcvrd/xcvrd_utilities/media_settings_parser.py#L183

If you specify your SI settings as below:-

   "(QSFP*-CR*)|*copper*": { ==> This should take care of any DAC that is CMIS OR non-CMIS 
              "speed:400*": {
                  "idriver": {
                      "lane0": "0x0000003c",
                      "lane1": "0x0000003c",
                      "lane2": "0x0000003c",
                      "lane3": "0x0000003c",
                      "lane4": "0x0000003c",
                      "lane5": "0x0000003c",
                      "lane6": "0x0000003c",
                      "lane7": "0x0000003c"
                  },
       }
       "Default": { ==> This should take care of non-DAC
              "speed:400*": {
                  "idriver": {
                      "lane0": "0x0000003c",
                      "lane1": "0x0000003c",
                      "lane2": "0x0000003c",
                      "lane3": "0x0000003c",
                      "lane4": "0x0000003c",

Basically the media type is a regular expression it just how you represent that regular expression your platform media_settings.json

bobbymcgonigle commented 2 months ago

@bobbymcgonigle there is a regex match for media key here:- https://github.com/sonic-net/sonic-platform-daemons/blob/master/sonic-xcvrd/xcvrd/xcvrd_utilities/media_settings_parser.py#L183

If you specify your SI settings as below:-

   "(QSFP*-CR*)|*copper*": { ==> This should take care of any DAC that is CMIS OR non-CMIS 
              "speed:400*": {
                  "idriver": {
                      "lane0": "0x0000003c",
                      "lane1": "0x0000003c",
                      "lane2": "0x0000003c",
                      "lane3": "0x0000003c",
                      "lane4": "0x0000003c",
                      "lane5": "0x0000003c",
                      "lane6": "0x0000003c",
                      "lane7": "0x0000003c"
                  },
       }
       "Default": { ==> This should take care of non-DAC
              "speed:400*": {
                  "idriver": {
                      "lane0": "0x0000003c",
                      "lane1": "0x0000003c",
                      "lane2": "0x0000003c",
                      "lane3": "0x0000003c",
                      "lane4": "0x0000003c",

Basically the media type is a regular expression it just how you represent that regular expression your platform media_settings.json

This does not cover all usecases. I think it's important to distinguish between media type and medium. Probably best explained with this example I have on one of my devices here.

Let's print the keys searched without my change:

{'vendor_key': 'FINISAR CORP    -FCBN950QE1C04   ', 'media_key': 'QSFP28-Unknown-4.0M', 'lane_speed_key': None}
{'vendor_key': 'ARISTA NETWORKS -CAB-Q-Q-100G-1M ', 'media_key': 'QSFP28-Unknown-1.0M', 'lane_speed_key': None}

How can we tell here if these are copper or optic ? They both have the same media_key (aside from length).

with this change:

{'vendor_key': 'FINISAR CORP    -FCBN950QE1C04   ', 'media_key': 'QSFP28-Unknown-4.0M', 'lane_speed_key': None, 'medium_lane_speed_key': 'OPTICAL50000'}
{'vendor_key': 'ARISTA NETWORKS -CAB-Q-Q-100G-1M ', 'media_key': 'QSFP28-Unknown-1.0M', 'lane_speed_key': None, 'medium_lane_speed_key': 'COPPER50000'}

We can easily apply the correct tunings now that we know one is optical and one is copper. This also future proofs any new specs that come along; we don't have to maintain a constant list of form factor types, media types etc. Without this change we wouldn't be able to tell which are the correct tunings to apply without knowing all part numbers.

prgeor commented 1 month ago

@bobbymcgonigle The unknown in the media key seems to indicate this module is not advertising the media type properly. Can you share the output of sfutil show eeprom -d -p EthernetXX for both these modules :-

  1. QSFP28-Unknown-4.0M
  2. QSFP28-Unknown-1.0M
prgeor commented 1 month ago

@bobbymcgonigle can you explain what you mean by media type and medium?

"distinguish between media type and medium"

bobbymcgonigle commented 1 month ago

@bobbymcgonigle can you explain what you mean by media type and medium?

"distinguish between media type and medium"

Medium is Copper or Fiber Media type from SONiC pov is 400GBASE-CR4 or 400GBASE-CR8 or 200GBASE-CR4. If any of these xcvrs were used on an interface at 200G-4 they would all use the COPPER50000 medium type even though their media type is different.

Really we don't need to support all these different media types when the parent characteristic (medium) covers them all.

prgeor commented 1 month ago

@bobbymcgonigle can this closed as we discussed we don't need this change?

bobbymcgonigle commented 1 month ago

@bobbymcgonigle can this closed as we discussed we don't need this change?

Hi Prince, took some time to figure out all edge cases we've missed. I've discovered a couple that would make BASE-CR not work like we expected. I've described it all out in a full proposal for the third key; I'll email this document today!