Open mkeeter opened 1 year ago
Additionally, we could just remove modules that do not support temperature monitoring from the loop. Our current logic reads the relevant bytes from the module without checking whether they are valid.
For SFF-8636, we would need to qualify our read of the free-side temperature monitor (lower page, bytes 22/23) on whether that monitoring is actually supported (upper page 0, byte 220, bit 5).
For CMIS, we would need to qualify our read of the temperature monitor (lower page, bytes 14/15) on whether that monitoring is actually supported (upper page 1, byte 159, bit 0).
If we can get some of the misbehaving transceivers into a bench Sidecar, it should be pretty easy to test this out.
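To make the proposal concrete, here is a rough sketch of what qualifying the read could look like. The `Interface` enum, `SupportBit` struct, and all function names are made up for illustration and are not the existing code; only the page/byte/bit locations come from the comment above.

```rust
/// Management interface spoken by a module (hypothetical type for this sketch).
#[derive(Clone, Copy, Debug, PartialEq, Eq)]
pub enum Interface {
    Sff8636,
    Cmis,
}

/// Location of the "temperature monitoring supported" flag in the memory map.
#[derive(Clone, Copy, Debug, PartialEq, Eq)]
pub struct SupportBit {
    pub upper_page: u8,
    pub byte: u8,
    pub bit: u8,
}

/// Where each interface advertises temperature monitoring support,
/// using the locations quoted in the comment above.
pub fn temperature_support_bit(interface: Interface) -> SupportBit {
    match interface {
        Interface::Sff8636 => SupportBit { upper_page: 0, byte: 220, bit: 5 },
        Interface::Cmis => SupportBit { upper_page: 1, byte: 159, bit: 0 },
    }
}

/// True if the byte read from the support location has the relevant bit set.
pub fn temperature_supported(interface: Interface, support_byte: u8) -> bool {
    let bit = temperature_support_bit(interface).bit;
    support_byte & (1u8 << bit) != 0
}

/// Pass the raw temperature bytes through only when the module claims support;
/// otherwise report `None` so the caller can drop the module from the loop.
pub fn qualified_temperature(
    interface: Interface,
    support_byte: u8,
    raw: [u8; 2],
) -> Option<[u8; 2]> {
    if temperature_supported(interface, support_byte) {
        Some(raw)
    } else {
        None
    }
}
```

The caller would read the support byte once per module, then skip the temperature bytes whenever `qualified_temperature` returns `None`.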
Looks like we have many options on niles
Note: anywhere xcvradm marks a field with --, that indicates the field is not supported on that module.
aaron@niles ~ $ ./xcvradm -i axf7 -t present vendor-info
Port  Identifier           Vendor         Part             Rev  Serial         Mfg date
0     Qsfp28 (0x11)        FS             QSFP28-SR4-100G  04   G2130484857    20220321
2     QsfpPlusCmis (0x1e)  Intel Corp     SPTSMP3CLCDA     03   CRFR2141020JP  21101500
3     QsfpPlusCmis (0x1e)  Intel Corp     SPTSMP3CLCDA     03   CRFR213905JEP  21101800
4     QsfpPlusCmis (0x1e)  FINISAR CORP.  FTCC1112E2PCL    A    X65BPQR        210901
5     Qsfp28 (0x11)        FS             QSFP28-SR4-100G  1A   F2220590150    220615
8     QsfpPlusCmis (0x1e)  FINISAR CORP.  FTCC1112E2PCL    A    X6QA1JC        220305
16    Qsfp28 (0x11)        FS             QSFP28-SR4-100G  04   G2130484856    20220321
24    Qsfp28 (0x11)        Intel Corp     AMQ28-SR4        01   IN100MC0040    221206
aaron@niles ~ $ ./xcvradm -i axf7 -t present monitors
Port 0
Temperature (C): --
Supply voltage (V): --
Avg Rx power (mW): [0.6940,0.5875,0.6929,0.5592]
Tx bias (mA): [0.0000,0.0000,0.0000,0.0000]
Tx power (mW): [0.0001,0.0001,0.0001,0.0001]
Aux 1: --
Aux 2: --
Aux 3: --
Port 2
Temperature (C): 30.992188
Supply voltage (V): 3.3943
Avg Rx power (mW): [0.0001,0.0001,0.0001,0.0001,0.0000,0.0000,0.0000,0.0000]
Tx bias (mA): [0.0000,0.0000,0.0000,0.0000,0.0000,0.0000,0.0000,0.0000]
Tx power (mW): [0.0001,0.0001,0.0001,0.0001,0.0000,0.0000,0.0000,0.0000]
Aux 1: --
Aux 2: --
Aux 3: --
Port 3
Temperature (C): 29.847656
Supply voltage (V): 3.4041
Avg Rx power (mW): [0.0001,0.0001,0.0001,0.0001,0.0000,0.0000,0.0000,0.0000]
Tx bias (mA): [0.0000,0.0000,0.0000,0.0000,0.0000,0.0000,0.0000,0.0000]
Tx power (mW): [0.0001,0.0001,0.0001,0.0001,0.0000,0.0000,0.0000,0.0000]
Aux 1: --
Aux 2: --
Aux 3: --
Port 4
Temperature (C): 28.601563
Supply voltage (V): 3.3572998
Avg Rx power (mW): [0.0001,0.0001,0.0001,0.0001,0.0000,0.0000,0.0000,0.0000]
Tx bias (mA): [0.0000,0.0000,0.0000,0.0000,0.0000,0.0000,0.0000,0.0000]
Tx power (mW): [0.0001,0.0001,0.0001,0.0001,0.0000,0.0000,0.0000,0.0000]
Aux 1: --
Aux 2: --
Aux 3: --
Port 5
Temperature (C): --
Supply voltage (V): --
Avg Rx power (mW): [0.0000,0.0000,0.0000,0.0000]
Tx bias (mA): [0.0000,0.0000,0.0000,0.0000]
Tx power (mW): [0.0001,0.0001,0.0001,0.0001]
Aux 1: --
Aux 2: --
Aux 3: --
Port 8
Temperature (C): 27.527344
Supply voltage (V): 3.3665
Avg Rx power (mW): [0.0001,0.0001,0.0001,0.0001,0.0000,0.0000,0.0000,0.0000]
Tx bias (mA): [0.0000,0.0000,0.0000,0.0000,0.0000,0.0000,0.0000,0.0000]
Tx power (mW): [0.0001,0.0001,0.0001,0.0001,0.0000,0.0000,0.0000,0.0000]
Aux 1: --
Aux 2: --
Aux 3: --
Port 16
Temperature (C): --
Supply voltage (V): --
Avg Rx power (mW): [0.0001,0.0001,0.0001,0.0001]
Tx bias (mA): [0.0000,0.0000,0.0000,0.0000]
Tx power (mW): [0.0001,0.0001,0.0001,0.0001]
Aux 1: --
Aux 2: --
Aux 3: --
Port 24
Temperature (C): --
Supply voltage (V): --
Avg Rx power (mW): [0.0001,0.0001,0.0001,0.0001]
Tx bias (mA): [5.7040,5.6980,5.6780,5.7180]
Tx power (mW): [0.9602,0.8259,1.0517,1.0618]
Aux 1: --
Aux 2: --
Aux 3: --
There's some weirdness going on here; notice that none of the SFF-8636 transceivers are reporting temperature, while all of the CMIS transceivers are!
For the SFF-8636 transceivers, xcvradm is looking at upper page 0, byte 220, bit 5 (per the spec). All of our transceivers are reporting 0x0c:
matt@niles ~ () $ ./xcvradm -i axf7 -t0,5,16,24 read-upper --page 0 --sff 220 1
Port Data
0 [0x0c]
5 [0x0c]
16 [0x0c]
24 [0x0c]
Bit 5 is not set, so they are claiming to not support temperature readings.
However, they all also provide perfectly valid temperature values:
matt@niles ~ () $ ./xcvradm -i axf7 -t0,5,16,24 read-lower --sff 22 2
Port Data
0 [0x1b,0x66] # 27.40°C
5 [0x18,0x7b] # 24.48°C
16 [0x1a,0x48] # 26.28°C
24 [0x1e,0x5f] # 30.37°C
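For reference, the annotations above treat the two bytes as a big-endian signed 16-bit value in units of 1/256 °C. A minimal sketch of that conversion (the function name is hypothetical, not something xcvradm exposes):

```rust
/// Decode a two-byte temperature monitor reading into degrees Celsius.
/// The bytes form a big-endian, two's-complement value in units of 1/256 °C.
pub fn decode_temperature_c(raw: [u8; 2]) -> f32 {
    f32::from(i16::from_be_bytes(raw)) / 256.0
}
```

For example, port 5's `[0x18, 0x7b]` is 24 + 123/256, or roughly 24.48 °C.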
I'm a little mystified here. Do we have any SFF-8636 transceivers that claim to support temperature monitoring?
This table would make me think that temperature, at least, is required for SM-type devices, which is what all of our non-DAC, non-active-optical modules are.
Maybe they're "pre-rev 2.8"? Or maybe the monitoring bit is referring to some other feature, like over-temperature alerts?
I checked the version theory earlier, and all but one of them return a revision compliance byte (0x08) that means rev 2.8, 2.9, or 2.10; port 24 returns 0x07, an earlier revision code:
matt@niles ~ () $ ./xcvradm -i axf7 -t0,5,16,24 read-lower --sff 1 1
Port Data
0 [0x08]
5 [0x08]
16 [0x08]
24 [0x07]
Good catch on §6.2.4. It sure looks like temperature monitoring should be required for SM
modules; it's just unfortunate that the diagnostic monitor bitfield doesn't reflect that...
@mkeeter the sus module I ordered finally arrived and is now installed in port 13 on the niles
sidecar. Not urgent, just updating the ticket for future us.
quoth @kc8apf