open-switch / opx-nas-interface

https://openswitch.net

SFP transceiver inside SFP+ port S4048 speed issue #54

Closed stevance closed 5 years ago

stevance commented 6 years ago

Hello,

Using OPX 3.0.0, I have an issue after restarting an S4048: I cannot get my 1GbE SFP to operational state UP.

It is always down (while 'Administrative State: UP'), and I need to run opx-ethtool -s dev speed 1000 to bring it up.

I don't see how to make this change permanent; by default it reports Configured Speed: auto.

If I remove the 1GbE SFP transceiver and put it back, the speed is set correctly and the port works.

Thank you

Eric

stevance commented 6 years ago

Hello,

Any advice concerning my issue?

Thank you

Eric

jeff-yin commented 6 years ago

@madhu222 -- what do you think? As a workaround for now, can @stevance perhaps put an opx-ethtool statement in the /etc/network/interfaces file to be run at boot time?

waliulislam commented 6 years ago

I use an opx-ethtool statement in the /etc/network/interfaces file to set autoneg or speed, and that works.

waliulislam commented 6 years ago

'pre-up sudo /usr/bin/opx-ethtool -s e101-003-1 speed 1000' should work.

madhu222 commented 6 years ago

@stevance- You can configure in /etc/network/interfaces file to make it persistent

auto e101-001-0
   iface e101-001-0 inet manual
   pre-up sudo /usr/bin/opx-ethtool -s e101-001-0 speed 1000

stevance commented 6 years ago

Thank you everybody.

I will try and let you know.

Eric

stevance commented 6 years ago

Hello,

I have tried different solutions; none of them worked.

I put this under interfaces:

auto e101-001-0
   iface e101-001-0 inet manual
   up sudo /usr/bin/opx-ethtool -s e101-001-0 speed 1000

I also tried pre-up and post-up with the same opx-ethtool command.

The result is the same: no IP traffic goes through that port.

If I rerun the command "sudo /usr/bin/opx-ethtool -s e101-001-0 speed 1000" manually, it works.

It seems that the interface is not "active" enough at boot time to fully accept this command.

However I have noticed this behaviour

With the up, pre-up or post-up command, if I run opx-ethtool e101-001-0

I have this information

Settings for e101-001-0:
    Channel ID: 0
    Transceiver Status: Disable
    Media Type: Unknown
    Part Number: 310-7225-ESI
    Serial Number: ZZ180821164
    Qualified: Yes
    Administrative State: UP
    Operational State: DOWN
    Supported Speed (in Mbps): [1000, 10000]
    Auto Negotiation : off
    Configured Speed : 1000
    Operating Speed : 0
    Duplex : full

It seems that Configured Speed is correctly set to 1000 (it was auto otherwise)

But no IP traffic goes through, "Operational State: DOWN"

If I do again

sudo /usr/bin/opx-ethtool -s e101-001-0 speed 1000

and the command "opx-ethtool e101-001-0"

Settings for e101-001-0:
    Channel ID: 0
    Transceiver Status: Enable
    Media Type: Unknown
    Part Number: 310-7225-ESI
    Serial Number: ZZ180821164
    Qualified: Yes
    Administrative State: UP
    Operational State: UP
    Supported Speed (in Mbps): [1000, 10000]
    Auto Negotiation : off
    Configured Speed : 1000
    Operating Speed : False
    Duplex : full

Besides the operational state, the differences are Transceiver Status: Enable (was Disable) and Operating Speed: False (was 0).

And IP traffic goes through: "Operational State: UP".
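The two outputs above differ in only a few fields, and those can be pulled out of saved opx-ethtool output mechanically. The following is a minimal sketch, assuming the output keeps the "Name : value" layout shown in this thread; ethtool_field is a hypothetical helper name, not an OPX command, and the sample text is copied from the outputs above.

```shell
#!/bin/sh
# ethtool_field TEXT FIELD: print the value of one "Field : value" line
# from saved opx-ethtool output (hypothetical helper, not part of OPX).
ethtool_field() {
    printf '%s\n' "$1" | awk -v key="$2" '
        {
            pos = index($0, ":")              # split at the first colon
            if (pos == 0) next
            name = substr($0, 1, pos - 1)
            value = substr($0, pos + 1)
            gsub(/^[ \t]+|[ \t]+$/, "", name)  # trim both sides
            gsub(/^[ \t]+|[ \t]+$/, "", value)
            if (name == key) { print value; exit }
        }'
}

# Sample text taken from the two outputs quoted in this comment.
before='    Transceiver Status: Disable
    Operational State: DOWN
    Operating Speed : 0'
after='    Transceiver Status: Enable
    Operational State: UP
    Operating Speed : False'

for f in "Transceiver Status" "Operational State" "Operating Speed"; do
    printf '%s: %s -> %s\n' "$f" \
        "$(ethtool_field "$before" "$f")" "$(ethtool_field "$after" "$f")"
done
```

On a switch, the first argument would be the captured output of opx-ethtool e101-001-0 rather than the embedded sample text.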

What do you think?

Thank you,

Eric

jeff-yin commented 6 years ago

@waliulislam @madhu222 -- please provide further advice.

@stevance -- Now I suspect you might be running into a timing issue. We have made some package updates since the OPX 3.0.0 installer was built. If you haven't done so recently, please run apt-get update; apt-get dist-upgrade; reboot on your switch. The updated packages have revised service dependencies, which may help resolve issues with timing and interface creation.

stevance commented 6 years ago

@jeff-yin I tried but it did not change anything.

After a reboot, as soon as I can log in to OPX, the interface (e101-001-0, for instance) is not yet present, and I get this message:

root@OPX:/home/admin# opx-ethtool e101-001-0
Wrong interface name or interface not present.

It takes around one minute for the interface to respond, but it does not reach Operational State UP unless I manually run opx-ethtool -s e101-001-0 speed 1000.

root@OPX:/home/admin# opx-ethtool e101-001-0
Settings for e101-001-0:
    Channel ID: 0
    Transceiver Status: Disable
    Media Type: Unknown
    Part Number: 310-7225-ESI
    Serial Number: ZZ180821163
    Qualified: Yes
    Administrative State: UP
    Operational State: DOWN
    Supported Speed (in Mbps): [1000, 10000]
    Auto Negotiation : off
    Configured Speed : 1000
    Operating Speed : 0
    Duplex : full
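Since the interface only answers about a minute after login, one way to script the manual step is to poll until opx-ethtool stops reporting the interface as missing and only then set the speed. This is a rough sketch under assumptions: it relies on the "not present" message quoted above, OPX_ETHTOOL is a hypothetical override variable, the retry counts are arbitrary, and nothing in this thread confirms that it cures the problem on 3.0.0.

```shell
#!/bin/sh
# Sketch: wait until opx-ethtool can see the port, then force the speed.
# OPX_ETHTOOL is a hypothetical override so the command can be swapped out.
OPX_ETHTOOL="${OPX_ETHTOOL:-/usr/bin/opx-ethtool}"

# iface_ready IFACE: succeed once the status query no longer prints the
# "not present" message quoted in this thread.
iface_ready() {
    out=$("$OPX_ETHTOOL" "$1" 2>&1) || return 1
    case "$out" in
        *"not present"*) return 1 ;;
    esac
    return 0
}

# wait_and_set_speed IFACE SPEED [TRIES] [DELAY_SECONDS]
wait_and_set_speed() {
    iface=$1 speed=$2 tries=${3:-30} delay=${4:-5}
    while [ "$tries" -gt 0 ]; do
        if iface_ready "$iface"; then
            # Same command that works when typed by hand.
            "$OPX_ETHTOOL" -s "$iface" speed "$speed"
            return $?
        fi
        tries=$((tries - 1))
        sleep "$delay"
    done
    echo "timed out waiting for $iface" >&2
    return 1
}

# Runs only when called with arguments, e.g.: ./set-speed.sh e101-001-0 1000
if [ "$#" -ge 2 ]; then
    wait_and_set_speed "$@"
fi
```

The defaults give the interface up to 150 seconds to appear, which is more than the one minute observed here.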

Thank you for your feedback

Eric

jeff-yin commented 6 years ago

@GarrickHe is working on reproducing this issue locally in the lab with an S4048 + 1G SFP

stevance commented 6 years ago

Let me know if you need more details from my side. Thank you

GarrickHe commented 6 years ago

@stevance

A few questions for you:

  1. When you check the interface status with opx-ethtool e101-001-0, was the system in running state? You need to ensure the system is in running state before doing anything. You can check via opx-show-system-status.

  2. What configuration are you using for the peer side? Is it using the same SFP or a different one? I was able to reproduce this issue if one side used a copper SFP on a fiber port and the other side is a copper port (RJ45). In this case I had to disable autoneg on the fiber port side.

Thanks, Garrick

stevance commented 6 years ago

@GarrickHe

  1. Yes, opx-show-system-status shows running state.

Here are the details:


root@OPX:/home/admin# opx-show-system-status
System State: running
No Failed Service
Modified Packages: opx-platform-config-dell-s4048
root@OPX:/home/admin#
root@OPX:/home/admin# opx-ethtool e101-001-0
Settings for e101-001-0:
    Channel ID: 0
    Transceiver Status: Disable
    Media Type: Unknown
    Part Number: 310-7225-ESI
    Serial Number: ZZ180821163
    Qualified: Yes
    Administrative State: UP
    Operational State: DOWN
    Supported Speed (in Mbps): [1000, 10000]
    Auto Negotiation : off
    Configured Speed : 1000
    Operating Speed : 0
    Duplex : full

As long as I do not run opx-ethtool -s e101-001-0 speed 1000, the Operational State is down.


root@OPX:/home/admin# opx-ethtool e101-001-0
Settings for e101-001-0:
    Channel ID: 0
    Transceiver Status: Enable
    Media Type: Unknown
    Part Number: 310-7225-ESI
    Serial Number: ZZ180821163
    Qualified: Yes
    Administrative State: UP
    Operational State: UP
    Supported Speed (in Mbps): [1000, 10000]
    Auto Negotiation : off
    Configured Speed : 1000
    Operating Speed : False
    Duplex : full


And I can ping this interface.

  2. The peer side is a 4048-ON running OS9, with the same type of copper SFP module, which it recognises.

Hope it helps you

The SFP modules come from Enterasource and are said to be Dell compatible. I also tried modules from another source, BBOS, also Dell compatible. I have also tried a copper SFP+ 10G.

The modules work either after setting the speed or after removing the module and putting it back in the device.

Thank you,

Eric

GarrickHe commented 6 years ago

Hi @stevance

Sorry for the delay. Can you check your opx-create-interface.service and see if you got this change:

+++ b/scripts/init/opx-create-interface.service
@@ -9,7 +9,7 @@ Type=oneshot
 RemainAfterExit=yes
 EnvironmentFile=/etc/opx/opx-environment
 ExecStart=/usr/bin/python -u /usr/bin/base_nas_create_interface.py
-ExecStartPost=/usr/bin/python -u /usr/bin/base_nas_fanout_init.py && "/bin/sh -c /usr/bin/network_restart.sh"  <<<--- DELETE THIS 
+ExecStartPost=/bin/sh -c "/usr/bin/python -u /usr/bin/base_nas_fanout_init.py && /usr/bin/network_restart.sh"  <<<--- ADD THIS (without the '+' character)
 TimeoutStartSec=360

Also, /usr/bin/network_restart.sh should look like this:

/sbin/ifdown -a --exclude=lo  <<--- add this new line
service networking restart

If you don't have these changes, please go ahead and apply them. Reboot your box and see if the issue still shows up. Let me know how it goes. If it doesn't work, please share your opx-create-interface.service file as it was BEFORE you made the changes mentioned above.

Thanks, Garrick

stevance commented 6 years ago

Hello @GarrickHe

Sorry for the delay, I could check the configuration file and it was different from what you suggest.

I have done the modification on the 2 files, but it did not change anything.

Here is what I had before

root@OPX:/lib/systemd/system# cat opx-create-interface.service
[Unit]
Description=This service is to create all interface during system initiation
ConditionPathExists=!/etc/opx/nas_if_nocreate
After=opx-cps.service opx-nas.service opx-front-panel-ports.service
Requires=opx-cps.service opx-nas.service opx-front-panel-ports.service

[Service]
Type=oneshot
RemainAfterExit=yes
EnvironmentFile=/etc/opx/opx-environment
ExecStart=/usr/bin/python -u /usr/bin/base_nas_create_interface.py
ExecStartPost=/bin/sh -c "/bin/sh -c /usr/bin/base_nas_fanout_init.sh && /usr/bin/network_restart.sh"
TimeoutStartSec=90

[Install]
WantedBy=multi-user.target

root@OPX:/usr/bin# cat network_restart.sh
#!/bin/sh

# Restart networking service without shuting down
# interfaces that are already up

service networking restart

Let me know,

Eric

GarrickHe commented 6 years ago

@stevance

Here is what the content should look like. Your files should match the following.

/usr/bin/network_restart.sh

#!/bin/sh

# Bring down all interfaces (except system loopback) before restarting Networking service
# to ensure that all interface settings gets programmed into the NPU

/sbin/ifdown -a --exclude=lo
service networking restart

/lib/systemd/system/opx-create-interface.service

[Unit]
Description=This service is to create all interface during system initiation
ConditionPathExists=!/etc/opx/nas_if_nocreate
After=opx-cps.service opx-nas.service
Requires=opx-cps.service opx-nas.service

[Service]
Type=oneshot
RemainAfterExit=yes
EnvironmentFile=/etc/opx/opx-environment
ExecStart=/usr/bin/python -u /usr/bin/base_nas_create_interface.py
ExecStartPost=/bin/sh -c "/usr/bin/python -u /usr/bin/base_nas_fanout_init.py && /usr/bin/network_restart.sh"
TimeoutStartSec=360

[Install]
WantedBy=multi-user.target

You can also take a look at these links for reference:
https://github.com/open-switch/opx-nas-interface/blob/master/scripts/bin/network_restart.sh
https://github.com/open-switch/opx-nas-interface/blob/master/scripts/init/opx-create-interface.service

After you have updated your files, please restart your system with the 'reload' command and see if the problem persists.

Hope this helps.
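To compare the files on a switch against the content listed in this comment, a quick check along these lines may help. It is only a sketch: the two expected lines are copied from the listing above, the default paths are the ones named in this thread, and UNIT_FILE, RESTART_SCRIPT and check_boot_fix are hypothetical names introduced for illustration.

```shell
#!/bin/sh
# Sketch: check that both files contain the lines listed in this comment.
# UNIT_FILE and RESTART_SCRIPT are hypothetical overrides for testing.
UNIT_FILE="${UNIT_FILE:-/lib/systemd/system/opx-create-interface.service}"
RESTART_SCRIPT="${RESTART_SCRIPT:-/usr/bin/network_restart.sh}"

# has_line FILE TEXT: succeed if FILE contains TEXT as a fixed string.
has_line() {
    grep -Fq -e "$2" "$1" 2>/dev/null
}

# check_boot_fix UNIT SCRIPT: report which file, if any, lacks the change.
check_boot_fix() {
    unit=$1 script=$2 rc=0
    has_line "$unit" 'ExecStartPost=/bin/sh -c "/usr/bin/python -u /usr/bin/base_nas_fanout_init.py && /usr/bin/network_restart.sh"' \
        || { echo "unit file differs: $unit"; rc=1; }
    has_line "$script" '/sbin/ifdown -a --exclude=lo' \
        || { echo "restart script differs: $script"; rc=1; }
    return "$rc"
}

# Only run the check where both files actually exist.
if [ -e "$UNIT_FILE" ] && [ -e "$RESTART_SCRIPT" ]; then
    check_boot_fix "$UNIT_FILE" "$RESTART_SCRIPT" && echo "both files match"
fi
```

The check uses fixed-string matching, so extra whitespace or a different quoting style in the unit file would be reported as a difference.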

stevance commented 6 years ago

@GarrickHe

It is exactly what I have now on the system, it is still not working.

At least, as soon as I type opx-ethtool -s e101-001-0 speed 1000, for instance, the SFP module starts answering on the port.

Eric

GarrickHe commented 6 years ago

In your /etc/network/interfaces file, do you have something like this (I'm assuming you're using e101-001-0)?

auto e101-001-0
   iface e101-001-0 inet manual
   pre-up sudo /usr/bin/opx-ethtool -s e101-001-0 speed 1000

stevance commented 6 years ago

Yes, that is what I have: a file named e101-001-0 under /etc/network/interfaces.d with the same content as you wrote.

It still does not work.

Eric

GarrickHe commented 5 years ago

@stevance

When it is in working condition (oper/admin are both up) and you execute opx-ethtool e101-001-0 does it show autoneg as on or off?

thanks, Garrick

stevance commented 5 years ago

@GarrickHe

After the opx-ethtool -s e101-001-0 speed 1000, the autoneg is off

Here is the full CLI output after a reboot, before and after running opx-ethtool -s e101-001-0 speed 1000:

root@OPX:/home/admin# opx-ethtool e101-001-0
Settings for e101-001-0:
    Channel ID: 0
    Transceiver Status: Disable
    Media Type: Unknown
    Part Number: 310-7225-ESI
    Serial Number: ZZ180821169
    Qualified: Yes
    Administrative State: UP
    Operational State: DOWN
    Supported Speed (in Mbps): [1000, 10000]
    Auto Negotiation : off
    Configured Speed : auto
    Operating Speed : 0
    Duplex : full
root@OPX:/home/admin# opx-ethtool -s e101-001-0 speed 1000
speed 1000
root@OPX:/home/admin# ping 10.0.1.200
PING 10.0.1.200 (10.0.1.200) 56(84) bytes of data.
64 bytes from 10.0.1.200: icmp_seq=1 ttl=64 time=2009 ms
64 bytes from 10.0.1.200: icmp_seq=2 ttl=64 time=993 ms
64 bytes from 10.0.1.200: icmp_seq=3 ttl=64 time=1.08 ms
^C
--- 10.0.1.200 ping statistics ---
3 packets transmitted, 3 received, 0% packet loss, time 2014ms
rtt min/avg/max/mdev = 1.089/1001.534/2009.618/819.996 ms, pipe 2
root@OPX:/home/admin# opx-ethtool e101-001-0
Settings for e101-001-0:
    Channel ID: 0
    Transceiver Status: Enable
    Media Type: Unknown
    Part Number: 310-7225-ESI
    Serial Number: ZZ180821169
    Qualified: Yes
    Administrative State: UP
    Operational State: UP
    Supported Speed (in Mbps): [1000, 10000]
    Auto Negotiation : off
    Configured Speed : 1000
    Operating Speed : False
    Duplex : full

Hope it helps

Thank you,

Eric

GarrickHe commented 5 years ago

@stevance I have a setup like this:

[s4048]<---->[s3048]
 both connected via e101-001-0

output on s3048 after bootup:

root@OPX:~# opx-ethtool e101-001-0
Settings for e101-001-0:
    Media Type: 1000BASE-T-RJ45
    Part Number:
    Serial Number:
    Qualified: Yes
    Administrative State: UP
    Operational State: UP
    Supported Speed (in Mbps):  [10, 100, 1000]
    Auto Negotiation : on                    <<===== autoneg on by default
    Configured Speed   : 1000
    Operating Speed   : False
    Duplex   : full
root@OPX:~#

output for s4048 after bootup:

root@OPX:~# opx-ethtool e101-001-0
Settings for e101-001-0:
    Channel ID:   0
    Transceiver Status: Enable
    Media Type: SFP 1000BASE-T
    Part Number: FCLF-8521-3
    Serial Number: PQM1PH0
    Qualified: Yes
    Administrative State: UP
    Operational State: UP
    Supported Speed (in Mbps):  [1000, 10000]
    Auto Negotiation : off               <<=== autoneg off by default
    Configured Speed   : 1000
    Operating Speed   : False
    Duplex   : full
root@OPX:~#

I'm not sure how your peer works, but maybe turn on autoneg on the peer side if autoneg is off on your s4048. I do notice that if you 'force' autoneg off (or 'on') on both sides, the oper-state stays 'down'.

s3048 /etc/network/interfaces:

root@OPX:~# cat /etc/network/interfaces
# interfaces(5) file used by ifup(8) and ifdown(8)
# Include files from /etc/network/interfaces.d:
source-directory /etc/network/interfaces.d

auto e101-001-0
  iface e101-001-0 inet manual
  sudo pre-up sudo /usr/bin/opx-ethtool -s e101-001-0 speed 1000

s4048 /etc/network/interfaces:

root@OPX:~# cat /etc/network/interfaces
# interfaces(5) file used by ifup(8) and ifdown(8)
# Include files from /etc/network/interfaces.d:
source-directory /etc/network/interfaces.d

auto e101-001-0
  iface e101-001-0 inet manual
  pre-up sudo /usr/bin/opx-ethtool -s e101-001-0 speed 1000

Let me know how it goes.

Thanks, Garrick

stevance commented 5 years ago

Hello @GarrickHe

I have done what you wrote and it does not work.

Only the SFP+ copper module seems to be working fine.

Which copper SFP module are you using?

Thank you,

Eric

atanu-mandal commented 5 years ago

Hi Eric, Garrick should be able to follow up after OPX 3.1.0 release (upcoming).

GarrickHe commented 5 years ago

@stevance ,

As I mentioned above in the s4048 output:

root@OPX:~# opx-ethtool e101-001-0
Settings for e101-001-0:
    Channel ID:   0
    Transceiver Status: Enable
    Media Type: SFP 1000BASE-T
    Part Number: FCLF-8521-3
    Serial Number: PQM1PH0

I'm not sure why yours says 'unknown' media type though.

Your output from above post:

Settings for e101-001-0:
Channel ID: 0
Transceiver Status: Disable
Media Type: Unknown      <<<==== unknown?
Part Number: 310-7225-ESI
Serial Number: ZZ180821163

-garrick

stevance commented 5 years ago

@GarrickHe

I was also surprised by this "Unknown" media type. I saw similar behaviour with another SFP module whose media type was described.

I am out of the office; next week I will upgrade to 3.1.0 and update you.

Thank you,

Eric

GarrickHe commented 5 years ago

@stevance ,

Any updates?

Thanks, Garrick

stevance commented 5 years ago

Garrick,

I am currently travelling, so no more news for now. I was expecting to see an upgrade of OPX to a new version. Isn't that what was said?

I can do more in 10 days on the device.

Thank you,

Best regards,

Eric

In reply to Garrick He's email notification of 3 Jan 2019, 20:09.

GarrickHe commented 5 years ago

@stevance No problem. I was just checking in because your last comment was 22-days ago and it said you were traveling for the week. No rush. Keep me posted.

Thanks, Garrick

jeff-yin commented 5 years ago

OPX release 3.1.0 is indeed released. http://archive.openswitch.net/installers/3.1.0/Dell-EMC/PKGS_OPX-3.1.0-installer-x86_64.bin

If you’re upgrading to 3.1.0 (rather than doing an install from ONIE), please run the following:

echo "
Package: *
Pin: origin deb.openswitch.net
Pin-Priority: 750" | sudo tee -a /etc/apt/preferences

sudo apt-get update && sudo apt-get dist-upgrade
sudo reload
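Because tee -a appends, running the pinning command above twice leaves two copies of the stanza in /etc/apt/preferences. A small guard can avoid that. This is a sketch, not part of the OPX upgrade instructions; add_opx_pin and the PREFS override are hypothetical names, while the pin text itself is the one quoted above.

```shell
#!/bin/sh
# Sketch: append the pin from the comment above only if it is not there yet.
# PREFS is a hypothetical override so the function can be tried on a scratch file.
PREFS="${PREFS:-/etc/apt/preferences}"

# add_opx_pin FILE: append the deb.openswitch.net pin stanza once.
add_opx_pin() {
    file=$1
    if grep -Fq 'Pin: origin deb.openswitch.net' "$file" 2>/dev/null; then
        echo "pin already present in $file"
        return 0
    fi
    printf '\nPackage: *\nPin: origin deb.openswitch.net\nPin-Priority: 750\n' >> "$file"
}

# Needs write access to $PREFS (run as root), for example:
#   add_opx_pin "$PREFS"
```

After the pin is in place, the apt-get update, apt-get dist-upgrade and reload steps are unchanged.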

stevance commented 5 years ago

Hello,

Thank you for the update.

I will do it when I am back and let you know

Best regards,

Eric

In reply to jeff-yin's email notification of 4 Jan 2019, 17:48.

stevance commented 5 years ago

Hello all,

I have updated to OPX 3.1.0 (in fact, a full new install), and all the tests now work, even after a reboot.

I only need to create the interface file in /etc/network/interfaces.d to fix the speed at 1000 for the 1 Gbps SFP; otherwise it has worked so far.
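For reference, creating that per-interface file could look like the sketch below. It assumes the stanza is the pre-up form posted earlier in this thread (the comment above does not say which variant was kept), and write_speed_stanza is a hypothetical helper name.

```shell
#!/bin/sh
# write_speed_stanza DIR IFACE SPEED: write DIR/IFACE containing the
# pre-up stanza quoted earlier in this thread (hypothetical helper).
write_speed_stanza() {
    dir=$1 iface=$2 speed=$3
    printf 'auto %s\niface %s inet manual\n    pre-up sudo /usr/bin/opx-ethtool -s %s speed %s\n' \
        "$iface" "$iface" "$iface" "$speed" > "$dir/$iface"
}

# On the switch this would be run as root, for example:
#   write_speed_stanza /etc/network/interfaces.d e101-001-0 1000
```

The file is only picked up if /etc/network/interfaces contains the source-directory /etc/network/interfaces.d line shown earlier.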

Thank you for all the support.

Eric

GarrickHe commented 5 years ago

@stevance

No problem. Glad the new version took care of it. I'll close this issue.

Thanks, Garrick