open-switch / opx-cps

https://openswitch.net

Creating LAG interface without a number in its name causes opx_nas_daemon to crash #74

Closed james-jra closed 6 years ago

james-jra commented 6 years ago

Creating a LAG interface with a name that does not end in an integer fails and causes opx_nas_daemon to crash.

Note: The same behaviour occurs even when a LAG ID is supplied explicitly via base-if-lag/if/interfaces/interface/id.

Expected behaviour:

Reproduction

Create a LAG interface with a number at the end (succeeds):

opxUser@opx221_vm:~$ sudo cps_set_oid.py -qua target -oper action dell-base-if-cmn/set-interface dell-base-if-cmn/set-interface/input/operation=1 if/interfaces/interface/type=ianaift:ieee8023adLag if/interfaces/interface/name=testlag13
Success
Key: 1.19.1245192.
dell-base-if-cmn/if/interfaces/interface/if-index = 40
dell-if/if/interfaces/interface/phys-address = 08:00:27:2d:79:06
base-if-lag/if/interfaces/interface/id = 13
if/interfaces/interface/name = testlag13
if/interfaces/interface/type = ianaift:ieee8023adLag
cps/object-group/return-code = 0
dell-base-if-cmn/set-interface/input/operation = 1

Show that opx_nas_daemon is still up:

opxUser@opx221_vm:~$ ps -ef | grep nas
root       544     1  0 10:33 ?        00:00:00 /usr/bin/python -u /usr/bin/base-nas-shell.py
root       548     1  0 10:33 ?        00:00:00 /usr/bin/python -u /usr/bin/base_nas_monitor_phy_media.py
root       554     1  0 10:33 ?        00:00:00 /usr/bin/python -u /usr/bin/base_nas_phy_media_config.py
root       555     1  0 10:33 ?        00:00:00 /usr/bin/python -u /usr/bin/base_nas_front_panel_ports.py
root       756     1  0 10:33 ?        00:00:05 /usr/bin/opx_nas_daemon
opxUser   1930  1869  0 10:38 pts/1    00:00:00 grep nas

Create a LAG interface without a number in the name (fails):

opxUser@opx221_vm:~$ sudo cps_set_oid.py -qua target -oper action dell-base-if-cmn/set-interface dell-base-if-cmn/set-interface/input/operation=1 if/interfaces/interface/type=ianaift:ieee8023adLag if/interfaces/interface/name=testlag
opxUser@opx221_vm:~$ 
# No output indicates failure in cps_set_oid.py

Show opx_nas_daemon is down:

opxUser@opx221_vm:~$ ps -ef | grep nas
root       544     1  0 10:33 ?        00:00:00 /usr/bin/python -u /usr/bin/base-nas-shell.py
root       548     1  0 10:33 ?        00:00:00 /usr/bin/python -u /usr/bin/base_nas_monitor_phy_media.py
root       554     1  0 10:33 ?        00:00:00 /usr/bin/python -u /usr/bin/base_nas_phy_media_config.py
root       555     1  0 10:33 ?        00:00:00 /usr/bin/python -u /usr/bin/base_nas_front_panel_ports.py

Diags

The following is the relevant extract from syslog:

Mar  9 10:35:14 opx221_vm opx_nas_daemon[716]: [INTERFACE:NAS-CPS-LAG], Can't get interface control information for testlag
Mar  9 10:35:14 opx221_vm opx_nas_daemon[716]: terminate called after throwing an instance of 'std::out_of_range'
Mar  9 10:35:14 opx221_vm opx_nas_daemon[716]: what():  basic_string::substr: __pos (which is 18446744073709551615) > this->size() (which is 7)
Mar  9 10:35:14 opx221_vm kernel: [  112.032613] bonding: testlag: Setting MII monitoring interval to 100
Mar  9 10:35:14 opx221_vm python[544]: [DSAPI:CPS IPC], Failed to read the receive header
Mar  9 10:35:14 opx221_vm python[544]: [DSAPI:COMMIT], Failed to commit request at 0 out of 1
Mar  9 10:35:14 opx221_vm python[544]: Failed to commit request
Mar  9 10:35:14 opx221_vm opx_cps_service[528]: [DSAPI:NS], Added registration removed for observed dell-base-if-cmn/if/interfaces-state/interface
Mar  9 10:35:14 opx221_vm opx_cps_service[528]: [DSAPI:NS], Added registration removed for target dell-base-if-cmn/dell-if/clear-counters/input
Mar  9 10:35:14 opx221_vm opx_cps_service[528]: [DSAPI:NS], Added registration removed for target base-if-phy/physical
Mar  9 10:35:14 opx221_vm opx_cps_service[528]: [DSAPI:NS], Added registration removed for target dell-base-if-cmn/dell-if/clear-eee-counters/input
Mar  9 10:35:14 opx221_vm opx_cps_service[528]: [DSAPI:NS], Added registration removed for target dell-base-if-cmn/if/interfaces/interface
Mar  9 10:35:14 opx221_vm opx_cps_service[528]: [DSAPI:NS], Added registration removed for target ni/network-instances
Mar  9 10:35:14 opx221_vm opx_cps_service[528]: [DSAPI:NS], Added registration removed for target ni/if/interfaces/interface
Mar  9 10:35:14 opx221_vm opx_cps_service[528]: [DSAPI:NS], Added registration removed for observed base-switch/switching-entities/switching-entity
Mar  9 10:35:14 opx221_vm python[1775]: [DSAPI:COMMIT], Failed to commit request at 0 out of 1
Mar  9 10:35:14 opx221_vm opx_cps_service[528]: [DSAPI:NS], Added registration removed for target base-switch/switching-entities
Mar  9 10:35:14 opx221_vm opx_cps_service[528]: [DSAPI:NS], Added registration removed for target base-sflow/entry
Mar  9 10:35:14 opx221_vm opx_cps_service[528]: [DSAPI:NS], Added registration removed for target base-switch/set_log
Mar  9 10:35:14 opx221_vm opx_cps_service[528]: [DSAPI:NS], Added registration removed for target base-traffic-hash/entry
Mar  9 10:35:14 opx221_vm opx_cps_service[528]: [DSAPI:NS], Added registration removed for target base-mirror/entry
Mar  9 10:35:14 opx221_vm opx_cps_service[528]: [DSAPI:NS], Added registration removed for target base-switch/switching-entities/switching-entity
Mar  9 10:35:14 opx221_vm opx_cps_service[528]: [DSAPI:NS], Added registration removed for observed dell-base-if-cmn/if/interfaces-state/interface/statistics
Mar  9 10:35:14 opx221_vm opx_cps_service[528]: [DSAPI:NS], Added registration removed for target base-stg/default-stg
Mar  9 10:35:14 opx221_vm opx_cps_service[528]: [DSAPI:NS], Added registration removed for target base-stg/entry
Mar  9 10:35:14 opx221_vm opx_cps_service[528]: [DSAPI:NS], Added registration removed for target base-stg/entry/vlan
Mar  9 10:35:14 opx221_vm opx_cps_service[528]: [DSAPI:NS], Added registration removed for target base-mac/query
Mar  9 10:35:14 opx221_vm opx_cps_service[528]: [DSAPI:NS], Added registration removed for target base-mac/table
Mar  9 10:35:14 opx221_vm opx_cps_service[528]: [DSAPI:NS], Added registration removed for target base-mac/flush
Mar  9 10:35:14 opx221_vm opx_cps_service[528]: [DSAPI:NS], Added registration removed for target base-mac/flush-management
Mar  9 10:35:14 opx221_vm opx_cps_service[528]: [DSAPI:NS], Added registration removed for target base-l2-mcast/cleanup-l2mc-member
Mar  9 10:35:14 opx221_vm opx_cps_service[528]: [DSAPI:NS], Added registration removed for target base-neighbor/if/interfaces-state/interface
Mar  9 10:35:14 opx221_vm opx_cps_service[528]: [DSAPI:NS], Added registration removed for target base-neighbor/base-route/obj/nbr
Mar  9 10:35:14 opx221_vm opx_cps_service[528]: [DSAPI:NS], Added registration removed for target base-route/fib
Mar  9 10:35:14 opx221_vm opx_cps_service[528]: [DSAPI:NS], Added registration removed for target base-route/peer-routing-config
Mar  9 10:35:14 opx221_vm opx_cps_service[528]: [DSAPI:NS], Added registration removed for target base-route/obj/entry
Mar  9 10:35:14 opx221_vm opx_cps_service[528]: [DSAPI:NS], Added registration removed for target base-route/route-nh-operation
Mar  9 10:35:14 opx221_vm opx_cps_service[528]: [DSAPI:NS], Added registration removed for target base-route/ip-unreachables-config
Mar  9 10:35:14 opx221_vm opx_cps_service[528]: [DSAPI:NS], Added registration removed for target base-route/obj/nbr
Mar  9 10:35:14 opx221_vm opx_cps_service[528]: [DSAPI:NS], Added registration removed for target base-route/flush
Mar  9 10:35:14 opx221_vm opx_cps_service[528]: [DSAPI:NS], Added registration removed for target base-route/interface-mode-change
Mar  9 10:35:14 opx221_vm opx_cps_service[528]: [DSAPI:NS], Added registration removed for target base-route/obj
Mar  9 10:35:14 opx221_vm opx_cps_service[528]: [DSAPI:NS], Added registration removed for target base-route/nh-track
Mar  9 10:35:14 opx221_vm opx_cps_service[528]: [DSAPI:NS], Added registration removed for target base-packet/rule
Mar  9 10:35:14 opx221_vm opx_cps_service[528]: [DSAPI:NS], Added registration removed for target base-sflow/socket-address
Mar  9 10:35:14 opx221_vm opx_cps_service[528]: [DSAPI:NS], Added registration removed for target base-acl/clear-acl-entries-for-nh
Mar  9 10:35:14 opx221_vm opx_cps_service[528]: [DSAPI:NS], Added registration removed for target base-udf
Mar  9 10:35:14 opx221_vm opx_cps_service[528]: [DSAPI:NS], Added registration removed for target base-acl
Mar  9 10:35:14 opx221_vm opx_cps_service[528]: [DSAPI:NS], Added registration removed for target base-qos
Mar  9 10:35:14 opx221_vm opx_cps_service[528]: [DSAPI:NS], Added registration removed for target base-if-linux/if/interfaces/interface
Mar  9 10:35:14 opx221_vm systemd[1]: opx-nas.service: main process exited, code=killed, status=6/ABRT
Mar  9 10:35:14 opx221_vm systemd[1]: Unit opx-nas.service entered failed state.
Mar  9 10:35:14 opx221_vm systemd[1]: Triggering OnFailure= dependencies of opx-nas.service.
Mar  9 10:35:14 opx221_vm systemd[1]: Failed to enqueue OnFailure= job: Invalid argument
Mar  9 10:35:14 opx221_vm opx_cps_service[528]: [DSAPI:NS], Added registration removed 1.4.1.
Mar  9 10:35:14 opx221_vm opx_cps_service[528]: [DSAPI:NS], Added registration removed 1.3.
Mar  9 10:35:14 opx221_vm opx_cps_service[528]: [DSAPI:NS], Added registration removed 1.4.2.
Mar  9 10:35:14 opx221_vm opx_cps_service[528]: [DSAPI:NS], Added registration removed 1.4.3.
Mar  9 10:35:19 opx221_vm opx_pas_service[542]: [DSAPI:CPS IPC], not able to connect to owner
Mar  9 10:35:19 opx221_vm opx_pas_service[542]: [DSAPI:NS], Failed to find owner for 1.36.2359341.2359299.2359302.
Mar  9 10:35:19 opx221_vm opx_pas_service[542]: [PAS:dn_remote_temp_sensor_poll], CPS API get failed
Mar  9 10:35:19 opx221_vm opx_pas_service[542]: [PAS:dn_pas_remote_poller_thread], Poll cycle failed
Mar  9 10:35:19 opx221_vm python[545]: Entity ('card', 1) sensor "NPU temp sensor" faulty, oper-status=8, fault-type=3
Mar  9 10:35:24 opx221_vm opx_pas_service[542]: [DSAPI:NS], Failed to find owner for 1.36.2359341.2359299.2359302.
Mar  9 10:35:24 opx221_vm opx_pas_service[542]: [PAS:dn_remote_temp_sensor_poll], CPS API get failed
Mar  9 10:35:24 opx221_vm opx_pas_service[542]: [PAS:dn_pas_remote_poller_thread], Poll cycle failed
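
The what() text in the extract above reports a substr position of 18446744073709551615, which is the value of std::string::npos on a 64-bit build, against a string of size 7 ("testlag"). That is consistent with a find-style search for a digit returning npos and the result being passed straight to substr. The sketch below reproduces that failure mode and shows one possible guard. It is an assumption about the pattern, not the opx-nas-interface code; both function names are hypothetical.

```cpp
#include <cstddef>
#include <stdexcept>
#include <string>

// Hypothetical helper, NOT taken from opx-nas-interface.
// Returns everything from the first decimal digit onward,
// e.g. "testlag13" -> "13".
// Unguarded: when the name contains no digit, find_first_of returns
// std::string::npos and substr(npos) throws std::out_of_range.
std::string lag_id_unguarded(const std::string& name) {
    return name.substr(name.find_first_of("0123456789"));
}

// Hypothetical guarded variant: reports failure to the caller instead of
// letting the exception escape. Returns true and fills id_out when a digit
// is found; returns false and leaves id_out untouched otherwise.
bool lag_id_guarded(const std::string& name, std::string& id_out) {
    const std::size_t pos = name.find_first_of("0123456789");
    if (pos == std::string::npos) {
        return false;  // no numeric part in the name
    }
    id_out = name.substr(pos);
    return true;
}
```

Calling lag_id_unguarded("testlag") throws std::out_of_range; if nothing catches it, the process terminates via std::terminate and abort, which would match the status=6/ABRT exit reported by systemd. With lag_id_guarded, the caller can reject the request or fall back to an explicitly supplied ID instead.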

GarrickHe commented 6 years ago

Thanks. I am looking into this issue.

GarrickHe commented 6 years ago

@james-jra

I have found the root cause and will implement a fix. I will let you know when it is pushed. The fix will be in opx-nas-interface, not opx-cps.

Thanks, Garrick

GarrickHe commented 6 years ago

@james-jra Update: I will push the changes soon. I have been busy with other issues, but I haven't forgotten you =)

GarrickHe commented 6 years ago

@james-jra Issue is fixed with this push: https://review.openswitch.net/#/c/14731/1