Open vnam1 opened 6 years ago
@vnam1 I'll take a look. Can you please tell me what version of OPX you're running? Also show the output of 'opx-show-system-status'.
Thank you, Garrick
root@leaf-2:/var/log# opx-show-version OS_NAME="OPX" OS_VERSION=2.3.0 PLATFORM="S6000-VM" ARCHITECTURE="x86_64" INTERNAL_BUILD_ID="OpenSwitch blueprint for Dell 1.0.0" BUILD_VERSION=0 BUILD_DATE=2018-03-27T22:44:29+0000 INSTALL_DATE=2018-03-30T20:30:49+0000 SYSTEM_UPTIME= 3 hours, 44 minutes SYSTEM_STATE= degraded UPGRADED_PACKAGES=no ALTERED_PACKAGES=no
Please ignore vboxadd.service below since we use KVM hypervisor.
root@leaf-2:/var/log# opx-show-system-status System State: degraded Failed Services vboxadd-service.service vboxadd.service No Modified Package root@leaf-2:/var/log#
@vnam1 I got a few questions.
cps_config_vlan.py --add --id 10 --vlantype 1 --port e101-003-0
[OPX]-----[vswitch]-------[Ubuntu]
if not, how is OPX connect to Ubuntu?I don't have KVM so I'm trying to reproduce this on ESXi.
@GarrickHe Yes the config/connectivity you mentioned is correct. This is reproducible always
Related to this? https://github.com/open-switch/opx-sai-vm/issues/15
This should have nothing to do with the configuration. I doubt that the logs are relevant either.
It may have something to do with the connectivity from the server to OPX VM. Would it be possible - for instance - that you have two identical IP addresses on the same L2 segment (as OPX VM and server) ? What is your setup in KVM ? How do you interconnect VM to the server ?
Does the problem happen consistently?
Please keep in mind that effectively this is communications between two Linux stacks - on OPX VM and the Linux server - so there's no reason obvious reason for lost packets. BTW, if you remove the bridge and manually assign the same IP address to e101-003-0, do you have the same problem?
I can think of other issues - e.g. running out of CPU power - but I can't tell without more info about your system.
Thanks for the pointers. Just want to rule out any issue of my setup. I verified that there no dup IPs on the network.
It tries to resolve ARP of the server address 172.16.2.5 but somehow this ARP request is not seen on the bridge interface e101-003-0 that is connected to the server.
tcpdump on e101-003-0: root@leaf-2:/home/admin# tcpdump -nNei e101-003-0 tcpdump: verbose output suppressed, use -v or -vv for full protocol decode listening on e101-003-0, link-type EN10MB (Ethernet), capture size 262144 bytes 23:26:42.695774 52:54:00:4b:54:33 > 52:54:00:22:4e:56, ethertype IPv4 (0x0800), length 98: 172.16.2.5 > 172.16.2.1: ICMP echo request, id 5853, seq 30, length 64 23:26:43.592091 52:54:00:22:4d:f5 > 01:80:c2:00:00:0e, ethertype LLDP (0x88cc), length 282: LLDP, length 268: leaf-2 23:26:43.703718 52:54:00:4b:54:33 > 52:54:00:22:4e:56, ethertype IPv4 (0x0800), length 98: 172.16.2.5 > 172.16.2.1: ICMP echo request, id 5853, seq 31, length 64 23:26:44.711817 52:54:00:4b:54:33 > 52:54:00:22:4e:56, ethertype IPv4 (0x0800), length 98: 172.16.2.5 > 172.16.2.1: ICMP echo request, id 5853, seq 32, length 64 23:26:45.713243 52:54:00:4b:54:33 > 52:54:00:22:4e:56, ethertype IPv4 (0x0800), length 98: 172.16.2.5 > 172.16.2.1: ICMP echo request, id 5853, seq 33, length 64 23:26:46.719978 52:54:00:4b:54:33 > 52:54:00:22:4e:56, ethertype IPv4 (0x0800), length 98: 172.16.2.5 > 172.16.2.1: ICMP echo request, id 5853, seq 34, length 64 23:26:47.727820 52:54:00:4b:54:33 > 52:54:00:22:4e:56, ethertype IPv4 (0x0800), length 98: 172.16.2.5 > 172.16.2.1: ICMP echo request, id 5853, seq 35, length 64 23:26:48.736608 52:54:00:4b:54:33 > 52:54:00:22:4e:56, ethertype IPv4 (0x0800), length 98: 172.16.2.5 > 172.16.2.1: ICMP echo request, id 5853, seq 36, length 64 23:26:49.743652 52:54:00:4b:54:33 > 52:54:00:22:4e:56, ethertype IPv4 (0x0800), length 98: 172.16.2.5 > 172.16.2.1: ICMP echo request, id 5853, seq 37, length 64 23:26:50.751874 52:54:00:4b:54:33 > 52:54:00:22:4e:56, ethertype IPv4 (0x0800), length 98: 172.16.2.5 > 172.16.2.1: ICMP echo request, id 5853, seq 38, length 64 23:26:51.759867 52:54:00:4b:54:33 > 52:54:00:22:4e:56, ethertype IPv4 (0x0800), length 98: 172.16.2.5 > 172.16.2.1: ICMP echo request, id 5853, seq 39, length 64 23:26:52.384835 52:54:00:4b:54:33 > 01:80:c2:00:00:0e, ethertype LLDP (0x88cc), length 217: LLDP, length 203: localhost 23:26:52.767959 52:54:00:4b:54:33 > 52:54:00:22:4e:56, ethertype IPv4 (0x0800), length 98: 172.16.2.5 > 172.16.2.1: ICMP echo request, id 5853, seq 40, length 64
root@leaf-2:/home/admin# brctl show br10 bridge name bridge id STP enabled interfaces br10 8000.525400224e56 no e101-003-0
In the KVM, OPX VM interfaces and server interfaces have corresponding vnet interfaces and they are connected to each other via bridge instance
Yes, this problem is seen every time.
ARP requests coming from server to OPX are seen correctly on both e101-003-0 and br10:
23:26:54.750065 52:54:00:4b:54:33 > 52:54:00:22:4e:56, ethertype ARP (0x0806), length 60: Request who-has 172.16.2.1 tell 172.16.2.5, length 46 23:26:54.750127 52:54:00:22:4e:56 > 52:54:00:4b:54:33, ethertype IPv4 (0x0800), length 98: 172.16.2.1 > 172.16.2.5: ICMP echo reply, id 5853, seq 39, length 64 23:26:54.750163 52:54:00:22:4e:56 > 52:54:00:4b:54:33, ethertype IPv4 (0x0800), length 98: 172.16.2.1 > 172.16.2.5: ICMP echo reply, id 5853, seq 40, length 64 23:26:54.750169 52:54:00:22:4e:56 > 52:54:00:4b:54:33, ethertype IPv4 (0x0800), length 98: 172.16.2.1 > 172.16.2.5: ICMP echo reply, id 5853, seq 41, length 64 23:26:54.750204 52:54:00:22:4e:56 > 52:54:00:4b:54:33, ethertype ARP (0x0806), length 42: Reply 172.16.2.1 is-at 52:54:00:22:4e:56, length 28
23:26:54.750065 52:54:00:4b:54:33 > 52:54:00:22:4e:56, ethertype ARP (0x0806), length 60: Request who-has 172.16.2.1 tell 172.16.2.5, length 46 23:26:54.750115 52:54:00:22:4e:56 > 52:54:00:4b:54:33, ethertype IPv4 (0x0800), length 98: 172.16.2.1 > 172.16.2.5: ICMP echo reply, id 5853, seq 39, length 64 23:26:54.750158 52:54:00:22:4e:56 > 52:54:00:4b:54:33, ethertype IPv4 (0x0800), length 98: 172.16.2.1 > 172.16.2.5: ICMP echo reply, id 5853, seq 40, length 64 23:26:54.750166 52:54:00:22:4e:56 > 52:54:00:4b:54:33, ethertype IPv4 (0x0800), length 98: 172.16.2.1 > 172.16.2.5: ICMP echo reply, id 5853, seq 41, length 64 23:26:54.750200 52:54:00:22:4e:56 > 52:54:00:4b:54:33, ethertype ARP (0x0806), length 42: Reply 172.16.2.1 is-at 52:54:00:22:4e:56, length 28
@GarrickHe please advise if you've been able to repro on ESXi
Issue not seen on ESXi + OPX2.3.1. Will have to retest this on KVM and with OPX3.0.
Will retest after fix for below issue is finalized. https://github.com/open-switch/opx-nas-l3/issues/24
I have an OPX VM that has a bridge configured ‘br10’
br10 Link encap:Ethernet HWaddr 52:54:00:13:99:28 inet addr:172.16.2.1 Bcast:0.0.0.0 Mask:255.255.255.0 inet6 addr: fe80::5054:ff:fe13:9928/64 Scope:Link UP BROADCAST RUNNING PROMISC MULTICAST MTU:1500 Metric:1 RX packets:3570 errors:0 dropped:0 overruns:0 frame:0 TX packets:2631 errors:0 dropped:0 overruns:0 carrier:0 collisions:0 txqueuelen:0 RX bytes:297904 (290.9 KiB) TX bytes:177878 (173.7 KiB)
untagged ports in the bridge : e101-003-0
A server (Ubuntu VM) is connected to the bridge port e101-003-0 on OPX
server interface config: eth1 Link encap:Ethernet HWaddr 52:54:00:ca:ef:7c inet addr:172.16.2.5 Bcast:172.16.2.255 Mask:255.255.255.0 inet6 addr: fe80::5054:ff:feca:ef7c/64 Scope:Link UP BROADCAST RUNNING MULTICAST MTU:1500 Metric:1 RX packets:3233 errors:0 dropped:0 overruns:0 frame:0 TX packets:8613 errors:0 dropped:0 overruns:0 carrier:0 collisions:0 txqueuelen:1000 RX bytes:358302 (358.3 KB) TX bytes:871479 (871.4 KB)
When I run some ping tests from server to the bridge interface address (src: 172.16.2.5, dest: 172.16.2.1) there is packet loss (look at the missing sequence numbers below)
root@localhost:~# ping 172.16.2.1 PING 172.16.2.1 (172.16.2.1) 56(84) bytes of data. 64 bytes from 172.16.2.1: icmp_seq=1 ttl=64 time=2.13 ms 64 bytes from 172.16.2.1: icmp_seq=2 ttl=64 time=0.572 ms 64 bytes from 172.16.2.1: icmp_seq=3 ttl=64 time=0.921 ms 64 bytes from 172.16.2.1: icmp_seq=4 ttl=64 time=0.742 ms 64 bytes from 172.16.2.1: icmp_seq=5 ttl=64 time=0.680 ms 64 bytes from 172.16.2.1: icmp_seq=6 ttl=64 time=0.751 ms 64 bytes from 172.16.2.1: icmp_seq=7 ttl=64 time=0.803 ms 64 bytes from 172.16.2.1: icmp_seq=8 ttl=64 time=0.694 ms 64 bytes from 172.16.2.1: icmp_seq=9 ttl=64 time=0.770 ms 64 bytes from 172.16.2.1: icmp_seq=10 ttl=64 time=0.821 ms 64 bytes from 172.16.2.1: icmp_seq=48 ttl=64 time=2005 ms 64 bytes from 172.16.2.1: icmp_seq=47 ttl=64 time=3005 ms 64 bytes from 172.16.2.1: icmp_seq=49 ttl=64 time=1005 ms 64 bytes from 172.16.2.1: icmp_seq=50 ttl=64 time=5.34 ms 64 bytes from 172.16.2.1: icmp_seq=51 ttl=64 time=0.773 ms 64 bytes from 172.16.2.1: icmp_seq=52 ttl=64 time=0.888 ms 64 bytes from 172.16.2.1: icmp_seq=53 ttl=64 time=0.980 ms 64 bytes from 172.16.2.1: icmp_seq=54 ttl=64 time=0.698 ms 64 bytes from 172.16.2.1: icmp_seq=55 ttl=64 time=0.730 ms 64 bytes from 172.16.2.1: icmp_seq=56 ttl=64 time=0.902 ms 64 bytes from 172.16.2.1: icmp_seq=57 ttl=64 time=0.845 ms 64 bytes from 172.16.2.1: icmp_seq=58 ttl=64 time=0.795 ms 64 bytes from 172.16.2.1: icmp_seq=59 ttl=64 time=0.814 ms 64 bytes from 172.16.2.1: icmp_seq=99 ttl=64 time=8.95 ms 64 bytes from 172.16.2.1: icmp_seq=101 ttl=64 time=2.35 ms 64 bytes from 172.16.2.1: icmp_seq=102 ttl=64 time=0.860 ms 64 bytes from 172.16.2.1: icmp_seq=103 ttl=64 time=0.668 ms
I see the following syslogs continuously: — Jun 25 18:46:16 opx23_vm opx_nas_daemon[2832]: [ev_log_t_SAI_PORT:Switch Id: 0], Port 0x1010000000000 is not a valid logical port Jun 25 18:46:16 opx23_vm opx_nas_daemon[2832]: [ev_log_t_SAI_PORT:Switch Id: 0], Attr get for port id 0x1010000000000's attr index 0 attr id 8 failed with err -19 Jun 25 18:46:18 opx23_vm opx_nas_daemon[2832]: [ev_log_t_SAI_FDB:Switch Id: 0], FDB Entry not found for MAC: ff:ff:ff:ff:ff:ff vlan/bridge: 0xa Jun 25 18:46:18 opx23_vm opx_nas_daemon[2832]: [ev_log_t_SAI_NEIGHBOR:Switch Id: 0], Failed to lookup port for Neighbor MAC from L2 FDB, Err: -16. Jun 25 18:46:18 opx23_vm opx_nas_daemon[2832]: [ROUTE:HAL-RT-NDI], Failed to add : host: 172.16.2.5 mac_addr: 00:00:00:00:00:00, state: 3, port: 48 status:0x1 NPU status:0 unit:0 rif:0x600000000000a action: Drop Err -1845428240 Jun 25 18:46:21 opx23_vm opx_nas_daemon[2832]: [INTERFACE:NAS-COM-INT-GET], Get request handler not present Jun 25 18:46:21 opx23_vm opx_nas_daemon[2832]: [INTERFACE:NAS-COM-INT-GET], Get request handler not present Jun 25 18:46:21 opx23_vm opx_nas_daemon[2832]: [INTERFACE:NAS-COM-INT-GET], Get request handler not present Jun 25 18:46:21 opx23_vm opx_nas_daemon[2832]: [INTERFACE:NAS-COM-INT-GET], Get request handler not present Jun 25 18:46:21 opx23_vm opx_nas_daemon[2832]: [ev_log_t_SAI_PORT:Switch Id: 0], Port 0x1010000000000 is not a valid logical port Jun 25 18:46:21 opx23_vm opx_nas_daemon[2832]: [ev_log_t_SAI_PORT:Switch Id: 0], Attr get for port id 0x1010000000000's attr index 0 attr id 8 failed with err -19 Jun 25 18:46:26 opx23_vm opx_nas_daemon[2832]: [INTERFACE:NAS-COM-INT-GET], Get request handler not present Jun 25 18:46:26 opx23_vm opx_nas_daemon[2832]: [INTERFACE:NAS-COM-INT-GET], Get request handler not present Jun 25 18:46:26 opx23_vm opx_nas_daemon[2832]: [INTERFACE:NAS-COM-INT-GET], Get request handler not present Jun 25 18:46:26 opx23_vm opx_nas_daemon[2832]: [INTERFACE:NAS-COM-INT-GET], Get request handler not present Jun 25 18:46:26 opx23_vm opx_nas_daemon[2832]: [ev_log_t_SAI_PORT:Switch Id: 0], Port 0x1010000000000 is not a valid logical port Jun 25 18:46:26 opx23_vm opx_nas_daemon[2832]: [ev_log_t_SAI_PORT:Switch Id: 0], Attr get for port id 0x1010000000000's attr index 0 attr id 8 failed with err -19 Jun 25 18:46:31 opx23_vm opx_nas_daemon[2832]: [ev_log_t_SAI_FDB:Switch Id: 0], FDB Entry not found for MAC: ff:ff:ff:ff:ff:ff vlan/bridge: 0xa Jun 25 18:46:31 opx23_vm opx_nas_daemon[2832]: [ev_log_t_SAI_NEIGHBOR:Switch Id: 0], Failed to lookup port for Neighbor MAC from L2 FDB, Err: -16. Jun 25 18:46:31 opx23_vm opx_nas_daemon[2832]: [ROUTE:HAL-RT-NDI], Failed to add : host: 172.16.2.5 mac_addr: 00:00:00:00:00:00, state: 3, port: 48 status:0x1 NPU status:0 unit:0 rif:0x600000000000a action: Drop Err -1845428240
—
On OPX side: ARP entry for the server interface does not seems to have been populated:
root@leaf-2:/var/log# ip neigh … 172.16.2.5 dev br10 INCOMPLETE …