xcat2 / xcat-core

Code repo for xCAT core packages
Eclipse Public License 1.0

The installnic does not come up on the first reboot after provisioning Ubuntu 16.04.1 on 8335-GTB #3077

Closed: immarvin closed this issue 6 years ago

immarvin commented 7 years ago

This was reported by a customer. The root cause is that the predictable network device name changed between the Debian installer (enP5p7s0f0) and the installed system (enP9p7s0f0):

in petitboot:
/ # ifconfig -a
enP5p7s0f0 Link encap:Ethernet  HWaddr 70:E2:84:14:18:B3
          inet addr:129.40.41.235  Bcast:129.40.41.255  Mask:255.255.255.0
          UP BROADCAST RUNNING MULTICAST  MTU:1500  Metric:1
          RX packets:318 errors:0 dropped:0 overruns:0 frame:0
          TX packets:149 errors:0 dropped:0 overruns:0 carrier:0
          collisions:0 txqueuelen:1000
          RX bytes:44144 (43.1 KiB)  TX bytes:22252 (21.7 KiB)
          Interrupt:205

enP5p7s0f1 Link encap:Ethernet  HWaddr 70:E2:84:14:18:B4
          inet addr:129.40.42.117  Bcast:129.40.42.255  Mask:255.255.255.0
          UP BROADCAST RUNNING MULTICAST  MTU:1500  Metric:1
          RX packets:143 errors:0 dropped:0 overruns:0 frame:0
          TX packets:13 errors:0 dropped:0 overruns:0 carrier:0
          collisions:0 txqueuelen:1000
          RX bytes:21671 (21.1 KiB)  TX bytes:3126 (3.0 KiB)
          Interrupt:206

enp1s0f0  Link encap:Ethernet  HWaddr 98:BE:94:68:9E:58
          UP BROADCAST MULTICAST  MTU:1500  Metric:1
          RX packets:0 errors:0 dropped:0 overruns:0 frame:0
          TX packets:0 errors:0 dropped:0 overruns:0 carrier:0
          collisions:0 txqueuelen:1000
          RX bytes:0 (0.0 B)  TX bytes:0 (0.0 B)
          Interrupt:246 Memory:200000000000-2000007fffff

enp1s0f1  Link encap:Ethernet  HWaddr 98:BE:94:68:9E:59
          UP BROADCAST MULTICAST  MTU:1500  Metric:1
          RX packets:0 errors:0 dropped:0 overruns:0 frame:0
          TX packets:0 errors:0 dropped:0 overruns:0 carrier:0
          collisions:0 txqueuelen:1000
          RX bytes:0 (0.0 B)  TX bytes:0 (0.0 B)
          Interrupt:247 Memory:200001000000-2000017fffff

enp1s0f2  Link encap:Ethernet  HWaddr 98:BE:94:68:9E:5A
          UP BROADCAST MULTICAST  MTU:1500  Metric:1
          RX packets:0 errors:0 dropped:0 overruns:0 frame:0
          TX packets:0 errors:0 dropped:0 overruns:0 carrier:0
          collisions:0 txqueuelen:1000
          RX bytes:0 (0.0 B)  TX bytes:0 (0.0 B)
          Interrupt:247 Memory:200002000000-2000027fffff

enp1s0f3  Link encap:Ethernet  HWaddr 98:BE:94:68:9E:5B
          UP BROADCAST MULTICAST  MTU:1500  Metric:1
          RX packets:0 errors:0 dropped:0 overruns:0 frame:0
          TX packets:0 errors:0 dropped:0 overruns:0 carrier:0
          collisions:0 txqueuelen:1000
          RX bytes:0 (0.0 B)  TX bytes:0 (0.0 B)
          Interrupt:248 Memory:200003000000-2000037fffff

lo        Link encap:Local Loopback
          inet addr:127.0.0.1  Mask:255.0.0.0
          UP LOOPBACK RUNNING  MTU:65536  Metric:1
          RX packets:0 errors:0 dropped:0 overruns:0 frame:0
          TX packets:0 errors:0 dropped:0 overruns:0 carrier:0
          collisions:0 txqueuelen:1
          RX bytes:0 (0.0 B)  TX bytes:0 (0.0 B)

tunl0     Link encap:UNSPEC  HWaddr 00-00-00-00-00-00-00-00-00-00-00-00-00-00-00-00
          NOARP  MTU:1480  Metric:1
          RX packets:0 errors:0 dropped:0 overruns:0 frame:0
          TX packets:0 errors:0 dropped:0 overruns:0 carrier:0
          collisions:0 txqueuelen:1
          RX bytes:0 (0.0 B)  TX bytes:0 (0.0 B)

/ # lspci
0000:00:00.0 PCI bridge: IBM Device 03dc
0000:01:00.0 Ethernet controller: Broadcom Corporation NetXtreme II BCM57800 1/10 Gigabit Ethernet (rev 10)
0000:01:00.1 Ethernet controller: Broadcom Corporation NetXtreme II BCM57800 1/10 Gigabit Ethernet (rev 10)
0000:01:00.2 Ethernet controller: Broadcom Corporation NetXtreme II BCM57800 1/10 Gigabit Ethernet (rev 10)
0000:01:00.3 Ethernet controller: Broadcom Corporation NetXtreme II BCM57800 1/10 Gigabit Ethernet (rev 10)
0001:00:00.0 PCI bridge: IBM Device 03dc
0001:01:00.0 Non-Volatile memory controller: HGST, Inc. Ultrastar SN100 Series NVMe SSD (rev 05)
0002:00:00.0 PCI bridge: IBM Device 03dc
0002:01:00.0 3D controller: NVIDIA Corporation Device 15f9 (rev a1)
0003:00:00.0 PCI bridge: IBM Device 03dc
0003:01:00.0 3D controller: NVIDIA Corporation Device 15f9 (rev a1)
0004:00:00.0 PCI bridge: IBM Device 03dc
0004:01:00.0 Infiniband controller: Mellanox Technologies MT27700 Family [ConnectX-4]
0004:01:00.1 Infiniband controller: Mellanox Technologies MT27700 Family [ConnectX-4]
0005:00:00.0 PCI bridge: IBM Device 03dc
0005:01:00.0 PCI bridge: PLX Technology, Inc. PEX 8718 16-Lane, 5-Port PCI Express Gen 3 (8.0 GT/s) Switch (rev ab)
0005:02:01.0 PCI bridge: PLX Technology, Inc. PEX 8718 16-Lane, 5-Port PCI Express Gen 3 (8.0 GT/s) Switch (rev ab)
0005:02:02.0 PCI bridge: PLX Technology, Inc. PEX 8718 16-Lane, 5-Port PCI Express Gen 3 (8.0 GT/s) Switch (rev ab)
0005:02:03.0 PCI bridge: PLX Technology, Inc. PEX 8718 16-Lane, 5-Port PCI Express Gen 3 (8.0 GT/s) Switch (rev ab)
0005:02:04.0 PCI bridge: PLX Technology, Inc. PEX 8718 16-Lane, 5-Port PCI Express Gen 3 (8.0 GT/s) Switch (rev ab)
0005:03:00.0 USB controller: Texas Instruments TUSB73x0 SuperSpeed USB 3.0 xHCI Host Controller (rev 02)
0005:04:00.0 SATA controller: Marvell Technology Group Ltd. 88SE9235 PCIe 2.0 x2 4-port SATA 6 Gb/s Controller (rev 11)
0005:05:00.0 PCI bridge: ASPEED Technology, Inc. AST1150 PCI-to-PCI Bridge (rev 03)
0005:06:00.0 VGA compatible controller: ASPEED Technology, Inc. ASPEED Graphics Family (rev 30)
0005:07:00.0 Ethernet controller: Broadcom Corporation NetXtreme BCM5719 Gigabit Ethernet PCIe (rev 01)
0005:07:00.1 Ethernet controller: Broadcom Corporation NetXtreme BCM5719 Gigabit Ethernet PCIe (rev 01)
0006:00:00.0 PCI bridge: IBM Device 03dc
0006:01:00.0 3D controller: NVIDIA Corporation Device 15f9 (rev a1)
0007:00:00.0 PCI bridge: IBM Device 03dc
0007:01:00.0 3D controller: NVIDIA Corporation Device 15f9 (rev a1)

inside debian-installer:
~ # ip addr show
1: lo: <LOOPBACK,UP,LOWER_UP> mtu 65536 qdisc noqueue qlen 1
    link/loopback 00:00:00:00:00:00 brd 00:00:00:00:00:00
    inet 127.0.0.1/8 scope host lo
       valid_lft forever preferred_lft forever
    inet6 ::1/128 scope host
       valid_lft forever preferred_lft forever
2: enP5p7s0f0: <BROADCAST,MULTICAST> mtu 1500 qdisc mq qlen 1000
    link/ether 70:e2:84:14:18:b3 brd ff:ff:ff:ff:ff:ff
3: enP5p7s0f1: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc mq qlen 1000
    link/ether 70:e2:84:14:18:b4 brd ff:ff:ff:ff:ff:ff
    inet 129.40.42.117/24 brd 129.40.42.255 scope global enP5p7s0f1
       valid_lft forever preferred_lft forever
    inet6 fe80::72e2:84ff:fe14:18b4/64 scope link
       valid_lft forever preferred_lft forever
4: enp1s0f0: <BROADCAST,MULTICAST> mtu 1500 qdisc mq qlen 1000
    link/ether 98:be:94:68:9e:58 brd ff:ff:ff:ff:ff:ff
5: enp1s0f1: <BROADCAST,MULTICAST> mtu 1500 qdisc mq qlen 1000
    link/ether 98:be:94:68:9e:59 brd ff:ff:ff:ff:ff:ff
6: enp1s0f2: <BROADCAST,MULTICAST> mtu 1500 qdisc mq qlen 1000
    link/ether 98:be:94:68:9e:5a brd ff:ff:ff:ff:ff:ff
7: enp1s0f3: <BROADCAST,MULTICAST> mtu 1500 qdisc mq qlen 1000
    link/ether 98:be:94:68:9e:5b brd ff:ff:ff:ff:ff:ff

~ # cat /etc/*release
xenial
DISTRIB_ID=Ubuntu
DISTRIB_RELEASE=16.04
DISTRIB_CODENAME=xenial
DISTRIB_DESCRIPTION="Ubuntu 16.04.1 LTS"
~ # lspci
0000:00:00.0 PCI bridge: IBM Device 03dc
0000:01:00.0 Ethernet controller: Broadcom Corporation NetXtreme II BCM57800 1/10 Gigabit Ethernet (rev 10)
0000:01:00.1 Ethernet controller: Broadcom Corporation NetXtreme II BCM57800 1/10 Gigabit Ethernet (rev 10)
0000:01:00.2 Ethernet controller: Broadcom Corporation NetXtreme II BCM57800 1/10 Gigabit Ethernet (rev 10)
0000:01:00.3 Ethernet controller: Broadcom Corporation NetXtreme II BCM57800 1/10 Gigabit Ethernet (rev 10)
0001:00:00.0 PCI bridge: IBM Device 03dc
0001:01:00.0 Non-Volatile memory controller: HGST, Inc. Ultrastar SN100 Series NVMe SSD (rev 05)
0002:00:00.0 PCI bridge: IBM Device 03dc
0002:01:00.0 3D controller: NVIDIA Corporation Device 15f9 (rev a1)
0003:00:00.0 PCI bridge: IBM Device 03dc
0003:01:00.0 3D controller: NVIDIA Corporation Device 15f9 (rev a1)
0004:00:00.0 PCI bridge: IBM Device 03dc
0004:01:00.0 Infiniband controller: Mellanox Technologies MT27700 Family [ConnectX-4]
0004:01:00.1 Infiniband controller: Mellanox Technologies MT27700 Family [ConnectX-4]
0005:00:00.0 PCI bridge: IBM Device 03dc
0005:01:00.0 PCI bridge: PLX Technology, Inc. PEX 8718 16-Lane, 5-Port PCI Express Gen 3 (8.0 GT/s) Switch (rev ab)
0005:02:01.0 PCI bridge: PLX Technology, Inc. PEX 8718 16-Lane, 5-Port PCI Express Gen 3 (8.0 GT/s) Switch (rev ab)
0005:02:02.0 PCI bridge: PLX Technology, Inc. PEX 8718 16-Lane, 5-Port PCI Express Gen 3 (8.0 GT/s) Switch (rev ab)
0005:02:03.0 PCI bridge: PLX Technology, Inc. PEX 8718 16-Lane, 5-Port PCI Express Gen 3 (8.0 GT/s) Switch (rev ab)
0005:02:04.0 PCI bridge: PLX Technology, Inc. PEX 8718 16-Lane, 5-Port PCI Express Gen 3 (8.0 GT/s) Switch (rev ab)
0005:03:00.0 USB controller: Texas Instruments TUSB73x0 SuperSpeed USB 3.0 xHCI Host Controller (rev 02)
0005:04:00.0 SATA controller: Marvell Technology Group Ltd. 88SE9235 PCIe 2.0 x2 4-port SATA 6 Gb/s Controller (rev 11)
0005:05:00.0 PCI bridge: ASPEED Technology, Inc. AST1150 PCI-to-PCI Bridge (rev 03)
0005:06:00.0 VGA compatible controller: ASPEED Technology, Inc. ASPEED Graphics Family (rev 30)
0005:07:00.0 Ethernet controller: Broadcom Corporation NetXtreme BCM5719 Gigabit Ethernet PCIe (rev 01)
0005:07:00.1 Ethernet controller: Broadcom Corporation NetXtreme BCM5719 Gigabit Ethernet PCIe (rev 01)
0006:00:00.0 PCI bridge: IBM Device 03dc
0006:01:00.0 3D controller: NVIDIA Corporation Device 15f9 (rev a1)
0007:00:00.0 PCI bridge: IBM Device 03dc
0007:01:00.0 3D controller: NVIDIA Corporation Device 15f9 (rev a1)
0008:00:00.0 Bridge: IBM Device 04ea
0008:00:00.1 Bridge: IBM Device 04ea
0008:00:01.0 Bridge: IBM Device 04ea
0008:00:01.1 Bridge: IBM Device 04ea
0009:00:00.0 Bridge: IBM Device 04ea
0009:00:00.1 Bridge: IBM Device 04ea
0009:00:01.0 Bridge: IBM Device 04ea
0009:00:01.1 Bridge: IBM Device 04ea

inside the installed OS:

root@xxxx:~# ifconfig -a
enP9p7s0f0 Link encap:Ethernet  HWaddr 70:e2:84:14:18:b3
          BROADCAST MULTICAST  MTU:1500  Metric:1
          RX packets:0 errors:0 dropped:0 overruns:0 frame:0
          TX packets:0 errors:0 dropped:0 overruns:0 carrier:0
          collisions:0 txqueuelen:1000
          RX bytes:0 (0.0 B)  TX bytes:0 (0.0 B)
          Interrupt:205

enP9p7s0f1 Link encap:Ethernet  HWaddr 70:e2:84:14:18:b4
          inet addr:129.40.42.117  Bcast:129.40.42.255  Mask:255.255.255.0
          inet6 addr: fe80::72e2:84ff:fe14:18b4/64 Scope:Link
          UP BROADCAST RUNNING MULTICAST  MTU:1500  Metric:1
          RX packets:620 errors:0 dropped:0 overruns:0 frame:0
          TX packets:5127 errors:0 dropped:0 overruns:0 carrier:0
          collisions:0 txqueuelen:1000
          RX bytes:120928 (120.9 KB)  TX bytes:711107 (711.1 KB)
          Interrupt:206

enp1s0f0  Link encap:Ethernet  HWaddr 98:be:94:68:9e:58
          BROADCAST MULTICAST  MTU:1500  Metric:1
          RX packets:0 errors:0 dropped:0 overruns:0 frame:0
          TX packets:0 errors:0 dropped:0 overruns:0 carrier:0
          collisions:0 txqueuelen:1000
          RX bytes:0 (0.0 B)  TX bytes:0 (0.0 B)
          Interrupt:246 Memory:200000000000-2000007fffff

enp1s0f1  Link encap:Ethernet  HWaddr 98:be:94:68:9e:59
          BROADCAST MULTICAST  MTU:1500  Metric:1
          RX packets:0 errors:0 dropped:0 overruns:0 frame:0
          TX packets:0 errors:0 dropped:0 overruns:0 carrier:0
          collisions:0 txqueuelen:1000
          RX bytes:0 (0.0 B)  TX bytes:0 (0.0 B)
          Interrupt:247 Memory:200001000000-2000017fffff

enp1s0f2  Link encap:Ethernet  HWaddr 98:be:94:68:9e:5a
          BROADCAST MULTICAST  MTU:1500  Metric:1
          RX packets:0 errors:0 dropped:0 overruns:0 frame:0
          TX packets:0 errors:0 dropped:0 overruns:0 carrier:0
          collisions:0 txqueuelen:1000
          RX bytes:0 (0.0 B)  TX bytes:0 (0.0 B)
          Interrupt:247 Memory:200002000000-2000027fffff

enp1s0f3  Link encap:Ethernet  HWaddr 98:be:94:68:9e:5b
          BROADCAST MULTICAST  MTU:1500  Metric:1
          RX packets:0 errors:0 dropped:0 overruns:0 frame:0
          TX packets:0 errors:0 dropped:0 overruns:0 carrier:0
          collisions:0 txqueuelen:1000
          RX bytes:0 (0.0 B)  TX bytes:0 (0.0 B)
          Interrupt:248 Memory:200003000000-2000037fffff

lo        Link encap:Local Loopback
          inet addr:127.0.0.1  Mask:255.0.0.0
          inet6 addr: ::1/128 Scope:Host
          UP LOOPBACK RUNNING  MTU:65536  Metric:1
          RX packets:246 errors:0 dropped:0 overruns:0 frame:0
          TX packets:246 errors:0 dropped:0 overruns:0 carrier:0
          collisions:0 txqueuelen:1
          RX bytes:19568 (19.5 KB)  TX bytes:19568 (19.5 KB)

root@xxx:~# lspci
0000:00:00.0 PCI bridge: IBM Device 03dc
0000:01:00.0 Ethernet controller: Broadcom Corporation NetXtreme II BCM57800 1/10 Gigabit Ethernet (rev 10)
0000:01:00.1 Ethernet controller: Broadcom Corporation NetXtreme II BCM57800 1/10 Gigabit Ethernet (rev 10)
0000:01:00.2 Ethernet controller: Broadcom Corporation NetXtreme II BCM57800 1/10 Gigabit Ethernet (rev 10)
0000:01:00.3 Ethernet controller: Broadcom Corporation NetXtreme II BCM57800 1/10 Gigabit Ethernet (rev 10)
0001:00:00.0 PCI bridge: IBM Device 03dc
0001:01:00.0 Non-Volatile memory controller: HGST, Inc. Ultrastar SN100 Series NVMe SSD (rev 05)
0002:00:00.0 PCI bridge: IBM Device 03dc
0002:01:00.0 3D controller: NVIDIA Corporation Device 15f9 (rev a1)
0003:00:00.0 PCI bridge: IBM Device 03dc
0003:01:00.0 3D controller: NVIDIA Corporation Device 15f9 (rev a1)
0004:00:00.0 Bridge: IBM Device 04ea
0004:00:00.1 Bridge: IBM Device 04ea
0004:00:01.0 Bridge: IBM Device 04ea
0004:00:01.1 Bridge: IBM Device 04ea
0005:00:00.0 Bridge: IBM Device 04ea
0005:00:00.1 Bridge: IBM Device 04ea
0005:00:01.0 Bridge: IBM Device 04ea
0005:00:01.1 Bridge: IBM Device 04ea
0008:00:00.0 PCI bridge: IBM Device 03dc
0008:01:00.0 Infiniband controller: Mellanox Technologies MT27700 Family [ConnectX-4]
0008:01:00.1 Infiniband controller: Mellanox Technologies MT27700 Family [ConnectX-4]
0009:00:00.0 PCI bridge: IBM Device 03dc
0009:01:00.0 PCI bridge: PLX Technology, Inc. PEX 8718 16-Lane, 5-Port PCI Express Gen 3 (8.0 GT/s) Switch (rev ab)
0009:02:01.0 PCI bridge: PLX Technology, Inc. PEX 8718 16-Lane, 5-Port PCI Express Gen 3 (8.0 GT/s) Switch (rev ab)
0009:02:02.0 PCI bridge: PLX Technology, Inc. PEX 8718 16-Lane, 5-Port PCI Express Gen 3 (8.0 GT/s) Switch (rev ab)
0009:02:03.0 PCI bridge: PLX Technology, Inc. PEX 8718 16-Lane, 5-Port PCI Express Gen 3 (8.0 GT/s) Switch (rev ab)
0009:02:04.0 PCI bridge: PLX Technology, Inc. PEX 8718 16-Lane, 5-Port PCI Express Gen 3 (8.0 GT/s) Switch (rev ab)
0009:03:00.0 USB controller: Texas Instruments TUSB73x0 SuperSpeed USB 3.0 xHCI Host Controller (rev 02)
0009:04:00.0 SATA controller: Marvell Technology Group Ltd. 88SE9235 PCIe 2.0 x2 4-port SATA 6 Gb/s Controller (rev 11)
0009:05:00.0 PCI bridge: ASPEED Technology, Inc. AST1150 PCI-to-PCI Bridge (rev 03)
0009:06:00.0 VGA compatible controller: ASPEED Technology, Inc. ASPEED Graphics Family (rev 30)
0009:07:00.0 Ethernet controller: Broadcom Corporation NetXtreme BCM5719 Gigabit Ethernet PCIe (rev 01)
0009:07:00.1 Ethernet controller: Broadcom Corporation NetXtreme BCM5719 Gigabit Ethernet PCIe (rev 01)
000a:00:00.0 PCI bridge: IBM Device 03dc
000a:01:00.0 3D controller: NVIDIA Corporation Device 15f9 (rev a1)
000b:00:00.0 PCI bridge: IBM Device 03dc
000b:01:00.0 3D controller: NVIDIA Corporation Device 15f9 (rev a1)
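
The link between the lspci output and the interface names can be sketched in a few lines. The BCM5719 ports sit at 0005:07:00.x in petitboot and the installer but at 0009:07:00.x in the installed OS, and the path-based name embeds that PCI domain. The snippet below is only an illustration of the naming pattern visible in the dumps above (`en`, an optional `P<domain>`, then `p<bus>s<slot>f<function>`); the function name `predictable_name` and its `multifunction` parameter are hypothetical, and real udev applies further rules that this sketch ignores.

```python
import re

# PCI address as printed by lspci: domain:bus:slot.function, all hexadecimal.
PCI_ADDR = re.compile(r"^([0-9a-fA-F]{4}):([0-9a-fA-F]{2}):([0-9a-fA-F]{2})\.([0-7])$")


def predictable_name(pci_addr, multifunction=True):
    """Return the path-based Ethernet name for a PCI address (illustrative only)."""
    match = PCI_ADDR.match(pci_addr)
    if match is None:
        raise ValueError("not a PCI address: %r" % pci_addr)
    domain, bus, slot, function = (int(part, 16) for part in match.groups())
    name = "en"
    if domain > 0:
        name += "P%d" % domain      # the domain is left out when it is 0
    name += "p%ds%d" % (bus, slot)
    if multifunction:
        name += "f%d" % function    # assumption: the NIC is a multi-function device
    return name


print(predictable_name("0005:07:00.0"))  # address seen by petitboot and the installer
print(predictable_name("0009:07:00.0"))  # address seen by the installed OS
```

With the two addresses from the dumps, this yields enP5p7s0f0 and enP9p7s0f0, which is the rename reported in this issue.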

Opened a ticket with LTC to track this: https://bugzilla.linux.ibm.com/show_bug.cgi?id=154740

Currently, the workaround is to configure enP9p7s0f0 with confignics.
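
Since the MAC address stays the same across petitboot, the installer, and the installed OS (70:e2:84:14:18:b3 in the output above), one way to find the name to configure is to look the interface up by MAC rather than hard-coding it. The following is a minimal sketch, not part of xCAT: the helper `names_by_mac` is hypothetical, and it assumes `ip addr show` output in the format shown in this issue.

```python
import re

# Header line of each interface in `ip addr show` output,
# e.g. "2: enP5p7s0f0: <BROADCAST,MULTICAST> mtu 1500 ...".
IFACE_LINE = re.compile(r"^\d+:\s+([^:@\s]+)")
# Indented line carrying the hardware address of the current interface.
MAC_LINE = re.compile(r"^\s+link/ether\s+([0-9a-fA-F:]{17})")


def names_by_mac(ip_addr_output):
    """Map each lower-cased MAC address to its interface name."""
    result = {}
    current = None
    for line in ip_addr_output.splitlines():
        header = IFACE_LINE.match(line)
        if header:
            current = header.group(1)
            continue
        mac = MAC_LINE.match(line)
        if mac and current is not None:
            result[mac.group(1).lower()] = current
    return result


# Two entries copied from the installer output in this issue.
SAMPLE = """\
2: enP5p7s0f0: <BROADCAST,MULTICAST> mtu 1500 qdisc mq qlen 1000
    link/ether 70:e2:84:14:18:b3 brd ff:ff:ff:ff:ff:ff
3: enP5p7s0f1: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc mq qlen 1000
    link/ether 70:e2:84:14:18:b4 brd ff:ff:ff:ff:ff:ff
"""

print(names_by_mac(SAMPLE)["70:e2:84:14:18:b3"])
```

Run against output captured on the installed OS, the same lookup would return whatever name the running kernel assigned to that MAC.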

immarvin commented 7 years ago

According to the Ubuntu experts: "This is a known issue, and a side effect of using an old boot/install image. The install image used was using the old 4.4.0-31.50 kernel, but the handling of the PCI domain was changed in kernels 4.4.0-36+.

https://wiki.ubuntu.com/ppc64el/Recommendations#Possible_network_interface_name_change_after_upgrading_kernel

To avoid this issue, be sure to use the latest install images from the updates repo. For example:

http://ports.ubuntu.com/ubuntu-ports/dists/xenial-updates/main/installer-ppc64el/20101020ubuntu451.10/images/ "
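
A quick way to tell whether an install image is affected is to compare its kernel version with the threshold quoted above. The sketch below is illustrative only: the helper names are hypothetical, the threshold is taken directly from the quoted statement, and the comparison is only meaningful for the Xenial 4.4.0 kernel series.

```python
import re

# Threshold from the quoted statement: PCI domain handling changed
# in kernels 4.4.0-36 and later.
CHANGED_IN = (4, 4, 0, 36)


def kernel_tuple(version):
    """Parse strings like '4.4.0-31.50' or '4.4.0-36-generic' into (4, 4, 0, N)."""
    match = re.match(r"^(\d+)\.(\d+)\.(\d+)-(\d+)", version)
    if match is None:
        raise ValueError("unrecognised kernel version: %r" % version)
    return tuple(int(part) for part in match.groups())


def predates_domain_change(version):
    """True if this kernel is older than the quoted 4.4.0-36 threshold."""
    return kernel_tuple(version) < CHANGED_IN


print(predates_domain_change("4.4.0-31.50"))       # kernel of the old install image
print(predates_domain_change("4.4.0-36-generic"))  # first kernel with the new handling
```

If the installer kernel predates the change while the installed kernel does not, the interface name seen during installation can differ from the one seen after the first reboot, as in this issue.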

immarvin commented 7 years ago

The customer still needs some time to finish the verification.

robin2008 commented 7 years ago

As this is an upstream known issue, it is better to document it.

zet809 commented 7 years ago

Moved to the next sprint for the documentation update.