openbmc / phosphor-networkd

Apache License 2.0

Regression: IPMI management of IPv4 static addresses has numerous behavioral issues #60

Open Howitzer105mm opened 1 year ago

Howitzer105mm commented 1 year ago

Problem statement

Intel QA reports that their tests for managing static IPv4 addresses are failing. Specifically, the long-established mechanism for clearing a static IPv4 address no longer works.

The test consists of the following steps:

  1. The SUT begins with DHCPv4 and DHCPv6 enabled.
  2. DHCPv4 is disabled.
  3. Any existing static IPv4 address is explicitly cleared.
  4. DHCPv4 is re-enabled.
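For reference, steps 2 to 4 map onto the Set LAN Configuration Parameters commands (NetFn 0x0c, command 0x01) used in the reproduction below; step 1 is the starting state and needs no command. The sketch below is a hypothetical bash helper, not part of the QA test suite. It assumes LAN channel 1, as in the Set commands in this report, and only prints the commands unless --run is given.

```shell
#!/usr/bin/env bash
# Hypothetical helper: print (default) or run the IPMI raw commands for the
# QA static-IPv4 test. Channel 1 matches the Set commands in this report.
qa_static_ipv4_sequence() {
    local chan="${1:-1}" run=echo
    if [ "${2:-}" = "--run" ]; then
        run=   # empty: the unquoted $run below expands to nothing
    fi
    $run ipmitool raw 0xc 1 "$chan" 4 1         # param 4 (IP Address Source) = 1, static
    $run ipmitool raw 0xc 1 "$chan" 3 0 0 0 0   # param 3 (IP Address) = 0.0.0.0, clear
    $run ipmitool raw 0xc 1 "$chan" 4 2         # param 4 (IP Address Source) = 2, DHCP
}
```

Running qa_static_ipv4_sequence 1 prints the three commands for review; qa_static_ipv4_sequence 1 --run issues them.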

Duplicate the issue

The defect entry describes a short list of IPMI commands that trigger the faulty behavior. Investigating the issue has since expanded the set of commands issued and tested.

Remove the network config file

In the time-zero state of a BMC network interface, the /etc/systemd/network/00-bmc-ethx.network file does not exist; the configuration is held implicitly by systemd-networkd. For a BMC that has been in use for some time, that state can be replicated by deleting the configuration file.

# rm /etc/systemd/network/00-bmc-eth0.network
# reboot

After the BMC reboots, the BMC NIC is configured with the systemd-networkd default state, per recent phosphor-networkd source code changes.

Get LAN address source

Determine the current state of the IPv4 stack for the BMC NIC.

# ipmitool raw 0xc 2 3 4 0 0
11 02

The response (parameter revision 0x11, address source 0x02) shows that the BMC NIC is assigned an IP address from a DHCP server.
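Per the IPMI v2.0 specification, parameter 4 (IP Address Source) carries the source in bits [3:0] of its data byte: 0 unspecified, 1 static, 2 DHCP, 3 loaded by BIOS or system software, 4 other protocol. A minimal bash sketch for decoding the raw reply, assuming the hex output format shown above (decode_addr_source is a hypothetical helper name):

```shell
#!/usr/bin/env bash
# Hypothetical helper: decode a Get LAN Configuration Parameters reply for
# parameter 4 (IP Address Source), e.g. "11 02" -> "dhcp".
decode_addr_source() {
    if [ "$#" -ne 2 ]; then
        echo "usage: decode_addr_source REV SOURCE" >&2
        return 1
    fi
    # $1 is the parameter revision (0x11 in this report); bits [3:0] of $2
    # carry the address source.
    case $(( 0x$2 & 0x0f )) in
        0) echo "unspecified" ;;
        1) echo "static" ;;
        2) echo "dhcp" ;;
        3) echo "bios" ;;
        4) echo "other" ;;
        *) echo "reserved" ;;
    esac
}
```

Usage: decode_addr_source $(ipmitool raw 0xc 2 3 4 0 0), left unquoted so the reply splits into its two bytes.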

Inspect the current IPv4 state

Find out what IPv4 address has been assigned to the BMC NIC from the local DHCPv4 server.

# ip -4 a show eth0
2: eth0: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc pfifo_fast qlen 1000
    inet 192.168.30.130/24 brd 192.168.30.255 scope global dynamic eth0
       valid_lft 534sec preferred_lft 534sec

Disable DHCPv4

Use IPMI to turn off the DHCPv4 address assignment.

# ipmitool raw 0xc 1 1 4 1

By default, phosphor-networkd uses a de facto systemd-networkd configuration, defined by the systemd-networkd service. Moving from DHCP to static IPv4 addressing writes a .network file to /etc/systemd/network/00-bmc-ethx.network, so the NIC configuration can now be inspected explicitly.

Confirm the IPv4 address reported by IPMI

Examine the state of the BMC NIC after turning off the DHCPv4 functionality.

# ipmitool raw 0xc 2 1 3 0 0
 11 00 00 00 00
# ip -4 a show eth0
2: eth0: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc pfifo_fast qlen 1000
    inet 169.254.241.61/16 brd 169.254.255.255 scope link eth0
       valid_lft forever preferred_lft forever
# cat /etc/systemd/network/00-bmc-eth0.network 
[DHCP]
SendHostname=true
UseHostname=true
UseDomains=true
UseDNS=true
UseNTP=true
ClientIdentifier=mac
[IPv6AcceptRA]
DHCPv6Client=true
[Network]
DHCP=ipv6
IPv6AcceptRA=true
LinkLocalAddressing=yes
[Link]
[Match]
Name=eth0

The BMC NIC has only a self-assigned (link-local) IPv4 address. As the 00-bmc-eth0.network contents show:

DHCP=ipv6
IPv6AcceptRA=true

Only IPv6 dynamic assignment is active.
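The DHCP= key in the [Network] section selects which DHCP clients systemd-networkd runs; systemd.network accepts yes, no, ipv4 or ipv6, and the files in this report also use the boolean spelling true. A minimal bash sketch that reports whether a given file enables DHCPv4 (dhcpv4_enabled is a hypothetical helper; it ignores drop-in files and, as an assumption, takes the last DHCP= line if the key is repeated):

```shell
#!/usr/bin/env bash
# Hypothetical helper: succeed if the .network file enables the DHCPv4 client.
dhcpv4_enabled() {
    local file="$1" value
    # Match only "DHCP=" at line start, so DHCPv6Client= and [DHCP] are skipped.
    value="$(sed -n 's/^DHCP=//p' "$file" | tail -n 1)"
    case "$value" in
        yes|true|1|on|ipv4) return 0 ;;   # DHCPv4 (possibly with DHCPv6)
        *) return 1 ;;                    # no, ipv6, or absent
    esac
}
```

Usage: dhcpv4_enabled /etc/systemd/network/00-bmc-eth0.network && echo "DHCPv4 on".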

Clear any static IPv4 address

IPMI uses an assignment of 0.0.0.0 to remove any active static IPv4 address. Because no static address is configured in the current state of the NIC, this action should effectively be a no-op.

# ipmitool raw 0xc 1 1 3 0 0 0 0
# ipmitool raw 0xc 2 1 3 0 0
 11 00 00 00 00
# ip -4 a show eth0
2: eth0: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc pfifo_fast qlen 1000
    inet 169.254.241.61/16 brd 169.254.255.255 scope link eth0
       valid_lft forever preferred_lft forever
    inet 192.168.242.17/32 scope global eth0
       valid_lft forever preferred_lft forever
# cat /etc/systemd/network/00-bmc-eth0.network
[DHCP]
SendHostname=true
UseHostname=true
UseDomains=true
UseDNS=true
UseNTP=true
ClientIdentifier=mac
[IPv6AcceptRA]
DHCPv6Client=true
[Network]
Address=0.0.0.0/32
DHCP=ipv6
IPv6AcceptRA=true
LinkLocalAddressing=yes
[Link]
[Match]
Name=eth0

The operation was not a no-op. The network configuration file gained a new entry (Address=0.0.0.0/32), and the BMC NIC acquired a seemingly random IPv4 address (192.168.242.17/32).
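The systemd.network documentation describes an Address= of 0.0.0.0 (with a prefix length) as a request to allocate an unused range automatically from a system-wide pool, which by default consists of 192.168.0.0/16, 172.16.0.0/12 and 10.0.0.0/8 for IPv4. That appears consistent with the 192.168.x.y/32 addresses observed here. A minimal bash sketch for spotting such entries (has_wildcard_ipv4 is a hypothetical helper name):

```shell
#!/usr/bin/env bash
# Hypothetical helper: succeed if the .network file contains a wildcard
# IPv4 Address= entry (0.0.0.0, optionally with a /prefix).
has_wildcard_ipv4() {
    grep -Eq '^Address=0\.0\.0\.0(/[0-9]+)?[[:space:]]*$' "$1"
}
```

Usage: for f in /etc/systemd/network/*.network; do has_wildcard_ipv4 "$f" && echo "$f"; done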

Re-enable DHCPv4

Now restore DHCPv4 address assignment.

# ipmitool raw 0xc 1 1 4 2
# ipmitool raw 0xc 2 1 3 0 0 
 11 00 00 00 00
# ip -4 a show eth0
2: eth0: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc pfifo_fast qlen 1000
    inet 192.168.242.17/32 scope global eth0
       valid_lft forever preferred_lft forever
    inet 192.168.190.241/32 scope global eth0
       valid_lft forever preferred_lft forever
    inet 192.168.30.130/24 brd 192.168.30.255 scope global dynamic eth0
       valid_lft 578sec preferred_lft 578sec
# cat /etc/systemd/network/00-bmc-eth0.network 
[DHCP]
SendHostname=true
UseHostname=true
UseDomains=true
UseDNS=true
UseNTP=true
ClientIdentifier=mac
[IPv6AcceptRA]
DHCPv6Client=true
[Network]
Address=192.168.242.17/32
Address=0.0.0.0/32
DHCP=true
IPv6AcceptRA=true
LinkLocalAddressing=yes
[Link]
[Match]
Name=eth0

This is a very strange combination of states.

  1. The DHCPv4 address (192.168.30.130) should be the only IPv4 address assigned.
  2. The static address assignment 0.0.0.0 is still present.
  3. The ip output shows several randomly assigned /32 addresses, even though IPMI Get LAN IP Address returns 0.0.0.0.

This is not desirable behavior.
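To see at a glance which addresses are involved, the scope-global IPv4 entries can be pulled out of the ip output. A minimal bash sketch, assuming the output format shown in this report (global_ipv4_addrs is a hypothetical helper name); link-local 169.254.0.0/16 addresses are excluded because they are reported with scope link:

```shell
#!/usr/bin/env bash
# Hypothetical helper: read `ip -4 addr show` output on stdin and print
# each scope-global IPv4 address with its prefix length.
global_ipv4_addrs() {
    awk '$1 == "inet" && / scope global / { print $2 }'
}
```

Usage: ip -4 a show eth0 | global_ipv4_addrs. Applied to the output above, it lists 192.168.242.17/32, 192.168.190.241/32 and 192.168.30.130/24.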

Issue is confirmed

The issue described by the Intel QA team is confirmed. Further testing of IPv4 address assignment reveals additional undesirable address-handling artifacts.

Perform an extended sequence

Given the odd behavior above, try a more involved example.

Restart from a clean state

Restore the BMC to a "pristine" state.

# rm /etc/systemd/network/00-bmc-eth0.network
# reboot

Get LAN address source

Determine the current state of the IPv4 stack for the BMC NIC.

# ipmitool raw 0xc 2 3 4 0 0
11 02

The response (parameter revision 0x11, address source 0x02) shows that the BMC NIC is assigned an IP address from a DHCP server.

Set LAN address source to static

# ipmitool raw 0xc 1 3 4 1

Get LAN address source

Confirm the BMC NIC is only accepting statically assigned IPv4 addresses.

# ipmitool raw 0xc 2 3 4 0 0 
11 01

The BMC NIC is only enabled to accept static addresses.

Collect the network configuration file

# cat /etc/systemd/network/00-bmc-eth0.network
[DHCP]
SendHostname=true
UseHostname=true
UseDomains=true
UseDNS=true
UseNTP=true
ClientIdentifier=mac
[IPv6AcceptRA]
DHCPv6Client=true
[Network]
Gateway=<IPv4Address>
DHCP=ipv6
IPv6AcceptRA=true
LinkLocalAddressing=yes
[Link]
[Match]
Name=eth0

Assign a static IP address

Assign the static IPv4 address 192.168.20.123 and observe the result.

# ipmitool raw 0xc 1 1 3 192 168 20 123
# ipmitool raw 0xc 2 1 3 0 0
 11 c0 a8 14 7b
# ip -4 a show eth0
2: eth0: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc pfifo_fast qlen 1000
    inet 169.254.241.61/16 brd 169.254.255.255 scope link eth0
       valid_lft forever preferred_lft forever
    inet 192.168.20.123/32 scope global eth0
       valid_lft forever preferred_lft forever
# cat /etc/systemd/network/00-bmc-eth0.network 
[DHCP]
SendHostname=true
UseHostname=true
UseDomains=true
UseDNS=true
UseNTP=true
ClientIdentifier=mac
[IPv6AcceptRA]
DHCPv6Client=true
[Network]
Address=192.168.20.123/32
DHCP=ipv6
IPv6AcceptRA=true
LinkLocalAddressing=yes
[Link]
[Match]
Name=eth0

This all looks correct. DHCPv4 has been disabled, and a static IPv4 address has been assigned.
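The Get LAN IP Address reply above (11 c0 a8 14 7b) is the parameter revision byte followed by the four address bytes, most significant byte first. A minimal bash sketch for decoding such replies, assuming the raw hex output format shown in this report (decode_lan_ip is a hypothetical helper name):

```shell
#!/usr/bin/env bash
# Hypothetical helper: decode a Get LAN Configuration Parameters reply for
# parameter 3 (IP Address), e.g. "11 c0 a8 14 7b" -> "192.168.20.123".
decode_lan_ip() {
    if [ "$#" -ne 5 ]; then
        echo "usage: decode_lan_ip REV B1 B2 B3 B4" >&2
        return 1
    fi
    shift   # drop the parameter revision byte (0x11 in this report)
    # bash printf accepts 0x-prefixed hex for %d
    printf '%d.%d.%d.%d\n' "0x$1" "0x$2" "0x$3" "0x$4"
}
```

Usage: decode_lan_ip $(ipmitool raw 0xc 2 1 3 0 0), left unquoted so the reply splits into five bytes.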

Remove the assigned address

Removing the IP address via IPMI is done by assigning the 0.0.0.0 address to the NIC. This is the long-standing IPMI method for clearing an assigned static IPv4 address.
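Because the address is passed to ipmitool raw as four decimal byte arguments, a small helper can build the Set command from a dotted-quad string. This is a hypothetical bash sketch (set_lan_ip_cmd is not an existing tool); it assumes channel 1 as in this report, does only basic validation, and prints the command rather than running it.

```shell
#!/usr/bin/env bash
# Hypothetical helper: print the Set LAN Configuration Parameters command
# for parameter 3 (IP Address), e.g. 0.0.0.0 -> "... 3 0 0 0 0".
set_lan_ip_cmd() {
    local addr="$1" chan="${2:-1}" octet
    local -a octets args=()
    # Split on dots; the IFS change applies to this read only.
    IFS=. read -r -a octets <<< "$addr"
    if [ "${#octets[@]}" -ne 4 ]; then
        echo "invalid IPv4 address: $addr" >&2
        return 1
    fi
    for octet in "${octets[@]}"; do
        case "$octet" in
            [0-9]|[0-9][0-9]|[0-9][0-9][0-9]) ;;
            *) echo "invalid octet: $octet" >&2; return 1 ;;
        esac
        octet=$((10#$octet))   # normalize to plain decimal (drops leading zeros)
        if [ "$octet" -gt 255 ]; then
            echo "octet out of range: $octet" >&2
            return 1
        fi
        args+=("$octet")
    done
    echo "ipmitool raw 0xc 1 $chan 3 ${args[*]}"
}
```

For example, set_lan_ip_cmd 0.0.0.0 prints the clearing command used below.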

# ipmitool raw 0xc 1 1 3 0 0 0 0
# ipmitool raw 0xc 2 1 3 0 0
 11 00 00 00 00
# ip -4 a show eth0
2: eth0: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc pfifo_fast qlen 1000
    inet 169.254.241.61/16 brd 169.254.255.255 scope link eth0
       valid_lft forever preferred_lft forever
    inet 192.168.172.115/32 scope global eth0
       valid_lft forever preferred_lft forever
# cat /etc/systemd/network/00-bmc-eth0.network 
[DHCP]
SendHostname=true
UseHostname=true
UseDomains=true
UseDNS=true
UseNTP=true
ClientIdentifier=mac
[IPv6AcceptRA]
DHCPv6Client=true
[Network]
Address=0.0.0.0/32
DHCP=ipv6
IPv6AcceptRA=true
LinkLocalAddressing=yes
[Link]
[Match]
Name=eth0

The expectation was that no static IPv4 address would remain. Instead, the network file contains a 0.0.0.0 address (Address=0.0.0.0/32). Worse, the NIC now has a randomly assigned address (192.168.172.115/32).

Reboot the BMC

Before re-enabling DHCPv4, see what happens when the BMC is reset.

# ipmitool raw 6 2
# ### Wait for BMC to reboot to login prompt
# ipmitool raw 0xc 2 1 3 0 0
 11 c0 a8 87 27
# ip -4 a show eth0
2: eth0: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc pfifo_fast qlen 1000
    inet 169.254.241.61/16 brd 169.254.255.255 scope link eth0
       valid_lft forever preferred_lft forever
    inet 192.168.135.39/32 scope global eth0
       valid_lft forever preferred_lft forever
    inet 192.168.24.225/32 scope global eth0
       valid_lft forever preferred_lft forever
# cat /etc/systemd/network/00-bmc-eth0.network 
[DHCP]
SendHostname=true
UseHostname=true
UseDomains=true
UseDNS=true
UseNTP=true
ClientIdentifier=mac
[IPv6AcceptRA]
DHCPv6Client=true
[Network]
Address=0.0.0.0/32
DHCP=ipv6
IPv6AcceptRA=true
LinkLocalAddressing=yes
[Link]
[Match]
Name=eth0

After the BMC reboot there are two randomly assigned addresses, and neither matches the address assigned before the reboot.

NOTE: I have seen the list of static addresses grow by one with each BMC reboot. That artifact did not appear in this sequence, possibly because this test ran on a different SUT generation. A newer SUT generation has shown a 1:1 relationship between BMC reboots and added random IPv4 addresses.

AC cycle the system under test

There's unexpected behavior when the BMC reboots. What occurs when the whole system is power cycled?

# ipmitool raw 0xc 2 1 3 0 0 
 11 c0 a8 5e 0b
# ip -4 a show eth0
2: eth0: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc pfifo_fast qlen 1000
    inet 169.254.102.167/16 brd 169.254.255.255 scope link eth0
       valid_lft forever preferred_lft forever
    inet 192.168.94.11/32 scope global eth0
       valid_lft forever preferred_lft forever
# cat /etc/systemd/network/00-bmc-eth0.network 
[DHCP]
SendHostname=true
UseHostname=true
UseDomains=true
UseDNS=true
UseNTP=true
ClientIdentifier=mac
[IPv6AcceptRA]
DHCPv6Client=true
[Network]
Address=0.0.0.0/32
DHCP=ipv6
IPv6AcceptRA=true
LinkLocalAddressing=yes
[Link]
[Match]
Name=eth0

The BMC is still getting a random IPv4 address assigned.

Now re-enable DHCPv4

# ipmitool raw 0xc 1 1 4 2
# ipmitool raw 0xc 2 1 3 0 0
 11 c0 a8 00 83
# ip -4 a show eth0
2: eth0: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc pfifo_fast qlen 1000
    inet 192.168.0.131/32 scope global eth0
       valid_lft forever preferred_lft forever
    inet 192.168.30.130/24 brd 192.168.30.255 scope global dynamic eth0
       valid_lft 568sec preferred_lft 568sec
# cat /etc/systemd/network/00-bmc-eth0.network 
[DHCP]
SendHostname=true
UseHostname=true
UseDomains=true
UseDNS=true
UseNTP=true
ClientIdentifier=mac
[IPv6AcceptRA]
DHCPv6Client=true
[Network]
Address=0.0.0.0/32
DHCP=ipv6
IPv6AcceptRA=true
LinkLocalAddressing=yes
[Link]
[Match]
Name=eth0

This is a very strange combination of states.

  1. The DHCPv4 address (192.168.30.130) should not be assigned. The network configuration file shows that only IPv6 is active (DHCP=ipv6).
  2. The static address assignment 0.0.0.0 is still present.
  3. The IPMI Get LAN IP Address command returns the randomly assigned address.

This is not desirable behavior.

Disable DHCPv4 again

# ipmitool raw 0xc 1 1 4 1
# ipmitool raw 0xc 2 1 3 0 0
 11 00 00 00 00
# ip -4 a show dev eth0
2: eth0: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc pfifo_fast qlen 1000
    inet 169.254.102.167/16 brd 169.254.255.255 scope link eth0
       valid_lft forever preferred_lft forever
# cat /etc/systemd/network/00-bmc-eth0.network 
[DHCP]
SendHostname=true
UseHostname=true
UseDomains=true
UseDNS=true
UseNTP=true
ClientIdentifier=mac
[IPv6AcceptRA]
DHCPv6Client=true
[Network]
DHCP=ipv6
IPv6AcceptRA=true
LinkLocalAddressing=yes
[Link]
[Match]
Name=eth0

The state of the network configuration file is better.

  1. The IPMI Get LAN IP Address command returns 0.0.0.0.
  2. The DHCP value is correct (DHCP=ipv6).
  3. There is no randomly assigned address.

Enable DHCPv4 again

Having eliminated the Address=0.0.0.0 entry in the network configuration file, what is the state of the SUT when DHCPv4 is re-enabled?

# ipmitool raw 0xc 1 1 4 2
# ipmitool raw 0xc 2 1 3 0 0 
 11 c0 a8 1e 82
# ip -4 a show eth0
2: eth0: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc pfifo_fast qlen 1000
    inet 192.168.30.130/24 brd 192.168.30.255 scope global dynamic eth0
       valid_lft 578sec preferred_lft 578sec
# cat /etc/systemd/network/00-bmc-eth0.network 
[DHCP]
SendHostname=true
UseHostname=true
UseDomains=true
UseDNS=true
UseNTP=true
ClientIdentifier=mac
[IPv6AcceptRA]
DHCPv6Client=true
[Network]
DHCP=true
IPv6AcceptRA=true
LinkLocalAddressing=yes
[Link]
[Match]
Name=eth0

The BMC NIC is back to a clean state. The only difference is that the /etc/systemd/network/00-bmc-eth0.network file is now explicit instead of implicit.
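A minimal bash sketch that checks a .network file against the clean state reached here: no Address= entries and DHCP=true (or yes). These criteria are assumptions drawn from this one sequence, not a definition of a correct phosphor-networkd configuration, and is_clean_dhcp_config is a hypothetical name.

```shell
#!/usr/bin/env bash
# Hypothetical helper: succeed if the .network file has no static
# Address= entries and enables DHCP for both families.
is_clean_dhcp_config() {
    local file="$1"
    # Any Address= line (including the 0.0.0.0 wildcard) means not clean.
    if grep -q '^Address=' "$file"; then
        return 1
    fi
    grep -Eq '^DHCP=(true|yes)[[:space:]]*$' "$file"
}
```

Usage: is_clean_dhcp_config /etc/systemd/network/00-bmc-eth0.network && echo clean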

Conclusion

The recent changes to phosphor-networkd are a regression from prior behavior. Several undesirable artifacts occur when using IPMI to configure static IPv4 addresses.

The path to restoring a known-good state is convoluted. A BMC user who has assigned a static IPv4 address and then decides they no longer want it active will not find the DHCPv4 enable->disable->enable sequence desirable.
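For completeness, the recovery sequence can be scripted. This hypothetical bash sketch assumes channel 1, as in the commands above, and prints the Set LAN Configuration Parameters commands for parameter 4 (IP Address Source) unless --run is passed. It is a workaround sketch, not a fix for the regression.

```shell
#!/usr/bin/env bash
# Hypothetical helper: DHCPv4 enable -> disable -> enable via parameter 4.
dhcpv4_recovery_sequence() {
    local chan="${1:-1}" run=echo src
    if [ "${2:-}" = "--run" ]; then
        run=   # empty: the unquoted $run below expands to nothing
    fi
    for src in 2 1 2; do   # 2 = DHCP, 1 = static
        $run ipmitool raw 0xc 1 "$chan" 4 "$src"
    done
}
```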

sunharis commented 1 year ago

@Howitzer105mm Please check if the issues are addressed with the below commits.

https://gerrit.openbmc.org/c/openbmc/phosphor-networkd/+/61997
https://gerrit.openbmc.org/c/openbmc/phosphor-networkd/+/62033
https://gerrit.openbmc.org/c/openbmc/phosphor-networkd/+/61791

Howitzer105mm commented 1 year ago

I am investigating these changes. There appears to be a new phosphor-networkd (p-n) artifact unrelated to these changes: the SUT I am using has an NCSI NIC, and it is not being enumerated. I am trying to identify what is causing that issue.

Howitzer105mm commented 1 year ago

I am continuing to investigate this. The NCSI NIC issue was not related to these changes.

I have seen one piece of undesirable behavior related to the three commits. I am still characterizing it.