Open Howitzer105mm opened 1 year ago
@Howitzer105mm Please check if the issues are addressed with the below commits.
https://gerrit.openbmc.org/c/openbmc/phosphor-networkd/+/61997 https://gerrit.openbmc.org/c/openbmc/phosphor-networkd/+/62033 https://gerrit.openbmc.org/c/openbmc/phosphor-networkd/+/61791
I am investigating these changes. There appears to be a new artifact with p-n not related to these changes. The SUT I am using has a NCSI NIC, and it is not being enumerated. Trying to ID what is causing that issue.
I am continuing to investigate this. The NCSI NIC issue was not related to these changes.
I have seen one piece of undesirable behavior related to the three commits. I am still characterizing.
Problem statement
Intel QA reports that their tests for managing static IPv4 addresses are failing. Specifically, the long-established mechanism for clearing a static IPv4 address is no longer operational.
The test consists of the following steps:
Duplicate the issue
The defect entry describes a short list of IPMI commands to cause the faulty behavior. The investigation into the issue described in the case has subsequently increased the number of commands issued and tested.
Remove the network config file
The time-zero state of a BMC network interface is for the /etc/systemd/network/00-bmc-ethx.network file to be implicitly held by the systemd-networkd system. For a BMC that has been in use for some time, that state can be replicated by deleting that configuration file.
After the BMC reboots, the BMC NIC will be configured from the systemd-networkd default state, per recent phosphor-networkd source code changes.
Get LAN address source
Determine the current state of the IPv4 stack for the BMC NIC.
The BMC NIC is assigned an IP address from a DHCP server.
Inspect the current IPv4 state
Find out what IPv4 address has been assigned to the BMC NIC from the local DHCPv4 server.
Disable DHCPv4
Use IPMI to turn off the DHCPv4 address assignment.
The default state for phosphor-networkd is to use a de facto systemd-networkd configuration. The default state is defined in the systemd-networkd service. Moving from DHCP to static IPv4 addressing writes a .network configuration file to /etc/systemd/network/00-bmc-ethx.network. The configuration of the NIC can be inspected explicitly now.
Confirm the IPMI IPv4 address source
Examine the state of the BMC NIC after turning off the DHCPv4 functionality.
The BMC NIC only has a self-assigned IPv4 address. As can be seen in the eth0.network contents, only IPv6 dynamic assignment is active.
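When comparing address listings between steps, it can help to label each IPv4 address mechanically. The sketch below is a minimal Python helper, assuming the "self-assigned" and "random" addresses seen in this investigation are IPv4 link-local addresses (169.254.0.0/16). The thread does not show the actual values, so that is an assumption, and the function name is hypothetical.

```python
import ipaddress


def classify_ipv4(address: str) -> str:
    """Label an IPv4 address as unspecified, link-local, or other."""
    ip = ipaddress.IPv4Address(address)
    if ip.is_unspecified:
        # 0.0.0.0, the value IPMI uses to request clearing a static address
        return "unspecified"
    if ip.is_link_local:
        # 169.254.0.0/16; assumed here to be the "self-assigned" range
        return "link-local"
    # Anything else, for example a DHCP-assigned or static address
    return "other"
```

Running the addresses from each step through a helper like this makes it easier to see when an unexpected entry appears or persists.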
Clear any static IPv4 address
IPMI uses the 0.0.0.0 address assignment to remove any active IPv4 static address assignment. Performing this action should be effectively a No Operation in the current state of the NIC.
As can be seen, the operation was not a No Operation. The network configuration file received a new entry (Address=0.0.0.0/32). As can also be seen, the BMC NIC has acquired a random IP address.
Re-enable DHCPv4
Now restore DHCPv4 address assignment.
This is a very strange combination of state.
The address 192.168.30.130 should be the only IPv4 address assigned.
0.0.0.0 is still present.
This is not desirable behavior.
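A stray Address=0.0.0.0 entry can be detected automatically rather than by eye. The following is a minimal Python sketch that scans the text of a .network file for Address= values equal to the unspecified address. The sample contents are hypothetical, modeled only on the DHCP=ipv6 and Address=0.0.0.0/32 entries quoted in this report, and the function name is an assumption.

```python
import ipaddress
from typing import List

# Hypothetical file contents, modeled on the entries quoted in this report.
SAMPLE_NETWORK_FILE = """\
[Match]
Name=eth0
[Network]
DHCP=ipv6
Address=0.0.0.0/32
"""


def unspecified_address_entries(network_file_text: str) -> List[str]:
    """Return the Address= values that are the unspecified address."""
    hits = []
    for raw_line in network_file_text.splitlines():
        line = raw_line.strip()
        # Skip blank lines and comments.
        if not line or line.startswith(("#", ";")):
            continue
        key, sep, value = line.partition("=")
        if not sep or key.strip() != "Address":
            continue
        try:
            interface = ipaddress.ip_interface(value.strip())
        except ValueError:
            # Not a parseable address; ignore it in this sketch.
            continue
        if interface.ip.is_unspecified:
            hits.append(value.strip())
    return hits
```

A line-by-line scan is used instead of `configparser` because a .network file may legitimately repeat the Address= key, which a strict INI parser rejects.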
Issue is confirmed
The issue described by the Intel QA team is confirmed. Further testing of IPv4 address assignment shows additional undesirable address-handling artifacts.
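The expectation behind the failing test can be written down as a tiny model. The Python sketch below is not phosphor-networkd's implementation. It only encodes the behavior described in this report: assigning 0.0.0.0 clears the static address and is never stored itself. The class and method names are hypothetical, and the model assumes a single static IPv4 address per channel.

```python
from typing import List, Optional

CLEAR_SENTINEL = "0.0.0.0"


class StaticIPv4Model:
    """Toy model of the expected static IPv4 state for one channel."""

    def __init__(self) -> None:
        self.address: Optional[str] = None

    def set_address(self, requested: str) -> None:
        if requested == CLEAR_SENTINEL:
            # Clearing removes any static address; a no-op if none is set.
            self.address = None
        else:
            self.address = requested

    def static_addresses(self) -> List[str]:
        """Addresses the test expects to find configured statically."""
        return [] if self.address is None else [self.address]
```

Under this model, the sentinel never appears in the resulting address list, which is the property the observed Address=0.0.0.0/32 entry violates.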
Perform an extended sequence
Given the odd behavior above, try a more involved example.
Restart from a clean state
Restore the BMC to a "pristine" state.
Get LAN address source
Determine the current state of the IPv4 stack for the BMC NIC.
The BMC NIC is assigned an IP address from a DHCP server.
Set LAN address source to static
Get LAN address source
Confirm the BMC NIC is only accepting statically assigned IPv4 addresses.
The BMC NIC is only enabled to accept static addresses.
Collect the network configuration file
Assign a static IP address
Assign the static IPv4 address 192.168.20.123 and see what results from the assignment.
This all looks correct. DHCPv4 has been disabled, and a static IPv4 address has been assigned.
Remove the assigned address
Removing the IP address via IPMI is done by assigning the 0.0.0.0 address to the NIC. This is the long-standing IPMI method for clearing an assigned IPv4 static address.
The expectation is that there would no longer be a static IPv4 address present. Instead, there is a 0.0.0.0 address in the network file. Making matters worse, there is a randomly assigned address.
Reboot the BMC
Prior to re-enabling DHCPv4 see what happens when the BMC is reset.
A BMC reboot has caused there to be two randomly assigned addresses. The addresses are not the same as the one prior to the reboot.
NOTE: I have witnessed the list of static addresses increase, with one being added for each BMC reboot. This artifact did not present in this sequence, which may be due to testing on a different SUT generation. A newer generation of SUT has shown a 1:1 relationship between reboots and additional random IPv4 address assignments.
AC cycle the system under test
There's unexpected behavior when the BMC reboots. What occurs when the whole system is power cycled?
The BMC is still getting a random IPv4 address assigned.
Now re-enable DHCP
This is a very strange combination of state.
The address 192.168.30.130 should not be assigned. The network configuration file shows only IPv6 is active (DHCP=ipv6).
0.0.0.0 is still present.
This is not desirable behavior.
Disable DHCPv4 again
The state of the network configuration file is better: the 0.0.0.0 entry has been eliminated, and only IPv6 is active (DHCP=ipv6).
Enable DHCPv4 again
Having eliminated the Address=0.0.0.0 entry in the network configuration file, what is the state of the SUT when DHCPv4 is re-enabled?
The BMC NIC is back to a clean state. The only difference now is that the /etc/systemd/network/00-bmc-eth0.network file is explicit instead of implicit.
Conclusion
The recent changes to phosphor-networkd are a regression from prior behavior. Several undesirable artifacts occur when using IPMI to configure IPv4 static addresses.
The path to restoring a known good state is convoluted. A BMC user who has assigned a static IPv4 address and then decides they no longer want it active is not going to find the DHCPv4 enable->disable->enable sequence desirable.
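For anyone who needs to script the workaround in the meantime, the enable->disable->enable sequence could be generated as follows. This is a sketch under stated assumptions: the report does not say which IPMI client was used, so the use of `ipmitool lan set <channel> ipsrc ...`, the mapping of "disable DHCPv4" to the static address source, and the default channel number 1 are all assumptions. The function name is hypothetical.

```python
from typing import List


def recovery_commands(channel: int = 1) -> List[List[str]]:
    """Build the DHCPv4 enable -> disable -> enable sequence as argv lists.

    Assumes ipmitool is the client and that disabling DHCPv4 corresponds
    to setting the address source to static.
    """

    def set_source(source: str) -> List[str]:
        return ["ipmitool", "lan", "set", str(channel), "ipsrc", source]

    return [set_source("dhcp"), set_source("static"), set_source("dhcp")]
```

The function only builds the argument lists; each could be passed to `subprocess.run` on a host where ipmitool can reach the BMC, checking the network configuration file between steps.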