abizer opened 6 years ago
so this tentatively works on dev-fallingrocks in proper LACP mode. The two interfaces on the NIC are plugged into Ethernet 31 and 32 on the switch and are configured as members of port-channel 16, and the quick-and-dirty test of "pull one of the cables out and make sure connections don't die" succeeds. Once another machine successfully comes up on the bond interface I'll try to confirm the increase in bandwidth.
dev-fallingrocks, hal, and scurvy are now running in LACP mode.
To configure this on the switch, after logging in and identifying the correct interfaces (say, Ethernet 31 and Ethernet 32), one does something along these lines:

```
enable
config
interface Ethernet 31-32
channel-group 16 mode active
interface port-channel 16
```
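To check that the channel group actually bundled, Arista EOS has `show` commands along these lines (a sketch; exact command names and output vary by EOS version, so treat these as starting points rather than a guaranteed recipe):

```
show port-channel 16            ! member ports and their bundling state
show lacp neighbor              ! confirms the host side is speaking LACP
show interfaces Port-Channel 16 ! link state and aggregate counters
```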
With #430, all the hypervisors are configured on the host side for LACP, but not yet on the switch side. The LACP configuration won't take effect until networking on the hypervisors is restarted, which is a downtime-causing event that will need to be planned.
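For reference, the host side of an 802.3ad bond under Debian's ifupdown looks roughly like the stanza below. This is a sketch, not the actual Puppet-managed config: the interface names and addresses are hypothetical placeholders.

```
# Hypothetical /etc/network/interfaces sketch of an 802.3ad (LACP) bond.
# Interface names (enp1s0f0/enp1s0f1) and addresses are placeholders.
auto bond0
iface bond0 inet static
    address 192.0.2.10
    netmask 255.255.255.0
    gateway 192.0.2.1
    bond-slaves enp1s0f0 enp1s0f1
    bond-mode 802.3ad       # LACP
    bond-miimon 100         # link-monitoring interval in ms
    bond-lacp-rate fast
```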
@abizer this is done, right?
Last week I reorganized all the 10GbE DAC cables for the hypervisors, such that each hypervisor has both ports on the SolarFlare NIC plugged into the Arista 7050 switch, in consecutive interfaces. For example, riptide is plugged into Ethernet 1 and Ethernet 2 on the switch, which, physically, are the leftmost top and bottom ports on the switch. Each hypervisor is organized this way, taking a top and bottom port, and @dongkyunc has helpfully added labels to the underside of the switch indicating which server each group corresponds to.

Now we need to activate that second interface on each hypervisor, and preferably put both interfaces into an LACP channel group. While both links are up, the virtual LACP bond interface will have an aggregate bandwidth of 20Gb/s; if one of the links fails, the interface will not drop entirely but merely fall back to 10Gb/s. This gives us an element of fault tolerance while doubling the bandwidth each server can utilize.

Configuring 802.3ad will require configuration on both the hosts and the switch. @gpl and I have experimented with configuring the switch to support LACP, while @cg505 and I have experimented with configuring the hosts to bring up bond0 as the LACP interface. Some work still needs to happen to get everything working, but I was able to configure the bond interface on dev-fallingrocks into active-passive mode before accidentally locking myself out of the machine while trying to configure the switch into LACP mode.

We will need to modify the configuration in https://github.com/ocf/puppet/blob/master/modules/ocf/manifests/networking.pp to bring up the bond interface correctly and to bridge it to br0 for VMs as well. Doing it would likely make for an interesting blog post of sorts, since much of the documentation online for doing this is rather out of date.
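The bridged end state would look something like the sketch below: br0 carries the host's address and the VMs attach to it, with bond0 as the bridge's only uplink port. This is a hedged illustration, not the actual networking.pp output; interface names and addresses are placeholders.

```
# Hypothetical sketch: bridging the LACP bond into br0 for VM traffic.
auto bond0
iface bond0 inet manual
    bond-slaves enp1s0f0 enp1s0f1
    bond-mode 802.3ad

auto br0
iface br0 inet static
    address 192.0.2.10
    netmask 255.255.255.0
    bridge_ports bond0    # VMs attach to br0; bond0 is its uplink
    bridge_stp off
    bridge_fd 0
```

Once this is up, `cat /proc/net/bonding/bond0` on the host should report the bond mode as "IEEE 802.3ad Dynamic link aggregation" and list both slaves.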