ocf / puppet

Puppet config for OCF servers and lab machines
https://www.ocf.berkeley.edu/

configure 802.3ad LACP on hypervisors and switch #415

Open abizer opened 6 years ago

abizer commented 6 years ago

Last week I reorganized all the 10GbE DAC cables for the hypervisors so that each hypervisor has both ports on its SolarFlare NIC plugged into consecutive interfaces on the Arista 7050 switch. For example, riptide is plugged into Ethernet 1 and Ethernet 2, which are physically the leftmost top and bottom ports on the switch. Every hypervisor is cabled this way, taking a top and bottom port, and @dongkyunc has helpfully added labels to the underside of the switch indicating which server each group corresponds to.

Now we need to activate the second interface on each hypervisor, and preferably put both interfaces into an LACP channel group. While both links are up, the virtual LACP bond interface will have an aggregate bandwidth of 20Gb/s; if one of the links fails, the interface will not go down entirely but will merely drop to 10Gb/s. This gives us an element of fault tolerance while doubling the bandwidth each server can use.

802.3ad has to be configured on both the hosts and the switch. @gpl and I have experimented with configuring the switch to support LACP, while @cg505 and I have experimented with configuring the hosts to use bond0 as the LACP interface. Some work still needs to happen to get everything working, but I was able to configure the bond interface on dev-fallingrocks in active-passive mode before accidentally locking myself out of the machine while trying to put the switch into LACP mode.

We will need to modify the configuration in https://github.com/ocf/puppet/blob/master/modules/ocf/manifests/networking.pp so that it brings up the bond interface correctly and also attaches it to br0, the bridge the VMs use. Doing this would likely make for an interesting blog post as well, since much of the online documentation on the subject is rather out of date.
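For illustration only, here is a rough sketch of the kind of host-side config this might end up producing. It assumes Debian-style ifupdown with the ifenslave and bridge-utils packages; the NIC names eth0/eth1, the miimon value, and the render_interfaces helper are all hypothetical and not taken from networking.pp. Python is used just to render the stanza; the real change would live in the Puppet manifest.

```python
def render_interfaces(bond="bond0", bridge="br0", slaves=("eth0", "eth1")):
    """Render a hypothetical /etc/network/interfaces fragment.

    The bond runs in 802.3ad (LACP) mode over the given NICs, and the
    bridge uses the bond as its only port so VMs attached to the bridge
    share the aggregated link.
    """
    lines = []
    # The physical NICs carry no addresses of their own.
    for nic in slaves:
        lines += [f"auto {nic}", f"iface {nic} inet manual", ""]
    lines += [
        f"auto {bond}",
        f"iface {bond} inet manual",
        f"    bond-slaves {' '.join(slaves)}",
        "    bond-mode 802.3ad",
        "    bond-miimon 100",  # link-check interval in ms (assumed value)
        "",
        f"auto {bridge}",
        # Address settings for the bridge are deliberately left out.
        f"iface {bridge} inet manual",
        f"    bridge_ports {bond}",
        "",
    ]
    return "\n".join(lines)
```

Calling print(render_interfaces()) prints the fragment so it can be compared against what Puppet actually writes on a test machine such as dev-fallingrocks.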

abizer commented 6 years ago

so this tentatively works on dev-fallingrocks in proper LACP mode. The two interfaces on the NIC are plugged into Ethernet 31 and 32 on the switch and have been configured as members of port-channel 16, and the quick-n-dirty test of "pull one of the cables out and make sure connections don't die" succeeds. Once another machine successfully comes up on the bond interface I'll try confirming the increase in bandwidth.
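As a complement to the cable-pull test, the bond state can also be checked from the host. The sketch below assumes the Linux bonding driver's status file at /proc/net/bonding/bond0 and its usual "Key: value" layout; the helper names are made up for this example, and it only checks the mode and per-slave MII status, not the LACP partner details.

```python
def parse_bond_status(text):
    """Return (bonding mode, {slave name: MII status}) from status text."""
    mode = None
    slaves = {}
    current = None  # slave section we are currently inside, if any
    for line in text.splitlines():
        key, sep, value = line.partition(":")
        if not sep:
            continue  # blank lines and section titles without a colon
        key, value = key.strip(), value.strip()
        if key == "Bonding Mode":
            mode = value
        elif key == "Slave Interface":
            current = value
            slaves[current] = "unknown"
        elif key == "MII Status" and current is not None:
            # The bond's own MII Status appears before any slave
            # section, so it is skipped by the check above.
            slaves[current] = value
    return mode, slaves


def bond_is_healthy(text, expected_slaves=2):
    """True if the bond is in 802.3ad mode and all expected slaves are up."""
    mode, slaves = parse_bond_status(text)
    up = [name for name, status in slaves.items() if status == "up"]
    return mode is not None and "802.3ad" in mode and len(up) == expected_slaves


def read_bond_status(path="/proc/net/bonding/bond0"):
    """Read the status file for a bond (path is an assumption)."""
    with open(path) as f:
        return f.read()
```

On a hypervisor, bond_is_healthy(read_bond_status()) should be true with both cables plugged in and false while one is pulled, assuming the driver reports the pulled link's MII status as down.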

abizer commented 6 years ago

dev-fallingrocks, hal, and scurvy are now running in LACP mode.

To configure this on the switch, after logging in and identifying the correct interfaces (say, Ethernet 31 and Ethernet 32), one does:

    enable
    config
    interface Ethernet 31-32
    channel-group 16 mode active
    interface port-channel 16

something along those lines

abizer commented 5 years ago

With #430, all the hypervisors are configured for LACP on the host side, but not yet on the switch side. The LACP configuration won't take effect until networking on the hypervisors is restarted, which causes downtime and will need to be planned.

cg505 commented 4 years ago

@abizer this is done, right?