sandia-minimega / minimega

GNU General Public License v3.0

Use channel instead of mutex to serialize cli and meshage commands #1474

Closed activeshadow closed 2 years ago

activeshadow commented 2 years ago

There are cases where one mesh node sends a command to another and, while the first node is waiting for a response, the second node sends a command back to the first and waits for it to respond. Each node ends up waiting on the other, leading to a blocking race condition (in effect, a deadlock). Below is an example:

When a network interface is added to an existing VM using "vm net add", the node on which the command is executed sends a command over the mesh to the node the VM is running on and waits for a response.

       vm net add
head --------------> compute

When the compute node adds the network interface to the VM, it checks whether the VLAN alias for the interface exists. If it doesn't, the compute node creates the alias-to-ID mapping, publishes it to all the nodes in the mesh, and waits for a response.

        vlans add
head <-------------- compute

This is where the blocking race condition occurs. The head node cannot process the "vlans add" command from the compute node until the compute node responds to the "vm net add" command, but the compute node will not respond to the "vm net add" command until the head node responds to the "vlans add" command.

      vlans add resp
head -------X-------> compute

      vm net add resp
head <-------X------- compute

The head node cannot respond to the "vlans add" command because processing it requires the cmdLock mutex, which is still held by the function that made the "vm net add" call. Note that cmdLock isn't protecting data; it exists only to ensure commands are processed serially.

The fix is to serialize commands with a channel instead of a mutex. Commands are still run one at a time, but the channel can queue incoming commands, which prevents the blocking described above.
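As a rough illustration of the idea (not the actual change in this PR), the sketch below serializes commands through a channel drained by a single goroutine. The `runSerial` function is a hypothetical name, and the use of a buffered channel is an assumption made here so that senders can enqueue without waiting for the worker.

```go
package main

import (
	"fmt"
	"sync"
)

// runSerial queues every command on a channel and lets one worker
// goroutine process them in arrival order. It returns the order in
// which the commands were processed.
func runSerial(cmds []string) []string {
	// Buffered (an assumption of this sketch): senders enqueue a command
	// and return without waiting for the worker to pick it up.
	queue := make(chan string, len(cmds))

	var order []string
	var wg sync.WaitGroup
	wg.Add(1)
	go func() {
		defer wg.Done()
		// A single consumer means commands run one at a time,
		// which is the guarantee the mutex used to provide.
		for c := range queue {
			order = append(order, c)
		}
	}()

	for _, c := range cmds {
		queue <- c
	}
	close(queue)
	wg.Wait() // ensures the worker is finished before order is read
	return order
}

func main() {
	fmt.Println(runSerial([]string{"vm net add", "vlans add"}))
}
```

The single consumer keeps the serial-execution guarantee, while the queue decouples the act of submitting a command from the act of processing it.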