vmware-archive / vsphere-storage-for-docker

vSphere Storage for Docker
https://vmware.github.io/vsphere-storage-for-docker
Apache License 2.0

Implement the new locking and notification system for vFile #2001

Closed (luomiao closed this 6 years ago)

luomiao commented 6 years ago

This PR addresses issue https://github.com/vmware/docker-volume-vsphere/issues/1943

The new design has the following changes:

  1. Starting and stopping the file server is no longer triggered by changes to the global refcount. Instead, every global mount/unmount request increments the value of StartTrigger/StopTrigger in the KV store. Two separate watchers (on each master node) generate events from the PUT operations on these two triggers.
  2. Volume states are reduced to Ready and Mounted only; no intermediate states are needed. As a result, a lock is required whenever a volume's state is updated.
  3. To avoid overlapping operations, locks are also required for updating global refcounts. The global refcount locks are separate from the volume state locks. Usually the workers take the global refcount locks, and the managers (watchers) take the state locks.
  4. Two fields, StartMarker and StopMarker, guarantee that only one manager's watcher proceeds with the start/stop server operation. The other watchers can therefore return and handle events for different volumes in parallel.
  5. To make sure that volumes are not left in an error state when an error happens, the order of metadata updates in the KV store has also changed. First, the global refcount is increased only after the state of the volume is Mounted. Second, the state of the volume can be changed to Mounted only after the file server is up and running. Conversely, during unmount, the global refcount and client list are updated first, in the same transaction; then StopTrigger is increased to trigger the event that stops the file server.
  6. During volume deletion, both the global refcount lock and the state lock are required. The plugin is responsible for resetting the global refcount to 0 and the state to Ready, and should then increase StopTrigger to shut down the file server service if, for some reason, it is still running.
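
To make the ordering in points 1 to 5 easier to follow, here is a minimal in-process sketch in Go. It is not the plugin's code: the `volume` type, `claim`, `release`, `notify`, `onStartTrigger`, `onStopTrigger`, `mount` and `unmount` are hypothetical names, `sync.Mutex` values stand in for the KV-store locks and transactions, and the watchers are called directly instead of reacting to PUT events. It also assumes that the worker holds the refcount lock while the watchers run, and that a watcher stops the file server only when the global refcount has reached 0; the description above does not spell out either detail.

```go
package main

import (
	"errors"
	"fmt"
	"sync"
)

// All names are hypothetical. The mutexes stand in for KV-store locks and
// transactions; a real deployment would use the distributed KV store instead.
type volume struct {
	stateLock, grefLock, markerMu sync.Mutex

	State                     string // "Ready" or "Mounted"
	GRef                      int    // global refcount
	StartTrigger, StopTrigger int
	StartMarker, StopMarker   bool
	ServerUp                  bool
	ServerStarts              int // how many times the file server was started
}

// claim sets a marker atomically and reports whether this caller set it.
func (v *volume) claim(m *bool) bool {
	v.markerMu.Lock()
	defer v.markerMu.Unlock()
	if *m {
		return false
	}
	*m = true
	return true
}

func (v *volume) release(m *bool) {
	v.markerMu.Lock()
	defer v.markerMu.Unlock()
	*m = false
}

// onStartTrigger is the manager-side handler for a StartTrigger event.
func (v *volume) onStartTrigger() {
	if !v.claim(&v.StartMarker) {
		return // another manager's watcher is already handling this volume
	}
	defer v.release(&v.StartMarker)
	v.stateLock.Lock()
	defer v.stateLock.Unlock()
	if v.State == "Ready" {
		v.ServerUp = true // start the file server first...
		v.ServerStarts++
		v.State = "Mounted" // ...and only then publish the new state
	}
}

// onStopTrigger is the manager-side handler for a StopTrigger event.
// Assumption: the server is stopped only once the refcount is 0.
func (v *volume) onStopTrigger(gref int) {
	if !v.claim(&v.StopMarker) {
		return
	}
	defer v.release(&v.StopMarker)
	v.stateLock.Lock()
	defer v.stateLock.Unlock()
	if gref == 0 && v.State == "Mounted" {
		v.ServerUp = false
		v.State = "Ready"
	}
}

// notify simulates one watcher per manager reacting to the same PUT event.
func notify(managers int, handler func()) {
	var wg sync.WaitGroup
	for i := 0; i < managers; i++ {
		wg.Add(1)
		go func() { defer wg.Done(); handler() }()
	}
	wg.Wait()
}

// mount is the worker side: bump StartTrigger, then increase the global
// refcount only once the volume is observed in the Mounted state.
func (v *volume) mount(managers int) error {
	v.grefLock.Lock()
	defer v.grefLock.Unlock()
	v.StartTrigger++
	notify(managers, v.onStartTrigger)
	v.stateLock.Lock()
	mounted := v.State == "Mounted"
	v.stateLock.Unlock()
	if !mounted {
		return errors.New("volume is not Mounted; refcount left unchanged")
	}
	v.GRef++
	return nil
}

// unmount updates the refcount first, then bumps StopTrigger.
func (v *volume) unmount(managers int) {
	v.grefLock.Lock()
	defer v.grefLock.Unlock()
	if v.GRef > 0 {
		v.GRef--
	}
	v.StopTrigger++
	gref := v.GRef
	notify(managers, func() { v.onStopTrigger(gref) })
}

func main() {
	v := &volume{State: "Ready"}
	_ = v.mount(3)
	_ = v.mount(3)
	fmt.Println(v.State, v.GRef, v.ServerStarts) // Mounted 2 1
	v.unmount(3)
	v.unmount(3)
	fmt.Println(v.State, v.GRef, v.ServerUp) // Ready 0 false
}
```

In this sketch the marker lets the losing watchers return immediately, while the state check under the state lock keeps the start and stop operations idempotent even if several watchers claim the marker one after another.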