sodafoundation / dock

SODA Terra Project DOCK module: an open source implementation of the unified interface for connecting heterogeneous storage backends.
Apache License 2.0

Revamp Host Based Replication using new framework/API from LinBit. #65

Open rajat-soda opened 3 years ago

rajat-soda commented 3 years ago

Issue/Feature Description: The current implementation of host-based replication does not work, as found in the trials described below.

The following approaches have been tried for enabling replication.

  1. Devsds single-node setup: Because both the Primary and the Secondary (replica) devices are on the same node, replication fails with the error [EnableReplication-fm' failed: rpc error: code = Unknown desc = exit status 1].
  2. Ansible multi-node setup: Dock was installed on both the Primary and Secondary nodes, but the Secondary Dock was not receiving osdsctl commands.
  3. Devsds multi-node setup: In some instances volume creation and attach succeeded on both the Primary and Secondary hosts, but Enable Replication still failed with [EnableReplication-fm' failed: rpc error: code = Unknown desc = exit status 1]. Another issue observed is that all osdsctl commands go through the Primary Controller-Dock and none reach the Secondary Controller-Dock. However, the code in [dock\pkg\model\replication.go] is written on the assumption that it is executed simultaneously on both the Primary and Secondary nodes. Because that simultaneous execution was not happening, the DRBD configuration and metadata were not created on both nodes, which DRBD requires in order to work. Hence, the current implementation is not suitable for the replication feature, and other alternatives should be explored.

    Enhancements would need to be made to:

    • Make Pool creation consistent and reliable.
    • Execute Dock commands in a purely distributed manner.
    • Run scale tests for Controller-Dock on at least 4-8 nodes or more.

    In all the above approaches, the following setup steps were missing:

    1. /etc/opensds/driver/drbd.yaml was missing.
    2. The Dock entry for drbd was missing in ETCD.
    3. The drivers "lvm,drbd" were not enabled in /etc/opensds/opensds.conf.

    Other issues observed:

    • Pools created on the primary/secondary host were not getting discovered.
    • Pools that had been discovered earlier were going into the Unavailable state.
    • As a result, volumes created earlier were getting invalidated.

Why this issue needs to be fixed / feature is needed (give scenarios or use cases): To fix HBReplication.

How to reproduce, in case of a bug:

Other Notes / Environment Information: (Please give the env information, log link or any useful information for this issue)