openark / orchestrator

MySQL replication topology management and HA
Apache License 2.0

Semi-sync is not enforced if promotion rule is must_not #1369

Open binwiederhier opened 3 years ago

binwiederhier commented 3 years ago

(This is related to https://github.com/openark/orchestrator/issues/1360, and I'll describe more in that ticket)

We have MySQL hosts in our topology that we'd like to use only as backup hosts, i.e. we take backups from these hosts. On these hosts, we have the promotion rule set to must_not, meaning that we won't ever fail over to them.

         ---> 2 (prefer, semi-sync replica)
       /
1 (source) 
       \
         ---> 3 (must_not, async replica)

In a failover scenario of the source (1), we'd like to enable semi-sync on that backup host (3) to ensure that we're not losing any transactions. At this time, Orchestrator does not support enforcing semi-sync for non-promotable hosts (note the sendACK):

    // If async fallback is disallowed, we'd better make sure to enable replicas to
    // send ACKs before START SLAVE. Replica ACKing is off at mysqld startup because
    // some replicas (those that must never be promoted) should never ACK.
    // Note: We assume that replicas use 'skip-slave-start' so they won't
    //       START SLAVE on their own upon restart.
    if instance.SemiSyncEnforced {
        // Send ACK only from promotable instances.
        sendACK := instance.PromotionRule != MustNotPromoteRule
        // Always disable master setting, in case we're converting a former master.
        if err := EnableSemiSync(instanceKey, false, sendACK); err != nil {
            return instance, log.Errore(err)
        }
    }

What is the history behind not allowing semi-sync for non-promotable hosts? If I were to implement a config option AllowSemiSyncForUnpromotableHosts that would effectively be used like this:

    sendACK := config.AllowSemiSyncForUnpromotableHosts || instance.PromotionRule != MustNotPromoteRule
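
To make the effect of that one-line change concrete, here is a minimal standalone sketch of the decision. It is not orchestrator code: `PromotionRule`, its two constants, and `shouldSendACK` are simplified stand-ins written for illustration, and the `allowUnpromotable` parameter models the proposed `AllowSemiSyncForUnpromotableHosts` option, which does not exist in orchestrator.

```go
package main

import "fmt"

// PromotionRule is a simplified stand-in for orchestrator's promotion rule
// type (assumption: only the two rules from the diagram above are modelled).
type PromotionRule string

const (
	PreferPromoteRule  PromotionRule = "prefer"
	MustNotPromoteRule PromotionRule = "must_not"
)

// shouldSendACK reports whether a semi-sync-enforced replica would be
// configured to send ACKs. allowUnpromotable models the proposed
// (hypothetical) AllowSemiSyncForUnpromotableHosts option.
func shouldSendACK(rule PromotionRule, allowUnpromotable bool) bool {
	return allowUnpromotable || rule != MustNotPromoteRule
}

func main() {
	rules := []PromotionRule{PreferPromoteRule, MustNotPromoteRule}
	for _, allow := range []bool{false, true} {
		for _, rule := range rules {
			fmt.Printf("allow=%v rule=%s sendACK=%v\n",
				allow, rule, shouldSendACK(rule, allow))
		}
	}
}
```

With the option off, this reproduces the current behavior: only the must_not replica is excluded from ACKing. With it on, every semi-sync-enforced replica ACKs regardless of its promotion rule.
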

shlomi-noach commented 3 years ago

SemiSyncEnforced was contributed to orchestrator by the authors of Vitess (BTW I'm today a maintainer for Vitess but that's irrelevant). In Vitess, the topology servers have clearly defined roles, and orchestrator's behavior was made to match Vitess's configuration.

See more here: https://github.com/openark/orchestrator/issues/1360#issuecomment-862055660

If you wish to enforce semi-sync I suggest re-thinking and re-designing the behavior from scratch rather than patching existing behavior.