laravel / ideas

Issues board used for Laravel internals discussions.

Database Connection Failover #1037

Open simplenotezy opened 6 years ago

simplenotezy commented 6 years ago

It should be possible to specify a database connection as suggested back in 2013 here: https://github.com/laravel/framework/issues/2808

For instance, database providers such as compose.io provide a secondary host that will be used if the primary fails.

imzo15 commented 6 years ago

In my case we have set up Group Replication in MySQL. Everything works great if we specify the master (write) and slave (read) nodes, but if the master fails a slave becomes the master, and there is currently no way to specify that in the config file.
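For context, the write/read split described here lives in config/database.php. A rough sketch of the shape (the IPs are placeholders, not values from the thread; depending on the Laravel version the host key takes a single string or an array of hosts):

    'mysql' => [
        'driver' => 'mysql',
        // Writes go to the master, reads are spread across the slaves.
        'write' => [
            'host' => ['192.168.1.1'],
        ],
        'read' => [
            'host' => ['192.168.1.2', '192.168.1.3'],
        ],
        'database' => 'forge',
        'username' => 'forge',
        'password' => '',
        // ... remaining options are shared by both the read and write connections
    ],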

sisve commented 6 years ago

Wouldn't it be easier to connect to a single endpoint and have your load-balancer/availability-checker switch where that endpoint goes when the primary database fails?

imzo15 commented 6 years ago

Is there an easy way, using haproxy, to know which node is read-only, even after the master and a slave have switched roles because the master went down? This is the exception I am getting after the master went down and one of the slaves became master:

[Illuminate\Database\QueryException]
SQLSTATE[HY000]: General error: 1290 The MySQL server is running with the --super-read-only option so it cannot execute this statement

I went through the code in vendor/laravel/framework/src/Illuminate/Database/Connectors/ConnectionFactory.php. I thought it could be easy to hit the first node in the writes array and, if I get the exception mentioned before, hit the other ones. That would give us some time to manually fix the problem and bring the primary master back up.

    'write' => [
        ['host' => 'xxx.xxx.xxx.xxx'],
        ['host' => 'xxx.xxx.xxx.xxx'],
        ['host' => 'xxx.xxx.xxx.xxx'],
    ],

sisve commented 6 years ago

Laravel would need to have two connections configured, one for writing and one for reading. They both point towards a haproxy instance.

When your system promotes a slave (which, among other things, involves issuing the STOP SLAVE command), have it tell haproxy to remove the promoted server from the pool of read servers and make it the only server in the pool of write servers (removing the existing write server that failed you and that you want to get rid of).
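On the Laravel side that could be as simple as pointing the write and read sides of the connection at the two haproxy frontends. A sketch (the hostname and ports are assumptions, not from the thread):

    'mysql' => [
        'driver' => 'mysql',
        // Assumed haproxy frontends: one forwarding to the current master,
        // one balancing across the read replicas.
        'write' => [
            'host' => 'haproxy.internal',
            'port' => 3307,
        ],
        'read' => [
            'host' => 'haproxy.internal',
            'port' => 3306,
        ],
        // ... the rest of the usual mysql options
    ],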

Scenario: A (master), B (slave), C (slave), D (slave). Haproxy forwards write connections to A, and read connections to B and C. When the master fails ...

  1. Decide on a new master (in this scenario B).
  2. Empty the server pool for write connections. We do not want anyone being able to write to the database until we're ready later on.
  3. Promote B to a new master.
  4. Configure C and D to use B as new master.
  5. Configure haproxy to use B for write connections.
  6. Configure haproxy to use C and D for read connections.
  7. (optional) Figure out why A died and fix it.

To answer your question: you already know which server was promoted to the new master, since you told it to with the STOP SLAVE command (plus additional steps). Just tell haproxy about it.

imzo15 commented 6 years ago

Thanks for your reply and time @sisve. Group Replication auto-promotes a slave to master, so scenario A automatically becomes scenario B if the master fails. Scenario A: A (master), B (slave), C (slave), D (slave). Scenario B: B (master), C (slave), D (slave). I could let server A die and leave it there for a whole day, I don't care. I just want a simple for loop that loops through the connections, handling the exception, until we bring server A back up. I want to know where to put that for loop instead of using a load-balancer; frankly, I think it is much simpler. It would be something like this:

    foreach ($write_hosts as $host) {
        try {
            // In a healthy environment, the first connection in the writes array
            // is up and running and should be returned right away;
            // if not, we continue through the array.
            return $host->connection();
        } catch (QueryException $e) {
            // "The MySQL server is running with the --super-read-only option"
            // -> try the next write host.
        }
    }

    // The end was reached without returning a connection;
    // rethrow the last exception.
    throw $e;

I might as well just give up and try the load-balancer solution. Any idea how to point to a write / read server correctly? Currently I have a load-balancer that only checks health on port 80

imzo15 commented 6 years ago

Slave failover was added here: https://github.com/laravel/framework/pull/15553. Can someone point out how to do the same for the master?

sisve commented 6 years ago

I cannot help you any further, since I believe the path you're on (the described loop) is a bad solution. You would at least need to issue SQL statements to find the current primary member, as shown at https://dev.mysql.com/doc/refman/5.7/en/group-replication-find-primary.html.
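For reference, the query on that page boils down to something like the following, here wrapped in Laravel's DB facade (a sketch for MySQL 5.7 single-primary mode, not code from the thread):

    use Illuminate\Support\Facades\DB;

    // Ask the cluster which member is currently the primary.
    $primary = DB::select("
        SELECT MEMBER_HOST, MEMBER_PORT
        FROM performance_schema.replication_group_members
        WHERE MEMBER_ID = (
            SELECT VARIABLE_VALUE
            FROM performance_schema.global_status
            WHERE VARIABLE_NAME = 'group_replication_primary_member'
        )
    ");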

It's also bad to let the application handle failover logic instead of a dedicated failover monitor. You should not use a new master until it is up-to-date with the replication log.

It is a good practice to wait for the new primary to apply its replication related relay-log before re-routing the client applications to it.

Source: https://dev.mysql.com/doc/refman/5.7/en/group-replication-single-primary-mode.html

All of this logic would be moot if you were running the replication group in a multi-primary setup. Then you would just use a load-balancer that round-robins all connections across all servers.

imzo15 commented 6 years ago

Thanks for the advice, I will go for the load-balancer route. Yes, I will have to implement a solution that knows who the master is, maybe a script executed by haproxy... Bringing back the server that failed doesn't mean bringing it back as a master; it just means bringing it up as a slave and letting it update itself from the logs. (I hope it is as easy as it sounds.)

Yeah I agree with you on the last point, if I keep struggling with this, we might just go that route.

Again, thank you very much!

imzo15 commented 6 years ago

Update: I managed to handle the exception in the Exceptions Handler class. Now I am able to catch the exception and just tell Laravel to use the next node to write to the database.

app/Exceptions/Handler.php

    if ($exception instanceof QueryException) {
        $total_hosts = 3;
        $message = $exception->getMessage();
        $error_read_only = Str::contains($message, [
            'The MySQL server is running with the --super-read-only option',
        ]);

        if ($error_read_only) {
            $host_error_read_only = env('DB_WRITEHOST', '127.0.0.1');

            // Find the write host that just failed and point DB_WRITEHOST at the
            // next one, wrapping back to DB_HOST1 after the last host.
            for ($i = 1; $i <= $total_hosts; $i++) {
                if (env('DB_HOST'.$i, '127.0.0.1') == $host_error_read_only) {
                    $next = $i + 1;
                    if ($next <= $total_hosts) {
                        $this->setEnvironmentValue('DB_WRITEHOST', env('DB_HOST'.$next, '127.0.0.1'));
                    } else {
                        $this->setEnvironmentValue('DB_WRITEHOST', env('DB_HOST1', '127.0.0.1'));
                    }
                    break;
                }
            }

            $errors = ['csrf' => 'An error occurred, please try again.'];
            return redirect($request->url())->withErrors($errors);
        }
    }
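Note that setEnvironmentValue() is not a framework method; it is presumably a helper added alongside this handler. A minimal sketch of such a helper, assuming it rewrites the value directly in the .env file (and keeping in mind that env() is not re-read within the same request and is bypassed entirely when the config is cached):

    // Hypothetical helper: replaces KEY=oldvalue with KEY=newvalue in .env.
    // Assumes the key already exists in the file with its current value.
    protected function setEnvironmentValue($key, $value)
    {
        $path = base_path('.env');

        if (! file_exists($path)) {
            return;
        }

        $current = env($key);

        file_put_contents($path, str_replace(
            $key.'='.$current,
            $key.'='.$value,
            file_get_contents($path)
        ));
    }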