wp-cli / ideas

💡 Ideas and feature requests are collected here
40 stars 1 forks source link

Add support for database sharding #181

Open MatzeKitt opened 8 months ago

MatzeKitt commented 8 months ago

Feature Request

Describe your use case and the problem you are facing

We’re using database sharding along with HyperDB and face some problems when it comes to use WP-CLI. Especially, but not exclusively, it’s hard until impossible to use the wp db commands. E.g. running wp db tables results in the error mentioned in https://github.com/wp-cli/wp-cli/issues/4670, and also the workarounds there don’t work with sharding, since WP-CLI doesn’t know the shards here.

We’re running a pretty normal HyperDB configuration with the following db-config.php:

<?php
// Exit if accessed directly
defined( 'ABSPATH' ) || exit;

$wpdb->charset = 'utf8mb4';
$wpdb->collate = 'utf8mb4_unicode_ci';
$wpdb->save_queries = false;
$wpdb->persistent = false;
$wpdb->max_connections = 10;
$wpdb->check_tcp_responsiveness = true;

// Add global database
$wpdb->add_database( [
    'host' => DB_HOST,
    'user' => DB_USER,
    'password' => DB_PASSWORD,
    'name' => DB_NAME,
    'write' => 1,
    'read' => 2,
    'dataset' => 'global',
    'timeout' => .5,
] );

// Add 16 shards for blog databases
foreach ( array_merge( range( 0, 9 ), range( 'a', 'f' ) ) as $v ) {
    $wpdb->add_database( [
        'host' => DB_HOST,
        'user' => DB_USER,
        'password' => DB_PASSWORD,
        'name' => DB_NAME.'_'.$v,
        'write' => 1,
        'read' => 2,
        'dataset' => $v,
        'timeout' => .5,
    ] );
}

$wpdb->add_callback( 'get_blog_shard' );

function get_blog_shard( $query, $wpdb ) {
    if ( preg_match( "/^{$wpdb->base_prefix}\d+_/i", $wpdb->table ) ) {
        return substr( md5( $wpdb->blogid ), 0, 1 );
    }
}

function wpdb_connection_error( $host, $port, $op, $tbl, $ds, $dbhname, $wpdb ) {
    dead_db();
}

$wpdb->add_callback( 'wpdb_connection_error', 'db_connection_error' );

Describe the solution you'd like

I would like to get support for the sharding callback added to $wpdb in order to determine the correct database tables just by specifying the --url parameter in a WP-CLI request.

danielbachhuber commented 8 months ago

Sounds like it would be a great improvement!

Feel free to submit a pull request, if you'd like. Here is some guidance on our pull request best practices.

MatzeKitt commented 8 months ago

I’ll try 😅

MatzeKitt commented 5 months ago

Before creating a pull request, I want to make sure whether my current approach is valid at all in my current working version. The current version is limited due to the fact that I only tested one command: vendor/bin/wp db search 'string' --network

First of all, in Utils\wp_get_table_names() I need to get the tables from information_schema.tables, since HyperDB doesn’t store any information about the tables. $wpdb->dbname is empty in this case.

        if ( empty( $wpdb->dbname ) && $wpdb instanceof \HyperDB ) {
            $mode = str_ends_with( $wpdb->dbhname, '__r' ) ? 'read' : 'write';
            $tables = [];

            foreach ( $wpdb->hyper_servers as $hyper_server ) {
                if ( empty( $hyper_server[ $mode ] ) ) {
                    continue;
                }

                foreach ( $hyper_server[ $mode ] as $server ) {
                    foreach ( $server as $server_data ) {
                        $databases[] = $server_data['name'];
                    }
                }
            }

            $table_schemas = $wpdb->get_results( sprintf( "SELECT table_schema as db, table_name as table_name FROM information_schema.tables WHERE table_schema IN ('%s')", implode( "', '", $wpdb->_escape( $databases ) ) ) );

            foreach ( $table_schemas as $table_schema ) {
                $tables[] = $table_schema->db . '.' . $table_schema->table_name;
            }
        } else {
            // Note: BC change 1.5.0, tables are sorted (via TABLES view).
            // phpcs:ignore WordPress.DB.PreparedSQL.NotPrepared -- uses esc_sql_ident() and $wpdb->_escape().
            $tables = $wpdb->get_col( sprintf( "SHOW TABLES WHERE %s IN ('%s')", esc_sql_ident( 'Tables_in_' . $wpdb->dbname ), implode( "', '", $wpdb->_escape( $wp_tables ) ) ) );
        }

The else is the current implementation. For our HyperDB version with database shardings, the result is a list of table names with database prefix: db.table, e.g. wordpress.wp_options if the database is called wordpress.

Now, since the table name is escaped using DB_Command::esc_sql_ident() in DB_Command::search() as well as DB_Command::get_columns(), I need to adjust that to properly escape the database and the table name:

        if ( str_contains( $table, '.' ) ) {
            list( $db, $table ) = explode( '.', $table );
            $table_sql          = self::esc_sql_ident( $db ) . '.' . self::esc_sql_ident( $table );
        } else {
            $table_sql = self::esc_sql_ident( $table );
        }

This way it works, but it feels a little bit hacky to me. What do you think about it? We can also discuss it via Slack, my username there is KittMedia.

danielbachhuber commented 4 months ago

How many total tables do you have? And what's the particular use case you're trying to solve for?

A couple of thoughts:

MatzeKitt commented 4 months ago

We have currently around 1.500 sites in a multisite in 16 different shards, so roughly 16.500 tables. The problem I try to solve is that any global db operation fails since WP-CLI is not aware of all the tables.

If you can suggest a holistic solution, I would love to hear it so that I can try to implement it. 🙂