vitessio / vitess

Vitess is a database clustering system for horizontal scaling of MySQL.
http://vitess.io
Apache License 2.0

Vitess backup restore fails using vitess-operator on kubernetes #8684

Open LHBAhsanG opened 3 years ago

LHBAhsanG commented 3 years ago

Overview of the Issue

The data from an existing backup is not restored when a deleted Kubernetes cluster is recreated.

Reproduction Steps

Steps to reproduce this issue:

  1. Deploy the following cluster YAML on EKS:
# Version: 20200601
apiVersion: planetscale.com/v2
kind: VitessCluster
metadata:
  name: test
spec:
  backup:
    engine: xtrabackup
    locations:
    - s3:
        bucket: test-vitess
        region: us-east-1
        keyPrefix: testcluster
  cells:
  - name: useast1
    gateway:
      authentication:
        static:
          secret:
            name: test-cluster-config
            key: users.json
      replicas: 2
      resources:
        requests:
          cpu: 100m
          memory: 256Mi
        limits:
          memory: 256Mi
      secureTransport:
        tls:
          certSecret:
            name: test-cluster-config
            key: cert.pem
          keySecret:
            name: test-cluster-config
            key: key.pem
  vitessDashboard:
    cells:
    - useast1
    extraFlags:
      security_policy: read-only
      backup_storage_implementation: s3
    replicas: 1
    resources:
      limits:
        memory: 128Mi
      requests:
        cpu: 100m
        memory: 128Mi
  keyspaces:
  - name: commerce
    turndownPolicy: Immediate
    partitionings:
    - equal:
        parts: 1
        shardTemplate:
          databaseInitScriptSecret:
            name: test-cluster-config
            key: init_db.sql
          replication:
            enforceSemiSync: false
          tabletPools:
          - cell: useast1
            type: replica
            replicas: 2
            vttablet:
              extraFlags:
                db_charset: utf8mb4
              resources:
                requests:
                  cpu: 100m
                  memory: 256Mi
            mysqld:
              resources:
                requests:
                  cpu: 100m
                  memory: 256Mi
            dataVolumeClaimTemplate:
              accessModes: ["ReadWriteOnce"]
              resources:
                requests:
                  storage: 10Gi
  updateStrategy:
    type: Immediate
---
apiVersion: v1
kind: Secret
metadata:
  name: test-cluster-config
type: Opaque
stringData:
  init_db.sql: |
    # This file is executed immediately after mysql_install_db,
    # to initialize a fresh data directory.

    ###############################################################################
    # Equivalent of mysql_secure_installation
    ###############################################################################

    # Changes during the init db should not make it to the binlog.
    # They could potentially create errant transactions on replicas.
    SET sql_log_bin = 0;
    # Remove anonymous users.
    DELETE FROM mysql.user WHERE User = '';

    # Disable remote root access (only allow UNIX socket).
    DELETE FROM mysql.user WHERE User = 'root' AND Host != 'localhost';

    # Remove test database.
    DROP DATABASE IF EXISTS test;

    ###############################################################################
    # Vitess defaults
    ###############################################################################

    # Vitess-internal database.
    CREATE DATABASE IF NOT EXISTS _vt;
    # Note that definitions of local_metadata and shard_metadata should be the same
    # as in production which is defined in go/vt/mysqlctl/metadata_tables.go.
    CREATE TABLE IF NOT EXISTS _vt.local_metadata (
      name VARCHAR(255) NOT NULL,
      value VARCHAR(255) NOT NULL,
      db_name VARBINARY(255) NOT NULL,
      PRIMARY KEY (db_name, name)
      ) ENGINE=InnoDB;
    CREATE TABLE IF NOT EXISTS _vt.shard_metadata (
      name VARCHAR(255) NOT NULL,
      value MEDIUMBLOB NOT NULL,
      db_name VARBINARY(255) NOT NULL,
      PRIMARY KEY (db_name, name)
      ) ENGINE=InnoDB;

    # Admin user with all privileges.
    CREATE USER 'vt_dba'@'localhost';
    GRANT ALL ON *.* TO 'vt_dba'@'localhost';
    GRANT GRANT OPTION ON *.* TO 'vt_dba'@'localhost';

    # User for app traffic, with global read-write access.
    CREATE USER 'vt_app'@'localhost';
    GRANT SELECT, INSERT, UPDATE, DELETE, CREATE, DROP, RELOAD, PROCESS, FILE,
      REFERENCES, INDEX, ALTER, SHOW DATABASES, CREATE TEMPORARY TABLES,
      LOCK TABLES, EXECUTE, REPLICATION SLAVE, REPLICATION CLIENT, CREATE VIEW,
      SHOW VIEW, CREATE ROUTINE, ALTER ROUTINE, CREATE USER, EVENT, TRIGGER
      ON *.* TO 'vt_app'@'localhost';

    # User for app debug traffic, with global read access.
    CREATE USER 'vt_appdebug'@'localhost';
    GRANT SELECT, SHOW DATABASES, PROCESS ON *.* TO 'vt_appdebug'@'localhost';

    # User for administrative operations that need to be executed as non-SUPER.
    # Same permissions as vt_app here.
    CREATE USER 'vt_allprivs'@'localhost';
    GRANT SELECT, INSERT, UPDATE, DELETE, CREATE, DROP, RELOAD, PROCESS, FILE,
      REFERENCES, INDEX, ALTER, SHOW DATABASES, CREATE TEMPORARY TABLES,
      LOCK TABLES, EXECUTE, REPLICATION SLAVE, REPLICATION CLIENT, CREATE VIEW,
      SHOW VIEW, CREATE ROUTINE, ALTER ROUTINE, CREATE USER, EVENT, TRIGGER
      ON *.* TO 'vt_allprivs'@'localhost';

    # User for slave replication connections.
    # TODO: Should we set a password on this since it allows remote connections?
    CREATE USER 'vt_repl'@'%';
    GRANT REPLICATION SLAVE ON *.* TO 'vt_repl'@'%';

    # User for Vitess filtered replication (binlog player).
    # Same permissions as vt_app.
    CREATE USER 'vt_filtered'@'localhost';
    GRANT SELECT, INSERT, UPDATE, DELETE, CREATE, DROP, RELOAD, PROCESS, FILE,
      REFERENCES, INDEX, ALTER, SHOW DATABASES, CREATE TEMPORARY TABLES,
      LOCK TABLES, EXECUTE, REPLICATION SLAVE, REPLICATION CLIENT, CREATE VIEW,
      SHOW VIEW, CREATE ROUTINE, ALTER ROUTINE, CREATE USER, EVENT, TRIGGER
      ON *.* TO 'vt_filtered'@'localhost';

    # User for Orchestrator (https://github.com/github/orchestrator).
    # TODO: Reenable when the password is randomly generated.
    #CREATE USER 'orc_client_user'@'%' IDENTIFIED BY 'orc_client_user_password';
    #GRANT SUPER, PROCESS, REPLICATION SLAVE, RELOAD
    #  ON *.* TO 'orc_client_user'@'%';
    #GRANT SELECT
    #  ON _vt.* TO 'orc_client_user'@'%';

    FLUSH PRIVILEGES;

    RESET SLAVE ALL;
    RESET MASTER;
  2. Deploy the following schema: https://github.com/vitessio/vitess/blob/main/examples/operator/create_commerce_schema.sql

  3. Check that backup files have been created in the S3 bucket, then delete the cluster.

  4. Redeploy the same cluster YAML.

  5. Inspecting the operator logs shows the following error:

I0825 20:18:47.743816       1 s3.go:261] ListBackups: [s3] dir: commerce/-, bucket: oxygen-vitess
I0825 20:18:50.978970       1 s3.go:273] objName: 0xc000dc0000
time="2021-08-25T20:18:51Z" level=info msg="Reconciling VitessBackupStorage" namespace=default subcontroller=VitessBackupStorage vitessbackupstorage=test-e2a3e68d
I0825 20:18:51.124050       1 s3.go:261] ListBackups: [s3] dir: commerce/-, bucket: oxygen-vitess
I0825 20:18:54.306436       1 s3.go:273] objName: 0xc000a9c4a0
{"level":"error","ts":1629922734.3424866,"logger":"controller-runtime.controller","msg":"Reconciler error","controller":"vitessbackupstorage-subcontroller","request":"default/test-e2a3e68d","error":"Operation cannot be fulfilled on vitessbackupstorages.planetscale.com \"test-e2a3e68d\": the object has been modified; please apply your changes to the latest version and try again","stacktrace":"github.com/go-logr/zapr.(*zapLogger).Error\n\t/go/pkg/mod/github.com/go-logr/zapr@v0.1.1/zapr.go:128\nsigs.k8s.io/controller-runtime/pkg/internal/controller.(*Controller).reconcileHandler\n\t/go/pkg/mod/sigs.k8s.io/controller-runtime@v0.4.0/pkg/internal/controller/controller.go:258\nsigs.k8s.io/controller-runtime/pkg/internal/controller.(*Controller).processNextWorkItem\n\t/go/pkg/mod/sigs.k8s.io/controller-runtime@v0.4.0/pkg/internal/controller/controller.go:232\nsigs.k8s.io/controller-runtime/pkg/internal/controller.(*Controller).worker\n\t/go/pkg/mod/sigs.k8s.io/controller-runtime@v0.4.0/pkg/internal/controller/controller.go:211\nk8s.io/apimachinery/pkg/util/wait.JitterUntil.func1\n\t/go/pkg/mod/k8s.io/apimachinery@v0.0.0-20191004115801-a2eda9f80ab8/pkg/util/wait/wait.go:152\nk8s.io/apimachinery/pkg/util/wait.JitterUntil\n\t/go/pkg/mod/k8s.io/apimachinery@v0.0.0-20191004115801-a2eda9f80ab8/pkg/util/wait/wait.go:153\nk8s.io/apimachinery/pkg/util/wait.Until\n\t/go/pkg/mod/k8s.io/apimachinery@v0.0.0-20191004115801-a2eda9f80ab8/pkg/util/wait/wait.go:88"}
time="2021-08-25T20:18:55Z" level=info msg="Reconciling VitessBackupStorage" namespace=default subcontroller=VitessBackupStorage vitessbackupstorage=test-e2a3e68d
I0825 20:18:55.343225       1 s3.go:261] ListBackups: [s3] dir: commerce/-, bucket: oxygen-vitess
I0825 20:18:58.524614       1 s3.go:273] objName: 0xc000ab3bd0
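
The "Operation cannot be fulfilled ... the object has been modified" entry is Kubernetes' standard optimistic-concurrency conflict: the subcontroller submitted an update built from a stale resourceVersion of the VitessBackupStorage object. Controllers normally recover by re-reading the object and retrying, so an occasional occurrence is transient noise rather than the likely cause of a missing restore. As a rough illustration only (not the operator's actual code; the helper name and the use of an unstructured object are assumptions), the usual client-go pattern looks like this:

package main

import (
	"context"

	"k8s.io/apimachinery/pkg/apis/meta/v1/unstructured"
	"k8s.io/apimachinery/pkg/runtime/schema"
	"k8s.io/apimachinery/pkg/types"
	"k8s.io/client-go/util/retry"
	"sigs.k8s.io/controller-runtime/pkg/client"
)

// updateBackupStorageStatus is a hypothetical helper showing the standard
// client-go answer to "the object has been modified": re-read the object on
// every attempt so the update carries a fresh resourceVersion, and let
// RetryOnConflict back off and retry when the API server reports a conflict.
func updateBackupStorageStatus(ctx context.Context, c client.Client,
	key types.NamespacedName, mutate func(*unstructured.Unstructured)) error {
	return retry.RetryOnConflict(retry.DefaultRetry, func() error {
		obj := &unstructured.Unstructured{}
		// GVK taken from the VitessCluster manifest above (planetscale.com/v2).
		obj.SetGroupVersionKind(schema.GroupVersionKind{
			Group:   "planetscale.com",
			Version: "v2",
			Kind:    "VitessBackupStorage",
		})
		if err := c.Get(ctx, key, obj); err != nil {
			return err
		}
		mutate(obj) // apply the desired status change to the fresh copy
		return c.Status().Update(ctx, obj)
	})
}

If this conflict showed up in a tight loop it could starve the reconciler, but a single hit per reconcile pass, as in the excerpt above, is expected behavior for this pattern.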

Inspecting the database shows that no tables were restored. The checks sketched below may help narrow down where the restore is going wrong.
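
Two things are worth verifying here: that Vitess itself can see a complete backup for the keyspace, and that the recreated vttablets attempt a restore at startup at all. A rough check from the reproduction environment follows; the service and pod names are placeholders, and note that the spec above configures bucket test-vitess with keyPrefix testcluster while the log lines mention oxygen-vitess, so adjust to whichever bucket is real:

# What is physically in the bucket under the configured prefix?
aws s3 ls --recursive s3://test-vitess/testcluster/

# Which backups does Vitess itself see for the unsharded keyspace?
# Assumes vtctld's gRPC port is reachable, e.g. via port-forward;
# <vtctld-service> is a placeholder for the operator-generated service name.
kubectl port-forward svc/<vtctld-service> 15999:15999 &
vtctlclient -server localhost:15999 ListBackups commerce/-

# Did a recreated tablet even try to restore on startup? The operator is
# expected to enable restores when spec.backup is configured, so the
# vttablet log should mention the restore attempt either way.
kubectl logs <vttablet-pod> vttablet | grep -iE 'restore|backup'

If ListBackups returns nothing even though the S3 objects exist, the operator and the tablets are probably pointed at different locations (bucket, region, or keyPrefix); if it lists a backup but the vttablet log shows no restore attempt, the restore configuration on the tablet side is the place to look.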

LHBAhsanG commented 3 years ago

@askdba, any help please?

frouioui commented 1 month ago

Hello @LHBAhsanG, are you still experiencing this issue, or have you otherwise found the cause?