oracle / mysql-operator

Create, operate and scale self-healing MySQL clusters in Kubernetes
870 stars 235 forks source link

Cluster failed to start after config PV #181

Open zqkou opened 6 years ago

zqkou commented 6 years ago

Versions

MySQL Operator Version: 0.2.0

Environment:

What happened?

Create cluster using mysql operator. Everything went smoothly until added PV to spec. I was following the example "Create a cluster with a persistent volume". But my cluster is using Gluster FS so I modified the example as below:

======================

apiVersion: mysql.oracle.com/v1alpha1
kind: Cluster
metadata:
  name: mysql-multiprimary-cluster
spec:
  multiMaster: true
  members: 3
  rootPasswordSecret:
    name: mysql-multiprimary-cluster-root-password
  config:
    name: mysql-multiprimary-cluster
  volumeClaimTemplate:
    metadata:
      name: data
    spec:
      storageClassName: scmsgpgf
      accessModes:
        - ReadWriteMany
      resources:
        requests:
          storage: 1Gi

====================== My GlusterFS configured as:

> kubectl get storageclass scmsgpgf
NAME                 PROVISIONER               AGE
scmsgpgf (default)   kubernetes.io/glusterfs   5d

====================== PVC is created automatically:

NAME                                STATUS    VOLUME                                     CAPACITY   ACCESS MODES   STORAGECLASS   AGE
data-mysql-multiprimary-cluster-0   Bound     pvc-41005526-83eb-11e8-a31a-005056b52819   1Gi        RWX            scmsgpgf       21m

====================== PV is created automatically:

NAME                                       CAPACITY   ACCESS MODES   RECLAIM POLICY   STATUS    CLAIM                                           STORAGECLASS               REASON    AGE
pvc-41005526-83eb-11e8-a31a-005056b52819   1Gi        RWX            Delete           Bound     data-mysql-multiprimary-cluster-0   scmsgpgf                             22m

====================== ERROR logs found from statefulset memeber-0:

2018-07-10T02:47:05.446916Z 16 [ERROR] [MY-011685] [Repl] Plugin group_replication reported: 'The group name option is mandatory'
2018-07-10T02:47:05.446955Z 16 [ERROR] [MY-011660] [Repl] Plugin group_replication reported: 'Unable to start Group Replication on boot'
2018-07-10T02:47:05.673598Z 16 [System] [MY-010597] [Repl] 'CHANGE MASTER TO FOR CHANNEL 'group_replication_recovery' executed'. Previous state master_host='', master_port= 3306, master_log_file='', master_log_pos= 4, master_bind=''. New state master_host='', master_port= 3306, master_log_file='', master_log_pos= 4, master_bind=''.
2018-07-10T02:47:05.795602Z 16 [Note] [MY-011716] [Repl] Plugin group_replication reported: 'Current debug options are: 'GCS_DEBUG_NONE'.'
2018-07-10T02:47:05.796742Z 16 [Note] [MY-011671] [Repl] Plugin group_replication reported: 'Group communication SSL configuration: group_replication_ssl_mode: "REQUIRED"; server_key_file: "server-key.pem"; server_cert_file: "server-cert.pem"; client_key_file: "server-key.pem"; client_cert_file: "server-cert.pem"; ca_file: "ca.pem"; ca_path: ""; cipher: ""; tls_version: "TLSv1,TLSv1.1,TLSv1.2"; crl_file: ""; crl_path: ""; ssl_fips_mode: "OFF"'
2018-07-10T02:47:05.801846Z 16 [Note] [MY-011735] [Repl] Plugin group_replication reported: '[GCS] Debug messages will be sent to: asynchronous::/var/lib/mysql/GCS_DEBUG_TRACE'
2018-07-10T02:47:05.804235Z 16 [Note] [MY-011735] [Repl] Plugin group_replication reported: '[GCS] Translated 'mysql-multiprimary-cluster-0.mysql-multiprimary-cluster' to 10.1.126.253'
2018-07-10T02:47:05.804721Z 16 [Warning] [MY-011735] [Repl] Plugin group_replication reported: '[GCS] Automatically adding IPv4 localhost address to the whitelist. It is mandatory that it is added.'
2018-07-10T02:47:05.810137Z 16 [Note] [MY-011694] [Repl] Plugin group_replication reported: 'Initialized group communication with configuration: group_replication_group_name: '87e02550-83eb-11e8-b41a-423e6c34d4a2'; group_replication_local_address: 'mysql-multiprimary-cluster-0.mysql-multiprimary-cluster:33061'; group_replication_group_seeds: ''; group_replication_bootstrap_group: 'true'; group_replication_poll_spin_loops: 0; group_replication_compression_threshold: 1000000; group_replication_ip_whitelist: '10.0.0.0/8'; group_replication_communication_debug_options: 'GCS_DEBUG_NONE''
2018-07-10T02:47:05.810272Z 16 [Note] [MY-011643] [Repl] Plugin group_replication reported: 'Member configuration: member_id: 1000; member_uuid: "6728a629-83eb-11e8-aef3-423e6c34d4a2"; single-primary mode: "false"; group_replication_auto_increment_increment: 7; '
2018-07-10T02:47:05.969207Z 18 [System] [MY-010597] [Repl] 'CHANGE MASTER TO FOR CHANNEL 'group_replication_applier' executed'. Previous state master_host='', master_port= 3306, master_log_file='', master_log_pos= 4, master_bind=''. New state master_host='<NULL>', master_port= 0, master_log_file='', master_log_pos= 4, master_bind=''.
2018-07-10T02:47:06.198083Z 16 [Note] [MY-011670] [Repl] Plugin group_replication reported: 'Group Replication applier module successfully initialized!'
2018-07-10T02:47:06.200663Z 21 [Note] [MY-010581] [Repl] Slave SQL thread for channel 'group_replication_applier' initialized, starting replication in log 'FIRST' at position 0, relay log './mysql-multiprimary-cluster-0-relay-bin-group_replication_applier.000001' position: 4
2018-07-10T02:47:06.220705Z 0 [Note] [MY-011735] [Repl] Plugin group_replication reported: '[GCS] XCom protocol version: 3'
2018-07-10T02:47:06.220813Z 0 [Note] [MY-011735] [Repl] Plugin group_replication reported: '[GCS] XCom initialized and ready to accept incoming connections on port 33061'
2018-07-10T02:47:06.373336Z 0 [Note] [MY-011735] [Repl] Plugin group_replication reported: '[GCS] Creating new server node 0 host mysql-multiprimary-cluster-0.mysql-multiprimary-cluster'
2018-07-10T02:47:06.374033Z 0 [Note] [MY-011735] [Repl] Plugin group_replication reported: '[GCS] Installed site start={8379f9e 0 0} boot_key={8379f9e 0 0} node 0'
2018-07-10T02:47:06.374665Z 0 [Note] [MY-011735] [Repl] Plugin group_replication reported: '[GCS] Re-using server node 0 host mysql-multiprimary-cluster-0.mysql-multiprimary-cluster'
2018-07-10T02:47:06.374693Z 0 [Note] [MY-011735] [Repl] Plugin group_replication reported: '[GCS] Installed site start={8379f9e 1 0} boot_key={8379f9e 1 0} node 0'
2018-07-10T02:47:09.376705Z 0 [Note] [MY-011503] [Repl] Plugin group_replication reported: 'Group membership changed to mysql-multiprimary-cluster-0.mysql-multiprimary-cluster:3306 on view 15311908283749897:1.'
2018-07-10T02:47:09.376693Z 24 [Note] [MY-011623] [Repl] Plugin group_replication reported: 'Only one server alive. Declaring this server as online within the replication group'
2018-07-10T02:47:09.381500Z 21 [ERROR] [MY-010834] [Server] next log error: -1  offset: 148  log: ./mysql-multiprimary-cluster-0-relay-bin-group_replication_applier.000002 included: 0
2018-07-10T02:47:09.381585Z 21 [ERROR] [MY-010596] [Repl] Error reading relay log event for channel 'group_replication_applier': Error purging processed logs
2018-07-10T02:47:09.381642Z 21 [ERROR] [MY-013121] [Repl] Slave SQL for channel 'group_replication_applier': Relay log read failure: Could not parse relay log event entry. The possible reasons are: the master's binary log is corrupted (you can check this by running 'mysqlbinlog' on the binary log), the slave's relay log is corrupted (you can check this by running 'mysqlbinlog' on the relay log), a network problem, or a bug in the master's or slave's MySQL code. If you want to check the master's binary log or slave's relay log, you will be able to know their names by issuing 'SHOW SLAVE STATUS' on this slave. Error_code: MY-013121
2018-07-10T02:47:09.381663Z 21 [ERROR] [MY-010586] [Repl] Error running query, slave SQL thread aborted. Fix the problem, and restart the slave SQL thread with "SLAVE START". We stopped at log 'FIRST' position 0
2018-07-10T02:47:09.381692Z 21 [ERROR] [MY-011451] [Repl] Plugin group_replication reported: 'The applier thread execution was aborted. Unable to process more transactions, this member will now leave the group.'
2018-07-10T02:47:09.381835Z 18 [ERROR] [MY-011452] [Repl] Plugin group_replication reported: 'Fatal error during execution on the Applier process of Group Replication. The server will now leave the group.'
2018-07-10T02:47:09.381874Z 24 [ERROR] [MY-011625] [Repl] Plugin group_replication reported: 'Unable to ensure the execution of group transactions received during recovery.'
2018-07-10T02:47:09.381912Z 24 [ERROR] [MY-011620] [Repl] Plugin group_replication reported: 'Fatal error during the recovery process of Group Replication. The server will leave the group.'
2018-07-10T02:47:09.381913Z 18 [ERROR] [MY-011712] [Repl] Plugin group_replication reported: 'The server was automatically set into read only mode after an error was detected.'
2018-07-10T02:47:09.381937Z 24 [Warning] [MY-011645] [Repl] Plugin group_replication reported: 'Skipping leave operation: concurrent attempt to leave the group is on-going.'
2018-07-10T02:47:09.382446Z 18 [Note] [MY-011444] [Repl] Plugin group_replication reported: 'The group replication applier thread was killed.'
2018-07-10T02:47:09.382601Z 0 [Note] [MY-011735] [Repl] Plugin group_replication reported: '[GCS] Installed site start={8379f9e 25 0} boot_key={8379f9e 14 0} node 4294967295'
2018-07-10T02:47:21.374591Z 0 [Note] [MY-011504] [Repl] Plugin group_replication reported: 'Group membership changed: This member has left the group.'

What you expected to happen?

Suppose mysql group replication should be created successfully. As the same result as no PV is provisioned.

How to reproduce it (as minimally and precisely as possible)?

If you have GlusterFS configured as storage class, just create cluster using the same yaml I gave at the beginning.

Anything else we need to know?

Please let me know if anything else is need. Currently I am confused with the results and the error logs. No idea what should be checked for the next step. Thank you!

zqkou commented 6 years ago

More information got:

I compared the group replication related global variables with or without PV configured, there are some different:

With PV: group_replication_bootstrap_group: ON group_replication_group_seeds:

Without PV: group_replication_bootstrap_group: OFF group_replication_group_seeds: mysql-multiprimary-cluster-1.mysql-multiprimary-cluster:33061,mysql-multiprimary-cluster-2.mysql-multiprimary-cluster:33061

Hope that helps.

prydie commented 6 years ago

Hi @zqkou,

I believe the key log line is:

2018-07-10T02:47:09.381642Z 21 [ERROR] [MY-013121] [Repl] Slave SQL for channel 'group_replication_applier': Relay log read failure: Could not parse relay log event entry. The possible reasons are: the master's binary log is corrupted (you can check this by running 'mysqlbinlog' on the binary log), the slave's relay log is corrupted (you can check this by running 'mysqlbinlog' on the relay log), a network problem, or a bug in the master's or slave's MySQL code. If you want to check the master's binary log or slave's relay log, you will be able to know their names by issuing 'SHOW SLAVE STATUS' on this slave. Error_code: MY-013121

I notice you are running in multi-primary mode. There are known limitations with this mode of operation. Would you mind trying in single primary mode and seeing if that works for you?

zqkou commented 6 years ago

Hi @prydie,

Thank you! I have tried single primary with almost the same error result.

This is the yaml I used:

apiVersion: mysql.oracle.com/v1alpha1 kind: Cluster metadata: name: mysql-sp-cluster namespace: default spec: members: 3 volumeClaimTemplate: metadata: name: data spec: accessModes:

Error Log as below:

2018-07-12T07:31:12.307022Z 0 [Note] [MY-010163] [Server] No argument was provided to --log-bin, and --log-bin-index was not used; so replication may break when this MySQL server acts as a master and has his hostname changed!! Please use '--log-bin=mysql-sp-cluster-0-bin' to avoid this problem. 2018-07-12T07:31:12.288950Z 0 [Note] [MY-010949] [Server] Basedir set to /usr/. 2018-07-12T07:31:12.289041Z 0 [System] [MY-010116] [Server] /usr/sbin/mysqld (mysqld 8.0.11) starting as process 1 2018-07-12T07:31:12.319355Z 0 [Note] [MY-012366] [InnoDB] InnoDB: Using Linux native AIO 2018-07-12T07:31:12.319575Z 0 [Note] [MY-010747] [Server] Plugin 'FEDERATED' is disabled. 2018-07-12T07:31:12.325517Z 1 [Note] [MY-011825] [InnoDB] InnoDB: Number of pools: 1 2018-07-12T07:31:12.324992Z 1 [Note] [MY-012943] [InnoDB] InnoDB: Mutexes and rw_locks use GCC atomic builtins 2018-07-12T07:31:12.325059Z 1 [Note] [MY-012948] [InnoDB] InnoDB: Compressed tables use zlib 1.2.11 2018-07-12T07:31:12.325739Z 1 [Note] [MY-012951] [InnoDB] InnoDB: Using CPU crc32 instructions 2018-07-12T07:31:12.325025Z 1 [Note] [MY-012945] [InnoDB] InnoDB: GCC builtin __atomic_thread_fence() is used for memory barrier 2018-07-12T07:31:12.325011Z 1 [Note] [MY-012944] [InnoDB] InnoDB: Uses event mutexes 2018-07-12T07:31:12.324948Z 1 [Note] [MY-012932] [InnoDB] InnoDB: PUNCH HOLE support available 2018-07-12T07:31:12.326511Z 1 [Note] [MY-012203] [InnoDB] InnoDB: Directories to scan './;./;./' 2018-07-12T07:31:12.326624Z 1 [Note] [MY-012204] [InnoDB] InnoDB: Scanning './' 2018-07-12T07:31:12.348652Z 1 [Note] [MY-012208] [InnoDB] InnoDB: Completed space ID check of 4 files. 2018-07-12T07:31:12.349878Z 1 [Note] [MY-012955] [InnoDB] InnoDB: Initializing buffer pool, total size = 128.000000M, instances = 1, chunk size =128.000000M 2018-07-12T07:31:12.364354Z 1 [Note] [MY-012957] [InnoDB] InnoDB: Completed initialization of buffer pool 2018-07-12T07:31:12.367594Z 0 [Note] [MY-011952] [InnoDB] InnoDB: If the mysqld execution user is authorized, page cleaner thread priority can be changed. See the man page of setpriority(). 2018-07-12T07:31:12.476698Z 1 [Note] [MY-013086] [InnoDB] InnoDB: Starting to parse redo log at lsn = 29590560, whereas checkpoint_lsn = 29590963 2018-07-12T07:31:12.483488Z 1 [Note] [MY-013083] [InnoDB] InnoDB: Log background threads are being started... 2018-07-12T07:31:12.484103Z 1 [Note] [MY-012535] [InnoDB] InnoDB: Apply batch completed! 2018-07-12T07:31:12.484045Z 1 [Note] [MY-012532] [InnoDB] InnoDB: Applying a batch of 0 redo log records ... 2018-07-12T07:31:13.016372Z 1 [Note] [MY-012910] [InnoDB] InnoDB: Opened 2 existing undo tablespaces. 2018-07-12T07:31:13.063075Z 1 [Note] [MY-012923] [InnoDB] InnoDB: Creating shared tablespace for temporary tables 2018-07-12T07:31:13.070284Z 1 [Note] [MY-012265] [InnoDB] InnoDB: Setting file './ibtmp1' size to 12 MB. Physically writing the file full; Please wait ... 2018-07-12T07:31:13.329427Z 1 [Note] [MY-012266] [InnoDB] InnoDB: File './ibtmp1' size is now 12 MB. 2018-07-12T07:31:13.335482Z 1 [Note] [MY-013018] [InnoDB] InnoDB: Created 128 and tracked 128 new rollback segment(s) in the temporary tablespace. 128 are now active. 2018-07-12T07:31:13.335800Z 1 [Note] [MY-012976] [InnoDB] InnoDB: 8.0.11 started; log sequence number 29590963 2018-07-12T07:31:13.355361Z 1 [Note] [MY-011089] [Server] Data dictionary restarting version '80011'. 2018-07-12T07:31:13.536595Z 1 [Note] [MY-012357] [InnoDB] InnoDB: Reading DD tablespace files 2018-07-12T07:31:13.540519Z 1 [Note] [MY-012356] [InnoDB] InnoDB: Validated 6/6 tablespaces 2018-07-12T07:31:13.542422Z 0 [Note] [MY-011946] [InnoDB] InnoDB: Loading buffer pool(s) from /var/lib/mysql/ib_buffer_pool 2018-07-12T07:31:13.692404Z 1 [Note] [MY-010006] [Server] Using data dictionary with version '80011'. 2018-07-12T07:31:13.699461Z 0 [Note] [MY-011320] [Server] Plugin mysqlx reported: 'Scheduler "work" started.' 2018-07-12T07:31:13.699526Z 0 [Note] [MY-011320] [Server] Plugin mysqlx reported: 'Scheduler "network" started.' 2018-07-12T07:31:13.753675Z 0 [Note] [MY-012487] [InnoDB] InnoDB: DDL log recovery : begin 2018-07-12T07:31:13.753781Z 0 [Note] [MY-012488] [InnoDB] InnoDB: DDL log recovery : end 2018-07-12T07:31:13.754144Z 0 [Note] [MY-012922] [InnoDB] InnoDB: Waiting for purge to start 2018-07-12T07:31:13.875925Z 0 [Note] [MY-010182] [Server] Found ca.pem, server-cert.pem and server-key.pem in data directory. Trying to enable SSL support using them. 2018-07-12T07:31:13.875976Z 0 [Note] [MY-010304] [Server] Skipping generation of SSL certificates as certificate files are present in data directory. 2018-07-12T07:31:13.887261Z 0 [Note] [MY-010308] [Server] Skipping generation of RSA key pair through --sha256_password_auto_generate_rsa_keys as key files are present in data directory. 2018-07-12T07:31:13.886973Z 0 [Warning] [MY-010068] [Server] CA certificate ca.pem is self signed. 2018-07-12T07:31:13.887287Z 0 [Note] [MY-010308] [Server] Skipping generation of RSA key pair through --caching_sha2_password_auto_generate_rsa_keys as key files are present in data directory. 2018-07-12T07:31:13.893160Z 0 [Note] [MY-010252] [Server] Server hostname (bind-address): '*'; port: 3306 2018-07-12T07:31:13.893264Z 0 [Note] [MY-010264] [Server] - '::' resolves to '::'; 2018-07-12T07:31:13.893243Z 0 [Note] [MY-010253] [Server] IPv6 is available. 2018-07-12T07:31:13.893346Z 0 [Note] [MY-010251] [Server] Server socket created on IP: '::'. 2018-07-12T07:31:13.935640Z 0 [Warning] [MY-010323] [Server] 'db' entry 'performance_schema mysql.session@localhost' ignored in --skip-name-resolve mode. 2018-07-12T07:31:13.935596Z 0 [Warning] [MY-010315] [Server] 'user' entry 'root@localhost' ignored in --skip-name-resolve mode. 2018-07-12T07:31:13.935578Z 0 [Warning] [MY-010315] [Server] 'user' entry 'mysql.sys@localhost' ignored in --skip-name-resolve mode. 2018-07-12T07:31:13.935696Z 0 [Warning] [MY-010311] [Server] 'proxies_priv' entry '@ root@localhost' ignored in --skip-name-resolve mode. 2018-07-12T07:31:13.935560Z 0 [Warning] [MY-010315] [Server] 'user' entry 'mysql.session@localhost' ignored in --skip-name-resolve mode. 2018-07-12T07:31:13.935664Z 0 [Warning] [MY-010323] [Server] 'db' entry 'sys mysql.sys@localhost' ignored in --skip-name-resolve mode. 2018-07-12T07:31:13.935537Z 0 [Warning] [MY-010315] [Server] 'user' entry 'mysql.infoschema@localhost' ignored in --skip-name-resolve mode. 2018-07-12T07:31:13.935475Z 0 [Warning] [MY-010315] [Server] 'user' entry 'healthchecker@localhost' ignored in --skip-name-resolve mode. 2018-07-12T07:31:13.943496Z 0 [Warning] [MY-010330] [Server] 'tables_priv' entry 'user mysql.session@localhost' ignored in --skip-name-resolve mode. 2018-07-12T07:31:13.943538Z 0 [Warning] [MY-010330] [Server] 'tables_priv' entry 'sys_config mysql.sys@localhost' ignored in --skip-name-resolve mode. 2018-07-12T07:31:13.981388Z 4 [Note] [MY-010051] [Server] Event Scheduler: scheduler thread started with id 4 2018-07-12T07:31:13.981809Z 0 [System] [MY-010931] [Server] /usr/sbin/mysqld: ready for connections. Version: '8.0.11' socket: '/var/lib/mysql/mysql.sock' port: 3306 MySQL Community Server - GPL. 2018-07-12T07:31:14.201131Z 0 [Note] [MY-011240] [Server] Plugin mysqlx reported: 'Using SSL configuration from MySQL Server' 2018-07-12T07:31:14.202628Z 0 [Note] [MY-011332] [Server] Plugin mysqlx reported: 'IPv6 is available' 2018-07-12T07:31:14.202531Z 0 [Note] [MY-011243] [Server] Plugin mysqlx reported: 'Using OpenSSL for TLS connections' 2018-07-12T07:31:14.211949Z 0 [Note] [MY-011323] [Server] Plugin mysqlx reported: 'X Plugin listens on UNIX socket (/var/run/mysqld/mysqlx.sock)' 2018-07-12T07:31:14.211880Z 0 [Note] [MY-011323] [Server] Plugin mysqlx reported: 'X Plugin listens on TCP (bind-address:'::', port:33060)' 2018-07-12T07:31:14.212024Z 0 [Note] [MY-011328] [Server] Plugin mysqlx reported: 'Server starts handling incoming connections' 2018-07-12T07:31:14.566434Z 0 [Note] [MY-011946] [InnoDB] InnoDB: Buffer pool(s) load completed at 180712 7:31:14 2018-07-12T07:31:18.379108Z 16 [ERROR] [MY-011660] [Repl] Plugin group_replication reported: 'Unable to start Group Replication on boot' 2018-07-12T07:31:18.379068Z 16 [ERROR] [MY-011685] [Repl] Plugin group_replication reported: 'The group name option is mandatory' 2018-07-12T07:31:18.611840Z 16 [System] [MY-010597] [Repl] 'CHANGE MASTER TO FOR CHANNEL 'group_replication_recovery' executed'. Previous state master_host='', master_port= 3306, master_log_file='', master_log_pos= 4, master_bind=''. New state master_host='', master_port= 3306, master_log_file='', master_log_pos= 4, master_bind=''. 2018-07-12T07:31:18.767225Z 16 [Note] [MY-011716] [Repl] Plugin group_replication reported: 'Current debug options are: 'GCS_DEBUG_NONE'.' 2018-07-12T07:31:18.768541Z 16 [Note] [MY-011671] [Repl] Plugin group_replication reported: 'Group communication SSL configuration: group_replication_ssl_mode: "REQUIRED"; server_key_file: "server-key.pem"; server_cert_file: "server-cert.pem"; client_key_file: "server-key.pem"; client_cert_file: "server-cert.pem"; ca_file: "ca.pem"; ca_path: ""; cipher: ""; tls_version: "TLSv1,TLSv1.1,TLSv1.2"; crl_file: ""; crl_path: ""; ssl_fips_mode: "OFF"' 2018-07-12T07:31:18.775575Z 16 [Note] [MY-011735] [Repl] Plugin group_replication reported: '[GCS] Debug messages will be sent to: asynchronous::/var/lib/mysql/GCS_DEBUG_TRACE' 2018-07-12T07:31:18.778232Z 16 [Note] [MY-011735] [Repl] Plugin group_replication reported: '[GCS] Translated 'mysql-sp-cluster-0.mysql-sp-cluster' to 10.1.114.80' 2018-07-12T07:31:18.778658Z 16 [Warning] [MY-011735] [Repl] Plugin group_replication reported: '[GCS] Automatically adding IPv4 localhost address to the whitelist. It is mandatory that it is added.' 2018-07-12T07:31:18.785124Z 16 [Note] [MY-011643] [Repl] Plugin group_replication reported: 'Member configuration: member_id: 1000; member_uuid: "752dcc20-85a5-11e8-82e2-4e4d13528e27"; single-primary mode: "true"; group_replication_auto_increment_increment: 7; ' 2018-07-12T07:31:18.784942Z 16 [Note] [MY-011694] [Repl] Plugin group_replication reported: 'Initialized group communication with configuration: group_replication_group_name: '91079683-85a5-11e8-b393-4e4d13528e27'; group_replication_local_address: 'mysql-sp-cluster-0.mysql-sp-cluster:33061'; group_replication_group_seeds: ''; group_replication_bootstrap_group: 'true'; group_replication_poll_spin_loops: 0; group_replication_compression_threshold: 1000000; group_replication_ip_whitelist: '10.0.0.0/8'; group_replication_communication_debug_options: 'GCS_DEBUG_NONE'' 2018-07-12T07:31:18.894429Z 18 [System] [MY-010597] [Repl] 'CHANGE MASTER TO FOR CHANNEL 'group_replication_applier' executed'. Previous state master_host='', master_port= 3306, master_log_file='', master_log_pos= 4, master_bind=''. New state master_host='', master_port= 0, master_log_file='', master_log_pos= 4, master_bind=''. 2018-07-12T07:31:19.073820Z 16 [Note] [MY-011670] [Repl] Plugin group_replication reported: 'Group Replication applier module successfully initialized!' 2018-07-12T07:31:19.076308Z 21 [Note] [MY-010581] [Repl] Slave SQL thread for channel 'group_replication_applier' initialized, starting replication in log 'FIRST' at position 0, relay log './mysql-sp-cluster-0-relay-bin-group_replication_applier.000001' position: 4 2018-07-12T07:31:19.097310Z 0 [Note] [MY-011735] [Repl] Plugin group_replication reported: '[GCS] XCom initialized and ready to accept incoming connections on port 33061' 2018-07-12T07:31:19.097267Z 0 [Note] [MY-011735] [Repl] Plugin group_replication reported: '[GCS] XCom protocol version: 3' 2018-07-12T07:31:19.277600Z 0 [Note] [MY-011735] [Repl] Plugin group_replication reported: '[GCS] Creating new server node 0 host mysql-sp-cluster-0.mysql-sp-cluster' 2018-07-12T07:31:19.278161Z 0 [Note] [MY-011735] [Repl] Plugin group_replication reported: '[GCS] Installed site start={541bc8f4 0 0} boot_key={541bc8f4 0 0} node 0' 2018-07-12T07:31:19.279270Z 0 [Note] [MY-011735] [Repl] Plugin group_replication reported: '[GCS] Installed site start={541bc8f4 1 0} boot_key={541bc8f4 1 0} node 0' 2018-07-12T07:31:19.279225Z 0 [Note] [MY-011735] [Repl] Plugin group_replication reported: '[GCS] Re-using server node 0 host mysql-sp-cluster-0.mysql-sp-cluster' 2018-07-12T07:31:22.281318Z 24 [Note] [MY-011623] [Repl] Plugin group_replication reported: 'Only one server alive. Declaring this server as online within the replication group' 2018-07-12T07:31:22.283647Z 24 [ERROR] [MY-011625] [Repl] Plugin group_replication reported: 'Unable to ensure the execution of group transactions received during recovery.' 2018-07-12T07:31:22.283486Z 21 [ERROR] [MY-013121] [Repl] Slave SQL for channel 'group_replication_applier': Relay log read failure: Could not parse relay log event entry. The possible reasons are: the master's binary log is corrupted (you can check this by running 'mysqlbinlog' on the binary log), the slave's relay log is corrupted (you can check this by running 'mysqlbinlog' on the relay log), a network problem, or a bug in the master's or slave's MySQL code. If you want to check the master's binary log or slave's relay log, you will be able to know their names by issuing 'SHOW SLAVE STATUS' on this slave. Error_code: MY-013121 2018-07-12T07:31:22.283670Z 24 [ERROR] [MY-011620] [Repl] Plugin group_replication reported: 'Fatal error during the recovery process of Group Replication. The server will leave the group.' 2018-07-12T07:31:22.283412Z 21 [ERROR] [MY-010834] [Server] next log error: -1 offset: 128 log: ./mysql-sp-cluster-0-relay-bin-group_replication_applier.000002 included: 0 2018-07-12T07:31:22.283445Z 21 [ERROR] [MY-010596] [Repl] Error reading relay log event for channel 'group_replication_applier': Error purging processed logs 2018-07-12T07:31:22.283690Z 24 [Warning] [MY-011645] [Repl] Plugin group_replication reported: 'Skipping leave operation: concurrent attempt to leave the group is on-going.' 2018-07-12T07:31:22.283581Z 18 [ERROR] [MY-011452] [Repl] Plugin group_replication reported: 'Fatal error during execution on the Applier process of Group Replication. The server will now leave the group.' 2018-07-12T07:31:22.283533Z 21 [ERROR] [MY-011451] [Repl] Plugin group_replication reported: 'The applier thread execution was aborted. Unable to process more transactions, this member will now leave the group.' 2018-07-12T07:31:22.283510Z 21 [ERROR] [MY-010586] [Repl] Error running query, slave SQL thread aborted. Fix the problem, and restart the slave SQL thread with "SLAVE START". We stopped at log 'FIRST' position 0 2018-07-12T07:31:22.283623Z 18 [ERROR] [MY-011712] [Repl] Plugin group_replication reported: 'The server was automatically set into read only mode after an error was detected.' 2018-07-12T07:31:22.284047Z 18 [Note] [MY-011444] [Repl] Plugin group_replication reported: 'The group replication applier thread was killed.' 2018-07-12T07:31:23.282029Z 0 [Note] [MY-011503] [Repl] Plugin group_replication reported: 'Group membership changed to on view 15313806812795849:1.' 2018-07-12T07:31:23.282458Z 0 [Note] [MY-011735] [Repl] Plugin group_replication reported: '[GCS] Installed site start={541bc8f4 25 0} boot_key={541bc8f4 14 0} node 4294967295' 2018-07-12T07:31:35.282420Z 0 [Note] [MY-011504] [Repl] Plugin group_replication reported: 'Group membership changed: This member has left the group.'

My personal opinion is, still suspect the different configuration is the root cause. It seems once volumeClaimTemplate is configured, the group_replication_bootstrap_group is set to ON (instead of OFF), and group_replication_group_seeds is left blank (instead of proper values).

By the way, for the known limitations page you gave. I am sorry I could not identify anything related. As my knowledge is limited on this, please kindly specify which part you referred to.

Big thanks! Eric

prydie commented 6 years ago

Apologies for the long delay in getting back to you @zqkou. I don't have a GlusterFS environment to test with and haven't been able to replicate the issue with various other types of persistent storage. I wonder if its something to do with the underlying storage?

zqkou commented 6 years ago

@prydie Thanks for replying.

No I don't think there is any issue with storage. As I mentioned above (twice), when PV is configured, the MySQL startup configuration is changed, which leads to startup failure. Could you please look into source code to locate the problem?

With PV: group_replication_bootstrap_group: ON group_replication_group_seeds:

Without PV: group_replication_bootstrap_group: OFF group_replication_group_seeds: mysql-multiprimary-cluster-1.mysql-multiprimary-cluster:33061,mysql-multiprimary-cluster-2.mysql-multiprimary-cluster:33061

welyss commented 5 years ago

same to me, version i used is release 0.3.0, storageclass is cephrbd. if i take off the conf of volumeClaimTemplate, it works well. if create with conf of volumeClaimTemplate, it starts only one pod, and on mysql container' log: it stops at:

2019-01-10T02:47:34.240339Z 0 [System] [MY-010931] [Server] /usr/sbin/mysqld: ready for connections. Version: '8.0.12' socket: '/var/lib/mysql/mysql.sock' port: 3306 MySQL Community Server - GPL. 2019-01-10T02:47:34.359076Z 0 [Note] [MY-011240] [Server] Plugin mysqlx reported: 'Using SSL configuration from MySQL Server' 2019-01-10T02:47:34.359731Z 0 [Note] [MY-011243] [Server] Plugin mysqlx reported: 'Using OpenSSL for TLS connections' 2019-01-10T02:47:34.359810Z 0 [Note] [MY-011332] [Server] Plugin mysqlx reported: 'IPv6 is available' 2019-01-10T02:47:34.361218Z 0 [Note] [MY-011323] [Server] Plugin mysqlx reported: 'X Plugin listens on TCP (bind-address:'::', port:33060)' 2019-01-10T02:47:34.361259Z 0 [Note] [MY-011323] [Server] Plugin mysqlx reported: 'X Plugin listens on UNIX socket (/var/run/mysqld/mysqlx.sock)' 2019-01-10T02:47:34.361347Z 0 [Note] [MY-011328] [Server] Plugin mysqlx reported: 'Server starts handling incoming connections' 2019-01-10T02:47:42.712581Z 8 [Note] [MY-010926] [Server] Access denied for user 'root'@'::1' (using password: YES) 2019-01-10T02:47:52.757765Z 9 [Note] [MY-010926] [Server] Access denied for user 'root'@'::1' (using password: YES) 2019-01-10T02:48:02.805622Z 10 [Note] [MY-010926] [Server] Access denied for user 'root'@'::1' (using password: YES) 2019-01-10T02:48:12.853470Z 11 [Note] [MY-010926] [Server] Access denied for user 'root'@'::1' (using password: YES) 2019-01-10T02:48:22.897566Z 12 [Note] [MY-010926] [Server] Access denied for user 'root'@'::1' (using password: YES) 2019-01-10T02:48:32.943993Z 13 [Note] [MY-010926] [Server] Access denied for user 'root'@'::1' (using password: YES) 2019-01-10T02:48:42.990463Z 14 [Note] [MY-010926] [Server] Access denied for user 'root'@'::1' (using password: YES) 2019-01-10T02:48:53.038844Z 15 [Note] [MY-010926] [Server] Access denied for user 'root'@'::1' (using password: YES) 2019-01-10T02:49:03.082507Z 16 [Note] [MY-010926] [Server] Access denied for user 'root'@'::1' (using password: YES)

i checked out that the secret of rootpassword has been created, and env MYSQL_ROOT_PASSWORD in mysql's container has been loaded.