sysown / proxysql

High-performance MySQL proxy with a GPL license.
http://www.proxysql.com
GNU General Public License v3.0

Possible Race Condition within Galera while deploying (openstack) #1873

Open dgttm opened 5 years ago

dgttm commented 5 years ago

This is a reproducible issue.

[ ] A clear description of your issue:

Current Setup:

3-node cluster setup for an OpenStack installation with an external Percona XtraDB (Galera) cluster and ProxySQL

Connection flow: users -> 3-node ProxySQL cluster with read/write split -> 3-node Percona XtraDB cluster

We encounter the issue when deploying OpenStack with kolla-ansible; the databases are set up via Ansible, so it is a 100% reproducible setup.

While deploying OpenStack (e.g. bootstrapping Keystone), some of the executed DB transactions fail with an error like the one shown in the error-log section below.

I don't know exactly what the database actions are, but here is what I observe:

  1. It only happens when we have an active 3-node ProxySQL cluster with read/write split (1 writer, 3 readers). All users have transaction_persistent enabled. It also happens when I set multiplex to 0 on all rules.
  2. It does NOT happen when we only have 1 writer and 1 reader in place (same node).
  3. It does NOT happen if we just use Galera without ProxySQL.
  4. It does NOT happen when using HAProxy instead of ProxySQL.

To us, it looks like some kind of race condition during the database transactions: something is read before it is written or committed. (I'm not a DB admin, so sorry for not being more precise.)
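If the hypothesis is that a SELECT reaches a reader node before that node has applied the preceding write, Galera's causal-read setting is one way to test it. A diagnostic sketch (not part of the original report), to be run on each Percona node that serves reads:

```sql
-- Make SELECTs on this node wait until it has applied all writesets it
-- knew about when the statement started (causal reads).
-- wsrep_sync_wait = 1 covers READ statements; in PXC 5.7 it supersedes
-- the older wsrep_causal_reads flag.
SET GLOBAL wsrep_sync_wait = 1;
```

If the deployment succeeds with this enabled, that would point at causal-read races introduced by the read/write split rather than at the transactions themselves.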

We even disabled multiplexing, but the behaviour stays the same:

ProxySQL> SELECT multiplex FROM runtime_mysql_query_rules;
+-----------+
| multiplex |
+-----------+
| 0         |
| 0         |
| 0         |
+-----------+
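Besides the multiplex check above, ProxySQL's digest statistics show which hostgroup each statement was actually routed to. A diagnostic sketch against the admin interface (port 6032 per the config below; admin credentials are an assumption):

```sql
-- Per-digest routing: under the rules below, writes should only appear
-- with hostgroup 10 and plain SELECTs with hostgroup 20.
SELECT hostgroup, schemaname, digest_text, count_star
FROM stats_mysql_query_digest
ORDER BY count_star DESC
LIMIT 20;
```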

[ ] The version of OS and ProxySQL

Description: Ubuntu 18.04.1 LTS

ii  proxysql                              1.4.12-1.2.bionic                 amd64        High performance MySQL proxy

ii  percona-release                       0.1-6.bionic                      all          Package to install Percona gpg key and APT repo
ii  percona-xtrabackup-24                 2.4.12-1.bionic                   amd64        Open source backup tool for InnoDB and XtraDB
ii  percona-xtradb-cluster-57             5.7.24-31.33-1.bionic             amd64        Percona XtraDB Cluster with Galera
ii  percona-xtradb-cluster-client-5.7     5.7.24-31.33-1.bionic             amd64        Percona XtraDB Cluster database client binaries
ii  percona-xtradb-cluster-common-5.7     5.7.24-31.33-1.bionic             amd64        Percona XtraDB Cluster database common files (e.g. /etc/mysql/my.cnf)
ii  percona-xtradb-cluster-server-5.7     5.7.24-31.33-1.bionic             amd64        Percona XtraDB Cluster database server binaries

[ ] Every step to reproduce the issue

datadir = "/var/lib/proxysql"

admin_variables =
{
    cluster_mysql_query_rules_diffs_before_sync = 3
    mysql_ifaces = "0.0.0.0:6032"
    cluster_mysql_servers_diffs_before_sync = 3
    cluster_mysql_users_save_to_disk = "True"
    cluster_mysql_query_rules_save_to_disk = "True"
    cluster_mysql_users_diffs_before_sync = 3
    cluster_mysql_servers_save_to_disk = "True"
    cluster_proxysql_servers_diffs_before_sync = 3
    cluster_check_interval_ms = 200
    cluster_username = "dev"
    cluster_check_status_frequency = 100
    cluster_proxysql_servers_save_to_disk = "True"
}

mysql_variables =
{
    monitor_read_only_interval = 1500
    default_query_delay = 0
    connect_timeout_server = 3000
    monitor_read_only_timeout = 500
    default_sql_mode = "STRICT_TRANS_TABLES, ERROR_FOR_DIVISION_BY_ZERO, NO_AUTO_CREATE_USER, NO_ENGINE_SUBSTITUTION"
    default_charset = "utf8"
    query_retries_on_failure = 1
    ping_timeout_server = 500
    interfaces = "0.0.0.0:3306"
    max_connections = 8192
    connect_retries_on_failure = 10
    default_schema = "information_schema"
    poll_timeout = 2000
    monitor_username = "monitor"
    commands_stats = "True"
    default_query_timeout = 300000
    sessions_sort = "True"
    threads = 4
}

mysql_servers =
(
    {
        comment = "READ"
        login_port = "6032"
        weight = 1000
        hostname = "node1"
        hostgroup = 20
        status = "OFFLINE_SOFT"
        max_connections = 2000
        port = 3307
        max_replication_lag = 0
    },
    {
        comment = "WRITE"
        login_port = "6032"
        weight = 5000
        hostname = "node2"
        hostgroup = 10
        status = "ONLINE"
        max_connections = 2000
        port = 3307
        max_replication_lag = 0
    },
    {
        comment = "READ"
        login_port = "6032"
        weight = 1000
        hostname = "node2"
        hostgroup = 20
        status = "ONLINE"
        max_connections = 2000
        port = 3307
        max_replication_lag = 0
    },
    {
        comment = "WRITE"
        login_port = "6032"
        weight = 1000
        hostname = "node3"
        hostgroup = 10
        status = "ONLINE"
        max_connections = 2000
        port = 3307
        max_replication_lag = 0
    },
    {
        comment = "READ"
        login_port = "6032"
        weight = 1000
        hostname = "node3"
        hostgroup = 20
        status = "ONLINE"
        max_connections = 2000
        port = 3307
        max_replication_lag = 0
    },
    {
        comment = "WRITE"
        login_port = "6032"
        weight = 10000
        hostname = "node1"
        hostgroup = 10
        status = "ONLINE"
        max_connections = 2000
        port = 3307
        max_replication_lag = 0
    }
)

mysql_users =
(
)

mysql_query_rules =
(
    {
        login_port = "6032"
        negate_match_pattern = 0
        match_pattern = ".*@.*"
        destination_hostgroup = 10
        flagIN = 0
        active = 1
        apply = 1
        rule_id = 1
    },
    {
        login_port = "6032"
        negate_match_pattern = 0
        match_pattern = "^SELECT.*FOR UPDATE"
        destination_hostgroup = 10
        flagIN = 0
        active = 1
        apply = 1
        rule_id = 2
    },
    {
        login_port = "6032"
        negate_match_pattern = 0
        match_pattern = "^SELECT.*"
        destination_hostgroup = 20
        flagIN = 0
        active = 1
        apply = 1
        rule_id = 3
    }
)
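The rules above send every plain SELECT to reader hostgroup 20 and everything else to writer hostgroup 10. If reads racing replication is the cause, a narrower experiment is to pin the deploying service user's SELECTs to the writer as well. A sketch against the admin interface; the username 'keystone' is hypothetical, and the existing generic SELECT rule is renumbered because rules are evaluated in ascending rule_id order:

```sql
-- Move the generic SELECT rule out of the way, then add a more specific
-- rule ahead of it for the deploying user only.
UPDATE mysql_query_rules SET rule_id = 4 WHERE rule_id = 3;
INSERT INTO mysql_query_rules
    (rule_id, active, username, match_pattern, destination_hostgroup, apply)
VALUES
    (3, 1, 'keystone', '^SELECT', 10, 1);
LOAD MYSQL QUERY RULES TO RUNTIME;
```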

mysql_replication_hostgroups =
(
    {
        comment = "Cluster"
        login_port = "6032"
        reader_hostgroup = 20
        writer_hostgroup = 10
    }
)

proxysql_servers =
(
    {
        comment = "node1"
        login_port = "6032"
        weight = 1000
        hostname = "node1"
    },
    {
        comment = "node3"
        login_port = "6032"
        weight = 800
        hostname = "node3"
    },
    {
        comment = "node2"
        login_port = "6032"
        weight = 900
        hostname = "node2"
    }
)

scheduler =
(
)

With this configuration, proxysql_galera_checker.sh reports:


Fri Jan 18 14:13:25 CET 2019 ###### proxysql_galera_checker.sh SUMMARY ######
Fri Jan 18 14:13:25 CET 2019 Hostgroup writers 10
Fri Jan 18 14:13:25 CET 2019 Hostgroup readers 20
Fri Jan 18 14:13:25 CET 2019 Number of writers 1
Fri Jan 18 14:13:25 CET 2019 Writers are readers 1
Fri Jan 18 14:13:25 CET 2019 log file /var/lib/proxysql/proxysql_galera_check.log
Fri Jan 18 14:13:25 CET 2019 ###### HANDLE WRITER NODES ######
Fri Jan 18 14:13:25 CET 2019 --> Checking WRITE server 10:node1:3307, current status ONLINE, wsrep_local_state 4
Fri Jan 18 14:13:25 CET 2019 server 10:node1:3307 is already ONLINE: 1 of 1 write nodes
Fri Jan 18 14:13:25 CET 2019 --> Checking WRITE server 10:node2:3307, current status OFFLINE_SOFT, wsrep_local_state 4
Fri Jan 18 14:13:25 CET 2019 server 10:node2:3307 is already OFFLINE_SOFT, max write nodes reached (1)
Fri Jan 18 14:13:25 CET 2019 --> Checking WRITE server 10:node2:3307, current status OFFLINE_SOFT, wsrep_local_state 4
Fri Jan 18 14:13:25 CET 2019 server 10:node3:3307 is already OFFLINE_SOFT, max write nodes reached (1)
Fri Jan 18 14:13:25 CET 2019 ###### HANDLE READER NODES ######
Fri Jan 18 14:13:25 CET 2019 --> Checking READ server 20:node1:3307, current status ONLINE, wsrep_local_state 4
Fri Jan 18 14:13:25 CET 2019 server 20:node1:3307 is already ONLINE
Fri Jan 18 14:13:25 CET 2019 --> Checking READ server 20:node2:3307, current status ONLINE, wsrep_local_state 4
Fri Jan 18 14:13:25 CET 2019 server 20:node2:3307 is already ONLINE
Fri Jan 18 14:13:25 CET 2019 --> Checking READ server 20:node3:3307, current status ONLINE, wsrep_local_state 4
Fri Jan 18 14:13:25 CET 2019 server 20:node3:3307 is already ONLINE
Fri Jan 18 14:13:25 CET 2019 ###### SUMMARY ######
Fri Jan 18 14:13:25 CET 2019 --> Number of writers that are 'ONLINE': 1 : hostgroup: 10
Fri Jan 18 14:13:25 CET 2019 --> Number of readers that are 'ONLINE': 3 : hostgroup: 20
Fri Jan 18 14:13:25 CET 2019 ###### Not loading mysql_servers, no change needed ######

[ ] The error log

I only found the following error in the proxysql.log (from one deployment):

2019-01-18 13:34:46 MySQL_Session.cpp:2949:handler(): [WARNING] Error during query on (10,node1,3307): 1305, SAVEPOINT sa_savepoint_5 does not exist

Error log from kolla-ansible during its bootstrapping:

2019-01-18 12:21:35.246 23 INFO migrate.versioning.api [-] 36 -> 37... 
2019-01-18 12:21:35.247 23 CRITICAL keystone [-] Unhandled error: DBMigrationError: 35 is not 36
2019-01-18 12:21:35.247 23 ERROR keystone Traceback (most recent call last):
2019-01-18 12:21:35.247 23 ERROR keystone   File "/var/lib/kolla/venv/bin/keystone-manage", line 10, in <module>
2019-01-18 12:21:35.247 23 ERROR keystone     sys.exit(main())
2019-01-18 12:21:35.247 23 ERROR keystone   File "/var/lib/kolla/venv/local/lib/python2.7/site-packages/keystone/cmd/manage.py", line 41, in main
2019-01-18 12:21:35.247 23 ERROR keystone     cli.main(argv=sys.argv, developer_config_file=developer_config)
2019-01-18 12:21:35.247 23 ERROR keystone   File "/var/lib/kolla/venv/local/lib/python2.7/site-packages/keystone/cmd/cli.py", line 1216, in main
2019-01-18 12:21:35.247 23 ERROR keystone     CONF.command.cmd_class.main()
2019-01-18 12:21:35.247 23 ERROR keystone   File "/var/lib/kolla/venv/local/lib/python2.7/site-packages/keystone/cmd/cli.py", line 328, in main
2019-01-18 12:21:35.247 23 ERROR keystone     CONF.command.version)
2019-01-18 12:21:35.247 23 ERROR keystone   File "/var/lib/kolla/venv/local/lib/python2.7/site-packages/keystone/common/sql/upgrades.py", line 250, in offline_sync_database_to_version
2019-01-18 12:21:35.247 23 ERROR keystone     expand_schema()
2019-01-18 12:21:35.247 23 ERROR keystone   File "/var/lib/kolla/venv/local/lib/python2.7/site-packages/keystone/common/sql/upgrades.py", line 325, in expand_schema
2019-01-18 12:21:35.247 23 ERROR keystone     _sync_repo(repo_name=EXPAND_REPO)
2019-01-18 12:21:35.247 23 ERROR keystone   File "/var/lib/kolla/venv/local/lib/python2.7/site-packages/keystone/common/sql/upgrades.py", line 187, in _sync_repo
2019-01-18 12:21:35.247 23 ERROR keystone     init_version=init_version, sanity_check=False)
2019-01-18 12:21:35.247 23 ERROR keystone   File "/var/lib/kolla/venv/local/lib/python2.7/site-packages/oslo_db/sqlalchemy/migration.py", line 81, in db_sync
2019-01-18 12:21:35.247 23 ERROR keystone     raise exception.DBMigrationError(ex)
2019-01-18 12:21:35.247 23 ERROR keystone DBMigrationError: 35 is not 36
2019-01-18 12:21:35.247 23 ERROR keystone 

Thank you very much!!

dgttm commented 5 years ago

UPDATE: as a workaround, once the query processing engine is bypassed (fast_forward=1), the issue is gone.
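For reference, that workaround would be applied roughly like this on the admin interface (a sketch; the report does not show the exact statements used):

```sql
-- fast_forward=1 bypasses the query processor for these users entirely:
-- no rule matching and no read/write split, so all traffic goes to each
-- user's default hostgroup.
UPDATE mysql_users SET fast_forward = 1;
LOAD MYSQL USERS TO RUNTIME;
SAVE MYSQL USERS TO DISK;
```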

In the Docker logs I found a duplicate-index warning. Could this maybe lead to the described issue?

+++ nova-manage db sync
/var/lib/kolla/venv/local/lib/python2.7/site-packages/psycopg2/__init__.py:144: UserWarning: The psycopg2 wheel package will be renamed from release 2.8; in order to keep installing from binary please use "pip install psycopg2-binary" instead. For details see: <http://initd.org/psycopg/docs/install.html#binary-install-from-pypi>.
  """)
/var/lib/kolla/venv/local/lib/python2.7/site-packages/pymysql/cursors.py:170: Warning: (1831, u"Duplicate index 'block_device_mapping_instance_uuid_virtual_name_device_name_idx' defined on the table 'nova_cell0.block_device_mapping'. This is deprecated and will be disallowed in a future release.")
  result = self._query(query)
/var/lib/kolla/venv/local/lib/python2.7/site-packages/pymysql/cursors.py:170: Warning: (1831, u"Duplicate index 'uniq_instances0uuid' defined on the table 'nova_cell0.instances'. This is deprecated and will be disallowed in a future release.")
  result = self._query(query)
liqiang-fit2cloud commented 5 years ago

Encountering the same issue.