manticoresoftware / manticoresearch


Replication cluster. GTID ERROR. #386

Open donbing007 opened 4 years ago

donbing007 commented 4 years ago
| version                          | 3.5.0 1d34c491@200722 release |
| mysql_version                    | 5.0.37                        |
| cluster_default_protocol_version | 9                             |

My replication cluster has 10 nodes in total, numbered 0 to 9. All of them were started with node 0 as the seed node.

Log of the seed node:

WARNING: '10.32.252.15:9312': connect timed out
WARNING: last message repeated 1 times
WARNING: 1.0 (daemon_17_default): State transfer to 0.0 (daemon_18_default) failed: -125 (Operation canceled)
WARNING: '10.32.252.15:9312': connect timed out
WARNING: last message repeated 1 times
WARNING: 1.0 (daemon_17_default): State transfer to 0.0 (daemon_19_default) failed: -125 (Operation canceled)
WARNING: '10.32.252.15:9312': connect timed out
WARNING: last message repeated 1 times
WARNING: '10.32.252.15:9312': remote error: invalid GTID, (null)
WARNING: last message repeated 1 times
WARNING: 1.0 (daemon_17_default): State transfer to 0.0 (daemon_19_default) failed: -125 (Operation canceled)
WARNING: SYNC message from member 0 in non-primary configuration. Ignored.

The following is the log of the node that fails:

WARNING: Could not open state file for reading: '/var/lib/manticore/grastate.dat'
WARNING: No persistent state found. Bootstraping with default state
WARNING: Fail to access the file (/var/lib/manticore/gvwstate.dat) error (No such file or directory). It is possible if node is booting for first time or re-booting after a graceful shutdown
FATAL: invalid GTID, (null)
ERROR 1064 (42000) at line 1: (null)

Now this node does not synchronize with anything.

How do I restore the GTID? Is the error caused by the network timeouts shown in the log?
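To see how the timeouts and the GTID errors relate, it can help to tally both kinds of message per log file. The following is a minimal sketch, assuming the log lines look like the ones quoted above. The function name `summarize_log` and the regular expressions are illustrative and not part of Manticore.

```python
import re
from collections import Counter

# Patterns mirror the message shapes quoted in this thread; other daemon
# versions may word them differently (assumption).
TIMEOUT_RE = re.compile(r"WARNING: '([^']+)': connect timed out")
REPEAT_RE = re.compile(r"WARNING: last message repeated (\d+) times")
GTID_RE = re.compile(r"invalid GTID")


def summarize_log(lines):
    """Return (timeouts per peer address, number of GTID error messages)."""
    timeouts = Counter()
    gtid_errors = 0
    last = None  # kind of the previously counted message, used for "repeated" lines
    for line in lines:
        line = line.strip()
        m = TIMEOUT_RE.match(line)
        if m:
            last = ("timeout", m.group(1))
            timeouts[m.group(1)] += 1
            continue
        if GTID_RE.search(line):
            last = ("gtid", None)
            gtid_errors += 1
            continue
        m = REPEAT_RE.match(line)
        if m and last is not None:
            # "last message repeated N times" adds N to whatever came before it
            n = int(m.group(1))
            if last[0] == "timeout":
                timeouts[last[1]] += n
            else:
                gtid_errors += n
            continue
        last = None  # any other message breaks the "repeated" chain
    return timeouts, gtid_errors


sample = [
    "WARNING: '10.32.252.15:9312': connect timed out",
    "WARNING: last message repeated 1 times",
    "WARNING: '10.32.252.15:9312': remote error: invalid GTID, (null)",
    "WARNING: last message repeated 1 times",
    "FATAL: invalid GTID, (null)",
]
timeouts, gtid_errors = summarize_log(sample)
print(dict(timeouts), gtid_errors)
```

A tally like this only shows which peers time out and how often the GTID error appears. It does not establish which one causes the other.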

tomatolog commented 4 years ago

What operation did you perform? In the case of JOIN CLUSTER, you need to check the log on the donor node for errors.

donbing007 commented 4 years ago

The first log is from the donor node.

I restarted all the nodes and found that the indexes that had been in the cluster were no longer part of it.

I added the indexes back to the cluster:

alter cluster default add oqs0;

When I restarted the cluster again, I found that one of the nodes could not synchronize and reported a GTID error.

Seed nodes are the last to shut down and the first to start, and they are started with the "--new-cluster" parameter.

How do I recover this node?

tomatolog commented 4 years ago

I need to look at the show status output of the nodes in the cluster and of the one that failed to join.

I also need the full logs of the nodes, because I see no timestamps and the order of commands and error messages is not clear.

The --new-cluster option should also be used for only one node, but you said it was used for

Seed nodes ...

If show status shows that the cluster is alive, you can shut down the joiner node, remove its cluster folder, start the daemon again, and join the cluster again. Otherwise, provide the logs from the joiner node and the donor node so the flow can be checked.
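One way to check whether show status reports a live cluster is to parse the table printed by the mysql client and look at the cluster counters. This is a minimal sketch. The helper names are hypothetical, and the healthy values `primary` and `synced` are assumptions to verify against the Manticore replication documentation for the version in use.

```python
def parse_status_table(text):
    """Parse the ASCII table printed by the mysql client into a dict."""
    status = {}
    for line in text.splitlines():
        line = line.strip()
        if not line.startswith("|"):
            continue  # skip +----+ borders and blank lines
        cells = [c.strip() for c in line.strip("|").split("|")]
        if len(cells) != 2 or cells[0] == "Counter":
            continue  # skip the header row and anything malformed
        status[cells[0]] = cells[1]
    return status


def cluster_looks_alive(status, cluster="default"):
    """True if the node reports itself as a synced member of a primary component.

    The expected values "primary" and "synced" are assumptions, see the lead-in.
    """
    prefix = f"cluster_{cluster}_"
    return (
        status.get(prefix + "status") == "primary"
        and status.get(prefix + "node_state") == "synced"
    )


failing = """
+----------------------------+-------------+
| Counter                    | Value       |
+----------------------------+-------------+
| cluster_name               | default     |
| cluster_default_status     | non-primary |
| cluster_default_node_state | destroyed   |
| cluster_default_nodes_view |             |
+----------------------------+-------------+
"""
print(cluster_looks_alive(parse_status_table(failing)))
```

Running the check on every node makes it easier to tell a single broken joiner apart from a cluster that has no primary component at all.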

donbing007 commented 4 years ago

This is the "show status" output of the node that cannot synchronize.

+--------------------------------------------+--------------------------------------+
| Counter                                    | Value                                |
+--------------------------------------------+--------------------------------------+
| uptime                                     | 86530                                |
| connections                                | 1151502                              |
| maxed_out                                  | 0                                    |
| version                                    | 3.5.0 1d34c491@200722 release        |
| mysql_version                              | 5.0.37                               |
| command_search                             | 2349889                              |
| command_excerpt                            | 0                                    |
| command_update                             | 0                                    |
| command_delete                             | 0                                    |
| command_keywords                           | 0                                    |
| command_persist                            | 60772                                |
| command_status                             | 13731                                |
| command_flushattrs                         | 0                                    |
| command_set                                | 0                                    |
| command_insert                             | 0                                    |
| command_replace                            | 0                                    |
| command_commit                             | 0                                    |
| command_suggest                            | 0                                    |
| command_json                               | 0                                    |
| command_callpq                             | 0                                    |
| agent_connect                              | 0                                    |
| agent_retry                                | 0                                    |
| queries                                    | 2349889                              |
| dist_queries                               | 0                                    |
| workers_total                              | 6                                    |
| workers_active                             | 3                                    |
| workers_clients                            | 1                                    |
| work_queue_length                          | 3                                    |
| query_wall                                 | 3993.817                             |
| query_cpu                                  | OFF                                  |
| dist_wall                                  | 0.000                                |
| dist_local                                 | 0.000                                |
| dist_wait                                  | 0.000                                |
| query_reads                                | OFF                                  |
| query_readkb                               | OFF                                  |
| query_readtime                             | OFF                                  |
| avg_query_wall                             | 0.001                                |
| avg_query_cpu                              | OFF                                  |
| avg_dist_wall                              | 0.000                                |
| avg_dist_local                             | 0.000                                |
| avg_dist_wait                              | 0.000                                |
| avg_query_reads                            | OFF                                  |
| avg_query_readkb                           | OFF                                  |
| avg_query_readtime                         | OFF                                  |
| qcache_max_bytes                           | 0                                    |
| qcache_thresh_msec                         | 3000                                 |
| qcache_ttl_sec                             | 60                                   |
| qcache_cached_queries                      | 0                                    |
| qcache_used_bytes                          | 0                                    |
| qcache_hits                                | 0                                    |
| cluster_name                               | default                              |
| cluster_default_state_uuid                 | 00000000-0000-0000-0000-000000000000 |
| cluster_default_conf_id                    | -1                                   |
| cluster_default_status                     | non-primary                          |
| cluster_default_size                       | 0                                    |
| cluster_default_local_index                | -1                                   |
| cluster_default_node_state                 | destroyed                            |
| cluster_default_nodes_set                  | ptt-xck-manticore-index-0-svc:9312   |
| cluster_default_nodes_view                 |                                      |
| cluster_default_indexes_count              | 0                                    |
| cluster_default_indexes                    |                                      |
| cluster_default_local_state_uuid           |                                      |
| cluster_default_protocol_version           | 9                                    |
| cluster_default_last_applied               | -1                                   |
| cluster_default_last_committed             | -1                                   |
| cluster_default_replicated                 | 0                                    |
| cluster_default_replicated_bytes           | 0                                    |
| cluster_default_repl_keys                  | 0                                    |
| cluster_default_repl_keys_bytes            | 0                                    |
| cluster_default_repl_data_bytes            | 0                                    |
| cluster_default_repl_other_bytes           | 0                                    |
| cluster_default_received                   | 3                                    |
| cluster_default_received_bytes             | 1934                                 |
| cluster_default_local_commits              | 0                                    |
| cluster_default_local_cert_failures        | 0                                    |
| cluster_default_local_replays              | 0                                    |
| cluster_default_local_send_queue           | 0                                    |
| cluster_default_local_send_queue_max       | 1                                    |
| cluster_default_local_send_queue_min       | 0                                    |
| cluster_default_local_send_queue_avg       | 0.000000                             |
| cluster_default_local_recv_queue           | 0                                    |
| cluster_default_local_recv_queue_max       | 2                                    |
| cluster_default_local_recv_queue_min       | 0                                    |
| cluster_default_local_recv_queue_avg       | 0.333333                             |
| cluster_default_local_cached_downto        | 0                                    |
| cluster_default_flow_control_paused_ns     | 0                                    |
| cluster_default_flow_control_paused        | 0.000000                             |
| cluster_default_flow_control_sent          | 0                                    |
| cluster_default_flow_control_recv          | 0                                    |
| cluster_default_flow_control_interval      | [ 0, 0 ]                             |
| cluster_default_flow_control_interval_low  | 0                                    |
| cluster_default_flow_control_interval_high | 0                                    |
| cluster_default_flow_control_status        | OFF                                  |
| cluster_default_cert_deps_distance         | 0.000000                             |
| cluster_default_apply_oooe                 | 0.000000                             |
| cluster_default_apply_oool                 | 0.000000                             |
| cluster_default_apply_window               | 0.000000                             |
| cluster_default_commit_oooe                | 0.000000                             |
| cluster_default_commit_oool                | 0.000000                             |
| cluster_default_commit_window              | 0.000000                             |
| cluster_default_local_state                | 0                                    |
| cluster_default_local_state_comment        | Initialized                          |
| cluster_default_cert_index_size            | 0                                    |
| cluster_default_cert_bucket_count          | 2                                    |
| cluster_default_gcache_pool_size           | 1440                                 |
| cluster_default_causal_reads               | 0                                    |
| cluster_default_cert_interval              | 0.000000                             |
| cluster_default_open_transactions          | 0                                    |
| cluster_default_open_connections           | 0                                    |
| cluster_default_ist_receive_status         |                                      |
| cluster_default_ist_receive_seqno_start    | 0                                    |
| cluster_default_ist_receive_seqno_current  | 0                                    |
| cluster_default_ist_receive_seqno_end      | 0                                    |
| cluster_default_incoming_addresses         |                                      |
+--------------------------------------------+--------------------------------------+

This is the full log of this node.

Manticore 3.5.0 1d34c491@200722 release
Copyright (c) 2001-2016, Andrew Aksyonoff
Copyright (c) 2008-2016, Sphinx Technologies Inc (http://sphinxsearch.com)
Copyright (c) 2017-2020, Manticore Software LTD (http://manticoresearch.com)

precaching index 'oqsindex0'
It hasn't started yet. Check again in 3 seconds.
precaching index 'oqsindex1'
precaching index 'oqsindex2'
precaching index 'oqsindex3'
precaching index 'test'
binlog: replaying log /var/lib/manticore/binlog/binlog.001
binlog: replay stats: 0 rows in 0 commits; 0 updates, 0 reconfigure; 0 pq-add; 0 pq-delete; 0 indexes
binlog: finished replaying /var/lib/manticore/binlog/binlog.001; 0.0 MB in 0.000 sec
binlog: replaying log /var/lib/manticore/binlog/binlog.002
binlog: replay stats: 0 rows in 0 commits; 0 updates, 0 reconfigure; 0 pq-add; 0 pq-delete; 0 indexes
binlog: finished replaying /var/lib/manticore/binlog/binlog.002; 0.0 MB in 0.000 sec
binlog: replaying log /var/lib/manticore/binlog/binlog.003
binlog: replay stats: 0 rows in 0 commits; 0 updates, 0 reconfigure; 0 pq-add; 0 pq-delete; 0 indexes
binlog: finished replaying /var/lib/manticore/binlog/binlog.003; 0.0 MB in 0.000 sec
binlog: replaying log /var/lib/manticore/binlog/binlog.004
binlog: replay stats: 0 rows in 0 commits; 0 updates, 0 reconfigure; 0 pq-add; 0 pq-delete; 0 indexes
binlog: finished replaying /var/lib/manticore/binlog/binlog.004; 0.0 MB in 0.000 sec
binlog: replaying log /var/lib/manticore/binlog/binlog.005
binlog: replay stats: 0 rows in 0 commits; 0 updates, 0 reconfigure; 0 pq-add; 0 pq-delete; 0 indexes
binlog: finished replaying /var/lib/manticore/binlog/binlog.005; 0.0 MB in 0.000 sec
binlog: replaying log /var/lib/manticore/binlog/binlog.006
binlog: replay stats: 0 rows in 0 commits; 0 updates, 0 reconfigure; 0 pq-add; 0 pq-delete; 0 indexes
binlog: finished replaying /var/lib/manticore/binlog/binlog.006; 0.0 MB in 0.000 sec
binlog: replaying log /var/lib/manticore/binlog/binlog.007
binlog: replay stats: 0 rows in 0 commits; 0 updates, 0 reconfigure; 0 pq-add; 0 pq-delete; 0 indexes
binlog: finished replaying /var/lib/manticore/binlog/binlog.007; 0.0 MB in 0.000 sec
binlog: replaying log /var/lib/manticore/binlog/binlog.008
binlog: replay stats: 0 rows in 0 commits; 0 updates, 0 reconfigure; 0 pq-add; 0 pq-delete; 0 indexes
binlog: finished replaying /var/lib/manticore/binlog/binlog.008; 0.0 MB in 0.000 sec
binlog: replaying log /var/lib/manticore/binlog/binlog.009
binlog: replay stats: 0 rows in 0 commits; 0 updates, 0 reconfigure; 0 pq-add; 0 pq-delete; 0 indexes
binlog: finished replaying /var/lib/manticore/binlog/binlog.009; 0.0 MB in 0.000 sec
binlog: replaying log /var/lib/manticore/binlog/binlog.010
binlog: replay stats: 0 rows in 0 commits; 0 updates, 0 reconfigure; 0 pq-add; 0 pq-delete; 0 indexes
binlog: finished replaying /var/lib/manticore/binlog/binlog.010; 0.0 MB in 0.000 sec
binlog: finished replaying total 10 in 0.000 sec
prereading 5 indexes
prereaded 5 indexes in 0.000 sec
WARNING: '10.32.79.84:9312': connect timed out
WARNING: '10.32.213.244:9312': connect timed out
WARNING: '10.32.62.155:9312': connect timed out
WARNING: '10.32.60.152:9312': connect timed out
WARNING: '10.32.134.5:9312': connect timed out
WARNING: '10.32.165.236:9312': connect timed out
WARNING: '10.32.248.74:9312': connect timed out
WARNING: cluster 'default': no available nodes, replication is disabled, error: '10.32.79.84:9312': connect timed out;'10.32.213.244:9312': connect timed out;'10.32.62.155:9312': connect timed out;'10.32.60.152:9312': connect timed out;'10.32.134.5:9312': connect timed out;'10.32.165.236:9312': connect timed out;'10.32.248.74:9312': connect timed out
accepting connections
WARNING: Could not open state file for reading: '/var/lib/manticore/grastate.dat'
WARNING: No persistent state found. Bootstraping with default state
WARNING: Fail to access the file (/var/lib/manticore/gvwstate.dat) error (No such file or directory). It is possible if node is booting for first time or re-booting after a graceful shutdown
FATAL: invalid GTID, (null)
ERROR 1064 (42000) at line 1: (null)

This is the "show status" output of the seed node, which works normally.

+--------------------------------------------+-----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+
| Counter                                    | Value                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                         |
+--------------------------------------------+-----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+
| uptime                                     | 90381                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                         |
| connections                                | 3561306                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                       |
| maxed_out                                  | 0                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                             |
| version                                    | 3.5.0 1d34c491@200722 release                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                 |
| mysql_version                              | 5.0.37                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                        |
| command_search                             | 6251706                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                       |
| command_excerpt                            | 0                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                             |
| command_update                             | 0                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                             |
| command_delete                             | 0                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                             |
| command_keywords                           | 0                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                             |
| command_persist                            | 69971                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                         |
| command_status                             | 18080                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                         |
| command_flushattrs                         | 0                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                             |
| command_set                                | 0                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                             |
| command_insert                             | 1                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                             |
| command_replace                            | 0                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                             |
| command_commit                             | 0                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                             |
| command_suggest                            | 0                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                             |
| command_json                               | 0                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                             |
| command_callpq                             | 0                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                             |
| agent_connect                              | 0                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                             |
| agent_retry                                | 0                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                             |
| queries                                    | 6251706                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                       |
| dist_queries                               | 0                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                             |
| workers_total                              | 6                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                             |
| workers_active                             | 4                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                             |
| workers_clients                            | 1                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                             |
| work_queue_length                          | 3                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                             |
| query_wall                                 | 20834.195                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                     |
| query_cpu                                  | OFF                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                           |
| dist_wall                                  | 0.000                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                         |
| dist_local                                 | 0.000                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                         |
| dist_wait                                  | 0.000                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                         |
| query_reads                                | OFF                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                           |
| query_readkb                               | OFF                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                           |
| query_readtime                             | OFF                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                           |
| avg_query_wall                             | 0.003                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                         |
| avg_query_cpu                              | OFF                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                           |
| avg_dist_wall                              | 0.000                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                         |
| avg_dist_local                             | 0.000                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                         |
| avg_dist_wait                              | 0.000                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                         |
| avg_query_reads                            | OFF                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                           |
| avg_query_readkb                           | OFF                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                           |
| avg_query_readtime                         | OFF                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                           |
| qcache_max_bytes                           | 0                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                             |
| qcache_thresh_msec                         | 3000                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                          |
| qcache_ttl_sec                             | 60                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                            |
| qcache_cached_queries                      | 0                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                             |
| qcache_used_bytes                          | 0                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                             |
| qcache_hits                                | 0                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                             |
| cluster_name                               | default                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                       |
| cluster_default_state_uuid                 | 98dd509d-dc65-11ea-9bd7-2267785d627a                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                          |
| cluster_default_conf_id                    | 32                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                            |
| cluster_default_status                     | primary                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                       |
| cluster_default_size                       | 7                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                             |
| cluster_default_local_index                | 2                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                             |
| cluster_default_node_state                 | synced                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                        |
| cluster_default_nodes_set                  | 10.32.235.36:9312,10.32.54.155:9312,10.32.94.169:9312,10.32.106.36:9312,10.32.64.108:9312,10.32.241.100:9312,10.32.218.205:9312,10.32.154.189:9312,10.32.180.154:9312,10.32.162.137:9312                                                                                                                                                                                                                                                                                                                                                                                                      |
| cluster_default_nodes_view                 | ptt-xck-manticore-index-5-svc:9312,ptt-xck-manticore-index-5-svc:9360:replication,ptt-xck-manticore-index-4-svc:9312,ptt-xck-manticore-index-4-svc:9360:replication,ptt-xck-manticore-index-1-svc:9312,ptt-xck-manticore-index-1-svc:9360:replication,ptt-xck-manticore-index-2-svc:9312,ptt-xck-manticore-index-2-svc:9360:replication,ptt-xck-manticore-index-9-svc:9312,ptt-xck-manticore-index-9-svc:9360:replication,ptt-xck-manticore-index-8-svc:9312,ptt-xck-manticore-index-8-svc:9360:replication,ptt-xck-manticore-index-7-svc:9312,ptt-xck-manticore-index-7-svc:9360:replication |
| cluster_default_indexes_count              | 4                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                             |
| cluster_default_indexes                    | oqsindex0,oqsindex1,oqsindex2,oqsindex3                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                       |
| cluster_default_local_state_uuid           | 98dd509d-dc65-11ea-9bd7-2267785d627a                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                          |
| cluster_default_protocol_version           | 9                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                             |
| cluster_default_last_applied               | 5                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                             |
| cluster_default_last_committed             | 5                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                             |
| cluster_default_replicated                 | 0                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                             |
| cluster_default_replicated_bytes           | 0                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                             |
| cluster_default_repl_keys                  | 0                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                             |
| cluster_default_repl_keys_bytes            | 0                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                             |
| cluster_default_repl_data_bytes            | 0                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                             |
| cluster_default_repl_other_bytes           | 0                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                             |
| cluster_default_received                   | 35                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                            |
| cluster_default_received_bytes             | 41729                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                         |
| cluster_default_local_commits              | 0                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                             |
| cluster_default_local_cert_failures        | 0                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                             |
| cluster_default_local_replays              | 0                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                             |
| cluster_default_local_send_queue           | 0                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                             |
| cluster_default_local_send_queue_max       | 1                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                             |
| cluster_default_local_send_queue_min       | 0                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                             |
| cluster_default_local_send_queue_avg       | 0.000000                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                      |
| cluster_default_local_recv_queue           | 0                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                             |
| cluster_default_local_recv_queue_max       | 1                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                             |
| cluster_default_local_recv_queue_min       | 0                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                             |
| cluster_default_local_recv_queue_avg       | 0.000000                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                      |
| cluster_default_local_cached_downto        | 1                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                             |
| cluster_default_flow_control_paused_ns     | 0                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                             |
| cluster_default_flow_control_paused        | 0.000000                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                      |
| cluster_default_flow_control_sent          | 0                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                             |
| cluster_default_flow_control_recv          | 0                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                             |
| cluster_default_flow_control_interval      | [ 265, 265 ]                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                  |
| cluster_default_flow_control_interval_low  | 265                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                           |
| cluster_default_flow_control_interval_high | 265                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                           |
| cluster_default_flow_control_status        | OFF                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                           |
| cluster_default_cert_deps_distance         | 1.000000                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                      |
| cluster_default_apply_oooe                 | 0.000000                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                      |
| cluster_default_apply_oool                 | 0.000000                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                      |
| cluster_default_apply_window               | 1.000000                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                      |
| cluster_default_commit_oooe                | 0.000000                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                      |
| cluster_default_commit_oool                | 0.000000                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                      |
| cluster_default_commit_window              | 1.000000                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                      |
| cluster_default_local_state                | 4                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                             |
| cluster_default_local_state_comment        | Synced                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                        |
| cluster_default_cert_index_size            | 2                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                             |
| cluster_default_cert_bucket_count          | 8                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                             |
| cluster_default_gcache_pool_size           | 5152                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                          |
| cluster_default_causal_reads               | 0                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                             |
| cluster_default_cert_interval              | 0.000000                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                      |
| cluster_default_open_transactions          | 0                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                             |
| cluster_default_open_connections           | 0                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                             |
| cluster_default_ist_receive_status         |                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                               |
| cluster_default_ist_receive_seqno_start    | 0                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                             |
| cluster_default_ist_receive_seqno_current  | 0                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                             |
| cluster_default_ist_receive_seqno_end      | 0                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                             |
| cluster_default_incoming_addresses         | ptt-xck-manticore-index-5-svc:9312,ptt-xck-manticore-index-5-svc:9360:replication,ptt-xck-manticore-index-4-svc:9312,ptt-xck-manticore-index-4-svc:9360:replication,ptt-xck-manticore-index-1-svc:9312,ptt-xck-manticore-index-1-svc:9360:replication,ptt-xck-manticore-index-2-svc:9312,ptt-xck-manticore-index-2-svc:9360:replication,ptt-xck-manticore-index-9-svc:9312,ptt-xck-manticore-index-9-svc:9360:replication,ptt-xck-manticore-index-8-svc:9312,ptt-xck-manticore-index-8-svc:9360:replication,ptt-xck-manticore-index-7-svc:9312,ptt-xck-manticore-index-7-svc:9360:replication |
| cluster_default_cluster_weight             | 7                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                             |
| cluster_default_desync_count               | 0                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                             |
| cluster_default_evs_delayed                |                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                               |
| cluster_default_evs_evict_list             |                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                               |
| cluster_default_evs_repl_latency           | 0/0/0/0/0                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                     |
| cluster_default_evs_state                  | OPERATIONAL                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                   |
| cluster_default_gcomm_uuid                 | e6c97c82-dc65-11ea-8314-37b66d4567d5                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                          |
+--------------------------------------------+-----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+

The log points to a network timeout problem.

For now I just have to take the node offline and clean it up.

tomatolog commented 4 years ago

It is not clear what happened. The donor node, where the cluster works, has these addresses:

| cluster_default_nodes_set                  | 10.32.235.36:9312,10.32.54.155:9312,10.32.94.169:9312,10.32.106.36:9312,10.32.64.108:9312,10.32.241.100:9312,10.32.218.205:9312,10.32.154.189:9312,10.32.180.154:9312,10.32.162.137:9312                                                                                                                                                                                                                                                                                                                                                                                                      |
| cluster_default_nodes_view                 | ptt-xck-manticore-index-5-svc:9312,ptt-xck-manticore-index-5-svc:9360:replication,ptt-xck-manticore-index-4-svc:9312,ptt-xck-manticore-index-4-svc:9360:replication,ptt-xck-manticore-index-1-svc:9312,ptt-xck-manticore-index-1-svc:9360:replication,ptt-xck-manticore-index-2-svc:9312,ptt-xck-manticore-index-2-svc:9360:replication,ptt-xck-manticore-index-9-svc:9312,ptt-xck-manticore-index-9-svc:9360:replication,ptt-xck-manticore-index-8-svc:9312,ptt-xck-manticore-index-8-svc:9360:replication,ptt-xck-manticore-index-7-svc:9312,ptt-xck-manticore-index-7-svc:9360:replication |
| cluster_default_incoming_addresses         | ptt-xck-manticore-index-5-svc:9312,ptt-xck-manticore-index-5-svc:9360:replication,ptt-xck-manticore-index-4-svc:9312,ptt-xck-manticore-index-4-svc:9360:replication,ptt-xck-manticore-index-1-svc:9312,ptt-xck-manticore-index-1-svc:9360:replication,ptt-xck-manticore-index-2-svc:9312,ptt-xck-manticore-index-2-svc:9360:replication,ptt-xck-manticore-index-9-svc:9312,ptt-xck-manticore-index-9-svc:9360:replication,ptt-xck-manticore-index-8-svc:9312,ptt-xck-manticore-index-8-svc:9360:replication,ptt-xck-manticore-index-7-svc:9312,ptt-xck-manticore-index-7-svc:9360:replication |

However, the joiner node reports this error:

WARNING: '10.32.252.15:9312': connect timed out

and I see no such address, 10.32.252.15:9312, on the donor node.
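
One way to confirm that observation is to test the address from the joiner's log against the donor's comma-separated cluster_default_nodes_set value. This is only a minimal sketch: addr_in_list is a hypothetical helper, not a Manticore command, and the list is shortened to the first three entries of the output above.

```shell
# Return 0 if address $1 appears in the comma-separated list $2.
# Hypothetical helper for illustration only.
addr_in_list() {
  case ",$2," in
    *",$1,"*) return 0 ;;
    *) return 1 ;;
  esac
}

# Address taken from the joiner's "connect timed out" warning.
failing="10.32.252.15:9312"
# First three entries of the donor's cluster_default_nodes_set (shortened).
nodes_set="10.32.235.36:9312,10.32.54.155:9312,10.32.94.169:9312"

if addr_in_list "$failing" "$nodes_set"; then
  echo "$failing is known to the donor"
else
  echo "$failing is not in the donor's nodes set"
fi
```

Wrapping both the list and the candidate in commas avoids false matches on partial addresses, such as 10.32.54.15 against 10.32.54.155.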

Could you show the joiner node's searchd config and the manticore.json file from the data dir? Maybe the nodes' IPs changed after the daemons restarted? Do you use a Kubernetes environment?

To fix the failed joiner node, you could clean up the cluster directory in the data directory and issue JOIN from scratch; it will fully sync the data again from one of the donor nodes.

donbing007 commented 4 years ago

It would be nice if you could provide a command to easily restart synchronization.

Yeah, I'm using Kubernetes.

All 10 nodes were created together from a StatefulSet and started one after another.

The connection timeout on port 9312 is strange. I have 6 other nodes that host a distributed index and connect to these 10 nodes as agents. They also use port 9312, and that works normally.

I've created a Service (SVC) for each node, which gives it a fixed domain name. Each node is then configured with that name via node_address.

common {
  on_json_attr_error = fail_index
}
searchd {
  pid_file = /var/run/manticore/searchd.pid
  data_dir = /var/lib/manticore
  binlog_path = /var/lib/manticore/binlog
  node_address = ptt-xck-manticore-index-6-svc
  sphinxql_state = /var/lib/manticore/state.sql
  listen = 9312
  listen = 9306:mysql
  listen = 9308:http
  listen = ptt-xck-manticore-index-6-svc:9360-9370:replication
  client_timeout = 1h
  mysql_version_string = 5.0.37
  network_timeout = 30s
  qcache_max_bytes = 0
  query_log_format = sphinxql
  query_log_min_msec = 2000
  rt_flush_period = 30m
  shutdown_timeout = 30m
  sphinxql_timeout = 8h
  threads = 6
  watchdog = 0
}

All nodes use the same configuration, except for node_address.

# Determine whether the current node started successfully.
while :
do
  result=$(mysql -h0 -P9306 -e "show status" | grep -c "mysql_version")
  if [ "$result" -gt 0 ]; then
    break
  else
    echo "It hasn't started yet. Check again in 3 seconds."
    sleep 3
  fi
done
echo "The node started successfully."

echo "Start to determine if the primary node started successfully."
while :
do
  result=$(mysql -hptt-xck-manticore-index-0-svc -P9306 -e "show status" | grep -c "mysql_version")
  if [ "$result" -gt 0 ]; then
    break
  else
    echo "The main node has not started successfully. Wait for 3 seconds to check again."
    sleep 3
  fi
done

clusterName="default"
# Count the cluster_name rows in the status output; 0 means no cluster yet.
clusterNumber=$(mysql -h0 -P9306 -e "show status like 'cluster_name'" | grep -c "cluster_name")
if [ "$clusterNumber" -eq 0 ]; then
  # Not in the cluster, join.
  mysql -h0 -P9306 -e "join cluster $clusterName at 'ptt-xck-manticore-index-0-svc:9312'"
else
  echo "Cluster already exists, ignore."
fi

This is the script that runs automatically at startup to join the current node to the cluster (if it has not joined yet).
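
Both wait loops in the script retry forever, so a node that never comes up leaves the startup hanging with no error. A bounded variant might look like the sketch below. The helpers wait_until and node_is_up and the retry limits are hypothetical; only the mysql invocation and port 9306 come from the script above.

```shell
# wait_until MAX_TRIES DELAY_SECONDS COMMAND [ARGS...]
# Re-runs COMMAND until it succeeds; returns 1 after MAX_TRIES failed attempts.
wait_until() {
  max_tries=$1
  delay=$2
  shift 2
  try=1
  while ! "$@"; do
    if [ "$try" -ge "$max_tries" ]; then
      return 1
    fi
    try=$((try + 1))
    sleep "$delay"
  done
  return 0
}

# Succeeds once searchd on host $1 answers SHOW STATUS over the MySQL protocol.
node_is_up() {
  mysql -h"$1" -P9306 -e "show status" 2>/dev/null | grep -q "mysql_version"
}

# Example use (assumed limits: 100 tries, 3 seconds apart):
#   wait_until 100 3 node_is_up ptt-xck-manticore-index-0-svc || exit 1
```

Exiting with a non-zero status on timeout lets Kubernetes see the failure and restart the container, rather than leaving it waiting indefinitely.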

githubmanticore commented 4 years ago

➤ Aleksey N. Vinogradov commented:

Also consider the hostname_lookup parameter if the IP addresses of the hosts may change at runtime.

https://manual.manticoresearch.com/Creating_an_index/Creating_a_distributed_index/Remote_indexes#hostname_lookup

tomatolog commented 4 years ago

Could you show the manticore.json file from the data dir of the joiner node (the one that failed)?

donbing007 commented 4 years ago

Is that it?

# cat manticore.json
{
    "clusters": {
        "default":  {
            "nodes":    "10.32.79.84:9312,10.32.213.244:9312,10.32.62.155:9312,10.32.60.152:9312,10.32.134.5:9312,10.32.165.236:9312,10.32.248.74:9312",
            "options":  "",
            "indexes":  ["test"]
        }
    },
    "indexes":  {
        "oqsindex0":    {
            "type": "rt",
            "path": "oqsindex0"
        },
        "oqsindex1":    {
            "type": "rt",
            "path": "oqsindex1"
        },
        "oqsindex2":    {
            "type": "rt",
            "path": "oqsindex2"
        },
        "oqsindex3":    {
            "type": "rt",
            "path": "oqsindex3"
        },
        "test": {
            "type": "rt",
            "path": "test"
        },
        "oqsindex": {
            "type": "distributed",
            "locals":   ["oqsindex0", "oqsindex1", "oqsindex2", "oqsindex3"],
            "agent_connect_timeout":    1000,
            "agent_query_timeout":  3000,
            "divide_remote_ranges": false,
            "ha_strategy":  "random"
        }
    }
}
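
To compare the addresses stored in this file with the donor's cluster_default_nodes_set, the "nodes" value can be split into one address per line. The sketch below assumes the value sits on a single line, as in the file shown here; cluster_nodes is a hypothetical helper, and a real JSON parser would be more robust.

```shell
# Print the "nodes" value from a manticore.json file ($1), one address per line.
# Assumes the whole "nodes" entry is on a single line.
cluster_nodes() {
  sed -n 's/.*"nodes":[[:space:]]*"\([^"]*\)".*/\1/p' "$1" | tr ',' '\n'
}

# Example use (path taken from data_dir in the config shown earlier):
#   cluster_nodes /var/lib/manticore/manticore.json | sort
```

Sorting the output of both sides makes the two lists easy to diff by eye or with standard tools.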

donbing007 commented 4 years ago

➤ Aleksey N. Vinogradov commented:

Also consider the hostname_lookup parameter if the IP addresses of the hosts may change at runtime.

https://manual.manticoresearch.com/Creating_an_index/Creating_a_distributed_index/Remote_indexes#hostname_lookup

The network name and IP generally remain stable, which is the purpose of using a Kubernetes Service (SVC). Only the IP of the pod behind the Service may change, for example when the pod restarts or is rescheduled to another Kubernetes worker node.

sanikolaev commented 4 years ago

@donbing007 is this issue still relevant?

donbing007 commented 4 years ago

This has not happened for a while. I also tried creating the cluster manually rather than automatically.