Empty cluster got lost on restart

klirichek commented 8 months ago

Describe the bug: an empty cluster does not survive a restart of the daemon.

To Reproduce

1. Start the daemon.
2. Issue create cluster 'c'.
3. Stop the daemon. Check that the cluster is present in the internal JSON config in the data dir.
4. Start the daemon again.
5. Check the clusters: none is available now.
6. Stop the daemon. Check the internal JSON config: the cluster has vanished from it.
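
The two "check the config" steps can be scripted. The following is a minimal sketch, not the reporter's actual tooling: the function names snapshot_config and config_survived are made up for illustration, and it assumes the internal config is the manticore.json file inside the data dir. A byte-for-byte comparison is stricter than necessary, since the daemon may legitimately rewrite the file, so a mismatch is only a hint to look closer.

```shell
# Hypothetical helpers (not part of Manticore).
# Assumption: the internal config lives at <data_dir>/manticore.json.

# Save a copy of the config as it looked after the first shutdown.
snapshot_config() {
    cp "$1/manticore.json" "$1/manticore.json.before"
}

# Exit status 0 if the config is unchanged after the second shutdown,
# non-zero if it differs from the snapshot (or either file is missing).
config_survived() {
    cmp -s "$1/manticore.json.before" "$1/manticore.json"
}
```

Usage: call snapshot_config data after the first stop, restart and stop the daemon again, then run config_survived data || echo "config changed on restart".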

Expected behavior: the state of clusters should be restored on restart. Note that the same loss happens when the cluster was joined from another node. Even though the cluster has no indexes at the moment of shutdown, it is not good for any information to vanish on restart, even if it is just the cluster's name.

UPDATE: The behaviour is quite specific and tied to the case where searchd is started the second time without the galera library. That is: after the cluster is created and the daemon is stopped, the daemon is started a second time without galera, and as a result everything related to the cluster is lost from the config. I think such edge-case behaviour is not critical and is acceptable (however, as evidence of the mistake it would be good to keep the data for later).

However, there is a somewhat more complicated scenario in which the cluster disappears:

1st instance (in folder '1')

searchd {
    listen = 127.0.0.1:10201
    listen = 127.0.0.1:10301:mysql
    log = searchd.log
    query_log = query.log
    pid_file = searchd.pid
    data_dir = data
    binlog_path = 
}

2nd instance (in folder '2')

searchd {
    listen = 127.0.0.1:20201
    listen = 127.0.0.1:20301:mysql
    log = searchd.log
    query_log = query.log
    pid_file = searchd.pid
    data_dir = data
    binlog_path =
}

file 1/test_funcs

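# Wipe the data dirs of both instances.
cleanup() {
    rm -rf ~/tests/repl/1/data/*
    rm -rf ~/tests/repl/2/data/*
}

export GALERA_SONAME=/home/alexey/cache/linux-x86_64/galera/lib/libgalera_manticore.so

# Start both instances; $daemon is set by the calling script.
start() {
    cd ~/tests/repl/1
    ./$daemon
    cd ../2
    ./$daemon
    sleep 1
}

# Create cluster 'c' on the 1st instance.
create() {
    mysql -P10301 -h0 -e "create cluster c"
    sleep 1
}

# Join the 2nd instance to cluster 'c' through the 1st instance.
join() {
    mysql -P20301 -h0 -e "join cluster c at '127.0.0.1:10201'"
    sleep 1
}

# Stop both instances; relies on start() having left the shell in ~/tests/repl/2.
stop() {
    cd ../1
    ./$daemon --stopwait
    cd ../2
    ./$daemon --stopwait
}

# Print the internal JSON config of both instances.
# The final 'cd -' prints the directory it returns to.
dump() {
    cd ~/tests/repl/1
    jq . data/manticore.json
    cd ../2
    jq . data/manticore.json
    cd -
}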

MRE scenario:

#!/bin/bash

daemon=searchd

. test_funcs

cleanup
start
create
join
stop
dump
start
stop
dump

This produces the following output:

Manticore 6.2.13 01c4e054a@231103 dev
Copyright (c) 2001-2016, Andrew Aksyonoff
Copyright (c) 2008-2016, Sphinx Technologies Inc (http://sphinxsearch.com)
Copyright (c) 2017-2023, Manticore Software LTD (https://manticoresearch.com)

[24:14.606] [689848] using config file '/home/alexey/tests/repl/1/manticore.conf' (238 chars)...
[24:14.608] [689848] WARNING: secondary_indexes set but failed to initialize secondary library: (null)
starting daemon version '6.2.13 01c4e054a@231103 dev' ...
listening on 127.0.0.1:10201 for sphinx and http(s)
listening on 127.0.0.1:10301 for mysql
Manticore 6.2.13 01c4e054a@231103 dev
Copyright (c) 2001-2016, Andrew Aksyonoff
Copyright (c) 2008-2016, Sphinx Technologies Inc (http://sphinxsearch.com)
Copyright (c) 2017-2023, Manticore Software LTD (https://manticoresearch.com)

[24:14.723] [689870] using config file '/home/alexey/tests/repl/2/manticore.conf' (261 chars)...
[24:14.725] [689870] WARNING: secondary_indexes set but failed to initialize secondary library: (null)
starting daemon version '6.2.13 01c4e054a@231103 dev' ...
listening on 127.0.0.1:20201 for sphinx and http(s)
listening on 127.0.0.1:20301 for mysql
Manticore 6.2.13 01c4e054a@231103 dev
Copyright (c) 2001-2016, Andrew Aksyonoff
Copyright (c) 2008-2016, Sphinx Technologies Inc (http://sphinxsearch.com)
Copyright (c) 2017-2023, Manticore Software LTD (https://manticoresearch.com)

[24:18.067] [689906] using config file '/home/alexey/tests/repl/1/manticore.conf' (238 chars)...
[24:18.068] [689906] stop: successfully sent SIGTERM to pid 689851
Manticore 6.2.13 01c4e054a@231103 dev
Copyright (c) 2001-2016, Andrew Aksyonoff
Copyright (c) 2008-2016, Sphinx Technologies Inc (http://sphinxsearch.com)
Copyright (c) 2017-2023, Manticore Software LTD (https://manticoresearch.com)

[24:23.635] [689907] using config file '/home/alexey/tests/repl/2/manticore.conf' (261 chars)...
[24:23.636] [689907] stop: successfully sent SIGTERM to pid 689873
{
  "clusters": {
    "c": {
      "nodes": "127.0.0.1:10201,127.0.0.1:20201",
      "options": "",
      "indexes": []
    }
  },
  "indexes": {}
}
{
  "clusters": {
    "c": {
      "nodes": "127.0.0.1:10201,127.0.0.1:20201",
      "options": "",
      "indexes": []
    }
  },
  "indexes": {}
}
/home/alexey/tests/repl/1
Manticore 6.2.13 01c4e054a@231103 dev
Copyright (c) 2001-2016, Andrew Aksyonoff
Copyright (c) 2008-2016, Sphinx Technologies Inc (http://sphinxsearch.com)
Copyright (c) 2017-2023, Manticore Software LTD (https://manticoresearch.com)

[24:23.806] [689912] using config file '/home/alexey/tests/repl/1/manticore.conf' (238 chars)...
[24:23.806] [689912] WARNING: secondary_indexes set but failed to initialize secondary library: (null)
starting daemon version '6.2.13 01c4e054a@231103 dev' ...
listening on 127.0.0.1:10201 for sphinx and http(s)
listening on 127.0.0.1:10301 for mysql
Manticore 6.2.13 01c4e054a@231103 dev
Copyright (c) 2001-2016, Andrew Aksyonoff
Copyright (c) 2008-2016, Sphinx Technologies Inc (http://sphinxsearch.com)
Copyright (c) 2017-2023, Manticore Software LTD (https://manticoresearch.com)

[24:23.919] [689935] using config file '/home/alexey/tests/repl/2/manticore.conf' (261 chars)...
[24:23.921] [689935] WARNING: secondary_indexes set but failed to initialize secondary library: (null)
starting daemon version '6.2.13 01c4e054a@231103 dev' ...
listening on 127.0.0.1:20201 for sphinx and http(s)
listening on 127.0.0.1:20301 for mysql
Manticore 6.2.13 01c4e054a@231103 dev
Copyright (c) 2001-2016, Andrew Aksyonoff
Copyright (c) 2008-2016, Sphinx Technologies Inc (http://sphinxsearch.com)
Copyright (c) 2017-2023, Manticore Software LTD (https://manticoresearch.com)

[24:25.040] [689959] using config file '/home/alexey/tests/repl/1/manticore.conf' (238 chars)...
[24:25.041] [689959] stop: successfully sent SIGTERM to pid 689915
Manticore 6.2.13 01c4e054a@231103 dev
Copyright (c) 2001-2016, Andrew Aksyonoff
Copyright (c) 2008-2016, Sphinx Technologies Inc (http://sphinxsearch.com)
Copyright (c) 2017-2023, Manticore Software LTD (https://manticoresearch.com)

[24:25.057] [689960] using config file '/home/alexey/tests/repl/2/manticore.conf' (261 chars)...
[24:25.059] [689960] stop: successfully sent SIGTERM to pid 689938
{
  "clusters": {},
  "indexes": {}
}
{
  "clusters": {},
  "indexes": {}
}
/home/alexey/tests/repl/1

That is, the cluster disappears from the config. However, in another scenario, without the join, it is preserved.

MRE scenario:

#!/bin/bash

daemon=searchd

. test_funcs

cleanup
start
create
#join
stop
dump
start
stop
dump

Manticore 6.2.13 01c4e054a@231103 dev
Copyright (c) 2001-2016, Andrew Aksyonoff
Copyright (c) 2008-2016, Sphinx Technologies Inc (http://sphinxsearch.com)
Copyright (c) 2017-2023, Manticore Software LTD (https://manticoresearch.com)

[47:54.437] [690574] using config file '/home/alexey/tests/repl/1/manticore.conf' (238 chars)...
[47:54.439] [690574] WARNING: secondary_indexes set but failed to initialize secondary library: (null)
starting daemon version '6.2.13 01c4e054a@231103 dev' ...
listening on 127.0.0.1:10201 for sphinx and http(s)
listening on 127.0.0.1:10301 for mysql
Manticore 6.2.13 01c4e054a@231103 dev
Copyright (c) 2001-2016, Andrew Aksyonoff
Copyright (c) 2008-2016, Sphinx Technologies Inc (http://sphinxsearch.com)
Copyright (c) 2017-2023, Manticore Software LTD (https://manticoresearch.com)

[47:54.554] [690596] using config file '/home/alexey/tests/repl/2/manticore.conf' (261 chars)...
[47:54.556] [690596] WARNING: secondary_indexes set but failed to initialize secondary library: (null)
starting daemon version '6.2.13 01c4e054a@231103 dev' ...
listening on 127.0.0.1:20201 for sphinx and http(s)
listening on 127.0.0.1:20301 for mysql
Manticore 6.2.13 01c4e054a@231103 dev
Copyright (c) 2001-2016, Andrew Aksyonoff
Copyright (c) 2008-2016, Sphinx Technologies Inc (http://sphinxsearch.com)
Copyright (c) 2017-2023, Manticore Software LTD (https://manticoresearch.com)

[47:56.802] [690625] using config file '/home/alexey/tests/repl/1/manticore.conf' (238 chars)...
[47:56.803] [690625] stop: successfully sent SIGTERM to pid 690577
Manticore 6.2.13 01c4e054a@231103 dev
Copyright (c) 2001-2016, Andrew Aksyonoff
Copyright (c) 2008-2016, Sphinx Technologies Inc (http://sphinxsearch.com)
Copyright (c) 2017-2023, Manticore Software LTD (https://manticoresearch.com)

[47:56.872] [690626] using config file '/home/alexey/tests/repl/2/manticore.conf' (261 chars)...
[47:56.873] [690626] stop: successfully sent SIGTERM to pid 690599
{
  "clusters": {
    "c": {
      "options": "",
      "indexes": []
    }
  },
  "indexes": {}
}
{
  "clusters": {},
  "indexes": {}
}
/home/alexey/tests/repl/1
Manticore 6.2.13 01c4e054a@231103 dev
Copyright (c) 2001-2016, Andrew Aksyonoff
Copyright (c) 2008-2016, Sphinx Technologies Inc (http://sphinxsearch.com)
Copyright (c) 2017-2023, Manticore Software LTD (https://manticoresearch.com)

[47:56.995] [690631] using config file '/home/alexey/tests/repl/1/manticore.conf' (238 chars)...
[47:56.997] [690631] WARNING: secondary_indexes set but failed to initialize secondary library: (null)
starting daemon version '6.2.13 01c4e054a@231103 dev' ...
listening on 127.0.0.1:10201 for sphinx and http(s)
listening on 127.0.0.1:10301 for mysql
Manticore 6.2.13 01c4e054a@231103 dev
Copyright (c) 2001-2016, Andrew Aksyonoff
Copyright (c) 2008-2016, Sphinx Technologies Inc (http://sphinxsearch.com)
Copyright (c) 2017-2023, Manticore Software LTD (https://manticoresearch.com)

[47:57.112] [690657] using config file '/home/alexey/tests/repl/2/manticore.conf' (261 chars)...
[47:57.114] [690657] WARNING: secondary_indexes set but failed to initialize secondary library: (null)
starting daemon version '6.2.13 01c4e054a@231103 dev' ...
listening on 127.0.0.1:20201 for sphinx and http(s)
listening on 127.0.0.1:20301 for mysql
Manticore 6.2.13 01c4e054a@231103 dev
Copyright (c) 2001-2016, Andrew Aksyonoff
Copyright (c) 2008-2016, Sphinx Technologies Inc (http://sphinxsearch.com)
Copyright (c) 2017-2023, Manticore Software LTD (https://manticoresearch.com)

[47:58.233] [690680] using config file '/home/alexey/tests/repl/1/manticore.conf' (238 chars)...
[47:58.234] [690680] stop: successfully sent SIGTERM to pid 690634
Manticore 6.2.13 01c4e054a@231103 dev
Copyright (c) 2001-2016, Andrew Aksyonoff
Copyright (c) 2008-2016, Sphinx Technologies Inc (http://sphinxsearch.com)
Copyright (c) 2017-2023, Manticore Software LTD (https://manticoresearch.com)

[47:58.302] [690681] using config file '/home/alexey/tests/repl/2/manticore.conf' (261 chars)...
[47:58.304] [690681] stop: successfully sent SIGTERM to pid 690660
{
  "clusters": {
    "c": {
      "options": "",
      "indexes": []
    }
  },
  "indexes": {}
}
{
  "clusters": {},
  "indexes": {}
}
/home/alexey/tests/repl/1

tomatolog commented 8 months ago

You need to start the node with the --new-cluster command-line option of the daemon to make sure the node becomes the leader after a full cluster restart, as described in "Restarting a cluster".
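
Applied to a two-instance layout like the one in the report, this advice could look roughly like the sketch below. It is an illustration only: restart_plan is a made-up helper, the base and daemon defaults are assumptions, and it follows my reading of the "Restarting a cluster" documentation that the node stopped last is the one to start first with --new-cluster. The thread does not confirm that this alone keeps the cluster entry in the config.

```shell
# Hypothetical helper (not part of Manticore): prints the start commands
# for a full-cluster restart, one per line, without running anything.
# Assumptions: instance folders live under $base and the binary is ./$daemon.
base=${base:-$HOME/tests/repl}
daemon=${daemon:-searchd}

# $1 is the node that was stopped last; the remaining arguments are the other nodes.
restart_plan() {
    last_stopped=$1
    shift
    # The last-stopped node is started first and bootstraps the cluster.
    printf 'cd "%s" && ./%s --new-cluster\n' "$base/$last_stopped" "$daemon"
    # The other nodes are started normally and rejoin.
    for node in "$@"; do
        printf 'cd "%s" && ./%s\n' "$base/$node" "$daemon"
    done
}
```

With a stop order of instance 1 and then instance 2, restart_plan 2 1 prints the commands for review, and restart_plan 2 1 | sh runs them.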

sanikolaev commented 8 months ago

I can't reproduce it on my Mac M1:

mysql> create cluster c;
--------------
create cluster c
--------------

ERROR 1064 (42000): failed to create desc: cluster 'c' already exists
mysql> exit
Writing history-file /Users/sn/.mysql_history
Bye

➜  ~ brew services restart manticoresearch-dev
Stopping `manticoresearch-dev`... (might take a while)
==> Successfully stopped `manticoresearch-dev` (label: homebrew.mxcl.manticoresearch-dev)
==> Successfully started `manticoresearch-dev` (label: homebrew.mxcl.manticoresearch-dev)

➜  ~ mysql -P9306 -h0 -v
Welcome to the MySQL monitor.  Commands end with ; or \g.
Your MySQL connection id is 9
Server version: 6.2.13 6d36c68fb@240102 dev (columnar 2.2.5 1d1e432@231204) (secondary 2.2.5 1d1e432@231204) (knn 2.2.5 1d1e432@231204) git branch master...origin/master

Copyright (c) 2000, 2023, Oracle and/or its affiliates.

Oracle is a registered trademark of Oracle Corporation and/or its
affiliates. Other names may be trademarks of their respective
owners.

Reading history-file /Users/sn/.mysql_history
Type 'help;' or '\h' for help. Type '\c' to clear the current input statement.

mysql> create cluster c;
--------------
create cluster c
--------------

ERROR 1064 (42000): failed to create desc: cluster 'c' already exists

mysql> show status like '%uptime%';
--------------
show status like '%uptime%'
--------------

+---------+-------+
| Counter | Value |
+---------+-------+
| uptime  | 70    |
+---------+-------+
1 row in set (0.01 sec)

sanikolaev commented 8 months ago

As discussed, Alexey will try to reproduce it again.

klirichek commented 8 months ago

Updated. The reason was starting the daemon the second time without galera available. With galera, a simple (empty) cluster survives a restart. However, a cluster with nodes doesn't survive. Maybe it requires some manual massaging and/or tons of startup options; however, the fact that it disappears on a 'simple restart' does not look very friendly.

sanikolaev commented 8 months ago

As discussed, what we can improve within this task is how the daemon reacts to being restarted with the galera library absent, by backing up manticore.json, etc. As discussed, this is an easy task which shouldn't take long.
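
Until the daemon does this itself, an operator-side guard along the same lines is possible. The sketch below is not the planned fix: backup_if_no_galera is a hypothetical wrapper function, and it assumes the library is located through the GALERA_SONAME environment variable (as in the test_funcs file above), so an unset or dangling value is treated as "library missing" even though the daemon may have other ways to find it.

```shell
# Hypothetical pre-start guard (not part of Manticore).
# Assumption: the Galera library is located through GALERA_SONAME.
# $1 is the data dir that holds manticore.json.
backup_if_no_galera() {
    config="$1/manticore.json"

    # Nothing to protect if the config does not exist yet.
    [ -f "$config" ] || return 0

    # Library file is present: no backup needed.
    if [ -n "${GALERA_SONAME:-}" ] && [ -f "$GALERA_SONAME" ]; then
        return 0
    fi

    # Library looks absent: keep a copy before the daemon can rewrite the file.
    cp -p "$config" "$config.bak" || return 1
    echo "galera library not found, saved $config.bak" >&2
}
```

Usage: run backup_if_no_galera data right before ./searchd in each instance folder; if the clusters section comes back empty, the previous state is still available in data/manticore.json.bak.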

manticoresoftware / manticoresearch

Empty cluster got lost on restart #1697