manticoresoftware / manticoresearch

Easy to use open source fast database for search | Good alternative to Elasticsearch now | Drop-in replacement for E in the ELK soon
https://manticoresearch.com
GNU General Public License v3.0
9.02k stars 506 forks source link

Additional "searchd" Process Causing Issues with Indexer Signals #2454

Open ssi10 opened 3 months ago

ssi10 commented 3 months ago

Bug Description:

Hello,

I am encountering an issue with Manticoresearch where an additional searchd process is being spawned unexpectedly.

Details:

Typically, two searchd processes are maintained. However, intermittently, an additional searchd process appears. This can happen up to twice a day, or as infrequently as once every three weeks. When this additional process is spawned, it captures all signals from the indexer, preventing the delta index from refreshing. Even if a signal is sent, the newly spawned process takes it, so the updates made to the delta index are not reflected in the search results. The only workaround I have found is to forcibly terminate (kill -9) the newly spawned process. Once this is done, everything operates normally again for a while. I have not been able to determine a pattern or specific trigger for this behavior.

Steps to Reproduce:

Run Manticoresearch with typical configuration. Monitor searchd processes over an extended period. Observe that, occasionally, an additional searchd process is spawned.

Expected Behavior:

Only two searchd processes should be maintained, and no additional processes should capture the indexer signals, ensuring the delta index refreshes correctly.

Actual Behavior:

An additional searchd process is spawned intermittently, capturing all indexer signals and preventing the delta index from refreshing until the process is forcibly terminated.


When the issue of the additional process occurs, I did not find anything unusual in the logs. Manticoresearch was installed using dnf. Currently, we are operating multiple servers, and the same issue is occurring on all of them. The following is the normal situation, and if one additional process appears here, the problem occurs.

# ps aux|grep searchd
mantico+ 1419404  0.0  0.0  75752  3292 ?        S    Jul06   0:00 /usr/bin/searchd --config /etc/manticoresearch/manticore.conf
mantico+ 1419405 5210 18.8 114096540 99544460 ?  Sl   Jul06 1714440:26 /usr/bin/searchd --config /etc/manticoresearch/manticore.conf

Manticore Search Version:

6.3.0, 6.3.2

Operating System Version:

AlmaLinux 8, 9

Have you tried the latest development version?

No

Internal Checklist:

To be completed by the assignee. Check off tasks that have been completed or are not applicable.

- [ ] Implementation completed - [ ] Tests developed - [ ] Documentation updated - [ ] Documentation reviewed - [ ] Changelog updated
tomatolog commented 3 months ago

you need to upload your searchd.log file from the process that crashed with the crash log info to start the crash investigation

ssi10 commented 3 months ago

you need to upload your searchd.log file from the process that crashed with the crash log info to start the crash investigation

The process did not crash, so there is no relevant information in the logs.

When an additional process is spawned, no further logs are recorded in searchd.log. Once I kill the newly created process with kill -9, the logging resumes.

Usually, the logs contain the following types of information, but there are no related logs when the additional process is spawned or terminated. Delta updates are performed every minute.

[Sun Jul 28 22:10:02.440 2024] [1419405] caught SIGHUP (seamless=1, in_rotate=0, need_rotate=0)
[Sun Jul 28 22:10:04.544 2024] [1419490] rotating table 'scrap_delta': started
[Sun Jul 28 22:10:04.548 2024] [1419490] RW-idx for rename to .old, acquiring...
[Sun Jul 28 22:10:04.548 2024] [1419490] RW-idx for rename to .old, acquired...
[Sun Jul 28 22:10:04.551 2024] [1419490] rotating table 'scrap_delta': success
[Sun Jul 28 22:10:04.551 2024] [1419490] rotating table: all tables done
[Sun Jul 28 22:10:04.852 2024] [1419405] caught SIGHUP (seamless=1, in_rotate=0, need_rotate=0)
[Sun Jul 28 22:10:07.115 2024] [1419599] rotating table 'message_delta': started
[Sun Jul 28 22:10:07.122 2024] [1419599] RW-idx for rename to .old, acquiring...
[Sun Jul 28 22:10:07.122 2024] [1419599] RW-idx for rename to .old, acquired...
[Sun Jul 28 22:10:07.128 2024] [1419599] rotating table 'message_delta': success
[Sun Jul 28 22:10:07.128 2024] [1419599] rotating table 'document_modified_delta': started
[Sun Jul 28 22:10:07.134 2024] [1419599] RW-idx for rename to .old, acquiring...
[Sun Jul 28 22:10:07.134 2024] [1419599] RW-idx for rename to .old, acquired...
[Sun Jul 28 22:10:07.135 2024] [1419599] rotating table 'document_modified_delta': success
[Sun Jul 28 22:10:07.135 2024] [1419599] rotating table: all tables done
[Sun Jul 28 22:10:07.489 2024] [1419405] caught SIGHUP (seamless=1, in_rotate=0, need_rotate=0)
[Sun Jul 28 22:10:09.386 2024] [1419439] nothing to rotate after SIGHUP
[Sun Jul 28 22:10:22.722 2024] [1419405] caught SIGHUP (seamless=1, in_rotate=0, need_rotate=0)
[Sun Jul 28 22:10:24.811 2024] [1419548] rotating table 'document_delta': started
[Sun Jul 28 22:10:24.827 2024] [1419548] RW-idx for rename to .old, acquiring...
[Sun Jul 28 22:10:24.827 2024] [1419548] RW-idx for rename to .old, acquired...
[Sun Jul 28 22:10:24.827 2024] [1419548] rotating table 'document_delta': success
[Sun Jul 28 22:10:24.827 2024] [1419548] rotating table: all tables done
[Sun Jul 28 22:10:34.022 2024] [1419405] caught SIGHUP (seamless=1, in_rotate=0, need_rotate=0)
[Sun Jul 28 22:10:36.214 2024] [1419561] rotating table 'comment_delta': started
[Sun Jul 28 22:10:36.230 2024] [1419561] RW-idx for rename to .old, acquiring...
[Sun Jul 28 22:10:36.230 2024] [1419561] RW-idx for rename to .old, acquired...
[Sun Jul 28 22:10:36.230 2024] [1419561] rotating table 'comment_delta': success
[Sun Jul 28 22:10:36.230 2024] [1419561] rotating table: all tables done

Here are the consecutive log entries.

For reference, a new process was spawned at 11:14 When the new process is spawned, the number of searchd processes visible in ps command increases from the usual 2 to 3. , and I terminated it at 16:01:35.

[Sat Jul 27 11:12:04.439 2024] [8124] caught SIGHUP (seamless=1, in_rotate=0, need_rotate=0)
[Sat Jul 27 11:12:06.183 2024] [8148] rotating table 'message_delta': started
[Sat Jul 27 11:12:06.189 2024] [8148] RW-idx for rename to .old, acquiring...
[Sat Jul 27 11:12:06.189 2024] [8148] RW-idx for rename to .old, acquired...
[Sat Jul 27 11:12:06.193 2024] [8148] rotating table 'message_delta': success
[Sat Jul 27 11:12:06.193 2024] [8148] rotating table 'scrap_delta': started
[Sat Jul 27 11:12:06.199 2024] [8148] RW-idx for rename to .old, acquiring...
[Sat Jul 27 11:12:06.199 2024] [8148] RW-idx for rename to .old, acquired...
[Sat Jul 27 11:12:06.201 2024] [8148] rotating table 'scrap_delta': success
[Sat Jul 27 11:12:06.201 2024] [8148] rotating table: all tables done
[Sat Jul 27 11:12:06.473 2024] [8124] caught SIGHUP (seamless=1, in_rotate=0, need_rotate=0)
[Sat Jul 27 11:12:08.046 2024] [8191] rotating table 'document_modified_delta': started
[Sat Jul 27 11:12:08.050 2024] [8191] RW-idx for rename to .old, acquiring...
[Sat Jul 27 11:12:08.050 2024] [8191] RW-idx for rename to .old, acquired...
[Sat Jul 27 11:12:08.050 2024] [8191] rotating table 'document_modified_delta': success
[Sat Jul 27 11:12:08.050 2024] [8191] rotating table: all tables done
[Sat Jul 27 11:12:08.419 2024] [8124] caught SIGHUP (seamless=1, in_rotate=0, need_rotate=0)
[Sat Jul 27 11:12:09.919 2024] [8214] nothing to rotate after SIGHUP
[Sat Jul 27 11:12:18.573 2024] [8124] caught SIGHUP (seamless=1, in_rotate=0, need_rotate=0)
[Sat Jul 27 11:12:19.780 2024] [8138] rotating table 'document_delta': started
[Sat Jul 27 11:12:19.786 2024] [8138] RW-idx for rename to .old, acquiring...
[Sat Jul 27 11:12:19.786 2024] [8138] RW-idx for rename to .old, acquired...
[Sat Jul 27 11:12:19.786 2024] [8138] rotating table 'document_delta': success
[Sat Jul 27 11:12:19.786 2024] [8138] rotating table: all tables done
[Sat Jul 27 11:12:29.257 2024] [8124] caught SIGHUP (seamless=1, in_rotate=0, need_rotate=0)
[Sat Jul 27 11:12:30.590 2024] [8217] rotating table 'comment_delta': started
[Sat Jul 27 11:12:30.597 2024] [8217] RW-idx for rename to .old, acquiring...
[Sat Jul 27 11:12:30.597 2024] [8217] RW-idx for rename to .old, acquired...
[Sat Jul 27 11:12:30.633 2024] [8217] rotating table 'comment_delta': success
[Sat Jul 27 11:12:30.633 2024] [8217] rotating table: all tables done
[Sat Jul 27 11:13:03.415 2024] [8124] caught SIGHUP (seamless=1, in_rotate=0, need_rotate=0)
[Sat Jul 27 11:13:04.674 2024] [8277] rotating table 'message_delta': started
[Sat Jul 27 11:13:04.691 2024] [8277] RW-idx for rename to .old, acquiring...
[Sat Jul 27 11:13:04.691 2024] [8277] RW-idx for rename to .old, acquired...
[Sat Jul 27 11:13:04.695 2024] [8277] rotating table 'message_delta': success
[Sat Jul 27 11:13:04.695 2024] [8277] rotating table 'scrap_delta': started
[Sat Jul 27 11:13:04.711 2024] [8277] RW-idx for rename to .old, acquiring...
[Sat Jul 27 11:13:04.711 2024] [8277] RW-idx for rename to .old, acquired...
[Sat Jul 27 11:13:04.713 2024] [8277] rotating table 'scrap_delta': success
[Sat Jul 27 11:13:04.713 2024] [8277] rotating table: all tables done
[Sat Jul 27 11:13:05.174 2024] [8124] caught SIGHUP (seamless=1, in_rotate=0, need_rotate=0)
[Sat Jul 27 11:13:06.680 2024] [8300] rotating table 'document_modified_delta': started
[Sat Jul 27 11:13:06.683 2024] [8300] RW-idx for rename to .old, acquiring...
[Sat Jul 27 11:13:06.683 2024] [8300] RW-idx for rename to .old, acquired...
[Sat Jul 27 11:13:06.683 2024] [8300] rotating table 'document_modified_delta': success
[Sat Jul 27 11:13:06.683 2024] [8300] rotating table: all tables done
[Sat Jul 27 11:13:07.181 2024] [8124] caught SIGHUP (seamless=1, in_rotate=0, need_rotate=0)
[Sat Jul 27 11:13:08.553 2024] [8268] nothing to rotate after SIGHUP
[Sat Jul 27 11:13:17.760 2024] [8124] caught SIGHUP (seamless=1, in_rotate=0, need_rotate=0)
[Sat Jul 27 11:13:19.197 2024] [8284] rotating table 'document_delta': started
[Sat Jul 27 11:13:19.203 2024] [8284] RW-idx for rename to .old, acquiring...
[Sat Jul 27 11:13:19.203 2024] [8284] RW-idx for rename to .old, acquired...
[Sat Jul 27 11:13:19.204 2024] [8284] rotating table 'document_delta': success
[Sat Jul 27 11:13:19.204 2024] [8284] rotating table: all tables done
[Sat Jul 27 11:13:27.123 2024] [8124] caught SIGHUP (seamless=1, in_rotate=0, need_rotate=0)
[Sat Jul 27 11:13:28.445 2024] [8268] rotating table 'comment_delta': started
[Sat Jul 27 11:13:28.451 2024] [8268] RW-idx for rename to .old, acquiring...
[Sat Jul 27 11:13:28.451 2024] [8268] RW-idx for rename to .old, acquired...
[Sat Jul 27 11:13:28.451 2024] [8268] rotating table 'comment_delta': success
[Sat Jul 27 11:13:28.451 2024] [8268] rotating table: all tables done
[Sat Jul 27 11:14:03.832 2024] [8124] caught SIGHUP (seamless=1, in_rotate=0, need_rotate=0)
[Sat Jul 27 16:01:35.778 2024] [8275] Config changed (read 0 chars)
[Sat Jul 27 16:01:35.787 2024] [8317] rotating table 'message_delta': started
[Sat Jul 27 16:01:35.788 2024] [8317] RW-idx for rename to .old, acquiring...
[Sat Jul 27 16:01:35.788 2024] [8317] RW-idx for rename to .old, acquired...
[Sat Jul 27 16:01:35.790 2024] [8317] rotating table 'message_delta': success
[Sat Jul 27 16:01:35.790 2024] [8317] rotating table 'scrap_delta': started
[Sat Jul 27 16:01:35.791 2024] [8317] RW-idx for rename to .old, acquiring...
[Sat Jul 27 16:01:35.791 2024] [8317] RW-idx for rename to .old, acquired...
[Sat Jul 27 16:01:35.792 2024] [8317] rotating table 'scrap_delta': success
[Sat Jul 27 16:01:35.792 2024] [8317] rotating table 'comment_delta': started
[Sat Jul 27 16:01:35.801 2024] [8317] RW-idx for rename to .old, acquiring...
[Sat Jul 27 16:01:35.801 2024] [8317] RW-idx for rename to .old, acquired...
[Sat Jul 27 16:01:35.802 2024] [8317] rotating table 'comment_delta': success
[Sat Jul 27 16:01:35.802 2024] [8317] rotating table 'document_modified_delta': started
[Sat Jul 27 16:01:35.803 2024] [8317] RW-idx for rename to .old, acquiring...
[Sat Jul 27 16:01:35.803 2024] [8317] RW-idx for rename to .old, acquired...
[Sat Jul 27 16:01:35.803 2024] [8317] rotating table 'document_modified_delta': success
[Sat Jul 27 16:01:35.803 2024] [8317] rotating table 'document_delta': started
[Sat Jul 27 16:01:35.808 2024] [8317] RW-idx for rename to .old, acquiring...
[Sat Jul 27 16:01:35.808 2024] [8317] RW-idx for rename to .old, acquired...
[Sat Jul 27 16:01:35.808 2024] [8317] rotating table 'document_delta': success
[Sat Jul 27 16:01:35.808 2024] [8317] rotating table: all tables done
[Sat Jul 27 16:01:36.288 2024] [8124] caught SIGHUP (seamless=1, in_rotate=0, need_rotate=0)
[Sat Jul 27 16:01:37.483 2024] [8249] nothing to rotate after SIGHUP
[Sat Jul 27 16:02:03.248 2024] [8124] caught SIGHUP (seamless=1, in_rotate=0, need_rotate=0)
[Sat Jul 27 16:02:04.569 2024] [8274] rotating table 'message_delta': started
[Sat Jul 27 16:02:04.574 2024] [8274] RW-idx for rename to .old, acquiring...
[Sat Jul 27 16:02:04.574 2024] [8274] RW-idx for rename to .old, acquired...
[Sat Jul 27 16:02:04.578 2024] [8274] rotating table 'message_delta': success
[Sat Jul 27 16:02:04.578 2024] [8274] rotating table 'scrap_delta': started
[Sat Jul 27 16:02:04.582 2024] [8274] RW-idx for rename to .old, acquiring...
[Sat Jul 27 16:02:04.582 2024] [8274] RW-idx for rename to .old, acquired...
[Sat Jul 27 16:02:04.585 2024] [8274] rotating table 'scrap_delta': success
[Sat Jul 27 16:02:04.585 2024] [8274] rotating table: all tables done
[Sat Jul 27 16:02:05.069 2024] [8124] caught SIGHUP (seamless=1, in_rotate=0, need_rotate=0)
[Sat Jul 27 16:02:06.462 2024] [8135] rotating table 'document_modified_delta': started
[Sat Jul 27 16:02:06.463 2024] [8135] RW-idx for rename to .old, acquiring...
[Sat Jul 27 16:02:06.463 2024] [8135] RW-idx for rename to .old, acquired...
[Sat Jul 27 16:02:06.464 2024] [8135] rotating table 'document_modified_delta': success
[Sat Jul 27 16:02:06.464 2024] [8135] rotating table: all tables done
[Sat Jul 27 16:02:06.962 2024] [8124] caught SIGHUP (seamless=1, in_rotate=0, need_rotate=0)
[Sat Jul 27 16:02:08.472 2024] [8240] nothing to rotate after SIGHUP
[Sat Jul 27 16:02:17.335 2024] [8124] caught SIGHUP (seamless=1, in_rotate=0, need_rotate=0)
[Sat Jul 27 16:02:18.915 2024] [8260] rotating table 'document_delta': started
[Sat Jul 27 16:02:18.921 2024] [8260] RW-idx for rename to .old, acquiring...
[Sat Jul 27 16:02:18.921 2024] [8260] RW-idx for rename to .old, acquired...
[Sat Jul 27 16:02:18.922 2024] [8260] rotating table 'document_delta': success
[Sat Jul 27 16:02:18.922 2024] [8260] rotating table: all tables done
[Sat Jul 27 16:02:26.529 2024] [8124] caught SIGHUP (seamless=1, in_rotate=0, need_rotate=0)
[Sat Jul 27 16:02:27.753 2024] [8282] rotating table 'comment_delta': started
[Sat Jul 27 16:02:27.762 2024] [8282] RW-idx for rename to .old, acquiring...
[Sat Jul 27 16:02:27.762 2024] [8282] RW-idx for rename to .old, acquired...
[Sat Jul 27 16:02:27.791 2024] [8282] rotating table 'comment_delta': success
[Sat Jul 27 16:02:27.791 2024] [8282] rotating table: all tables done
sanikolaev commented 3 months ago

the number of searchd processes visible in ps command increases from the usual 2 to 3

What's the parent pid of the additional searchd process?

What's your configuration file? Isn't it a script ?

ssi10 commented 3 months ago

What's the parent pid of the additional searchd process?

I have set up logging to capture the issue next time it occurs. I will comment again once we have the logs.

What's your configuration file? Isn't it a script?

I am using a PHP script.

sanikolaev commented 3 months ago

I am using a PHP script.

Can it be then that smth's wrong with it and when the searchd tries to reload the config, the script hangs or smth?

[Sat Jul 27 16:01:35.778 2024] [8275] Config changed (read 0 chars)

looks suspicious.

ssi10 commented 3 months ago

Can it be then that smth's wrong with it and when the searchd tries to reload the config, the script hangs or smth?

There doesn't seem to be any part of the script that would cause it to hang, as it doesn't involve any external communications. However, since we haven't been capturing PHP errors, I have registered an error handler to log any errors that occur within the script. If the problem arises again, we will check to see if any PHP errors are logged.

sanikolaev commented 3 months ago

I have registered an error handler to log any errors that occur within the script

I'd also log its execution time.

ssi10 commented 2 months ago

What's the parent pid of the additional searchd process?

I am leaving an additional record because the issue of extra processes occurring had not happened for a while but reappeared today.

During this time, nothing was logged in the PHP error handler, and there were no PHP processes that ran for more than one second.

When the additional process was generated, the pstree results were as follows.

The process IDs that were already running were 522169(searchd) and 522170(searchd), and the process that was newly created and needed to be terminated was 1130900. It appears that the parent of process 1130900(searchd) is 522170.

searchd(522169)---searchd(522170)-+-manticore-execu(522375)-+-manticore-execu(522381)
                                  |                         |-manticore-execu(522382)
                                  |                         |-manticore-execu(522383)
                                  |                         |-manticore-execu(522384)
                                                            ~~~
                                  |                         |-manticore-execu(522428)
                                  |                         |-manticore-execu(522429)
                                  |                         |-manticore-execu(522430)
                                  |                         `-manticore-execu(522431)
                                  |-work_156(1130900)
                                  |-{searchd}(522171)
                                  |-{searchd}(522172)
                                  |-{searchd}(522173)
                                  |-{searchd}(522174)
                                  ~~~
                                  |-{searchd}(522369)
                                  |-{searchd}(522370)
                                  |-{searchd}(522371)
                                  `-{searchd}(522374)
sanikolaev commented 2 months ago

I could reproduce Config changed (read 0 chars) by emptying the script config, but it didn't cause any issues. I would recommend making the config static (e.g. by compiling it from your script before running indexer) to localize the issue. If it helps we'll at least know the root cause is related with scripted configuration.