manticoresoftware / manticoresearch

Easy to use open source fast database for search | Good alternative to Elasticsearch now | Drop-in replacement for E in the ELK soon
https://manticoresearch.com
GNU General Public License v3.0

RT index: searchd "stuck" forever when many documents are being inserted and RAMchunk gets full #1026

Closed: Lot-Art closed this issue 1 year ago

Lot-Art commented 1 year ago

Describe the bug

RT index: searchd "stuck" forever when many documents are being inserted and RAMchunk gets full

To Reproduce

Install as in the docs

.conf: keep default, except listen on 0.0.0.0 instead of 127.0.0.1

common {  
    plugin_dir = /usr/local/lib/manticore  
}  

searchd {  
    listen = 0.0.0.0:9312  
    listen = 0.0.0.0:9306:mysql  
    listen = 0.0.0.0:9308:http  
    log = /var/log/manticore/searchd.log  
    query_log = /var/log/manticore/query.log  
    pid_file = /var/run/manticore/searchd.pid  
    data_dir = /var/lib/manticore  
    query_log_format = sphinxql  
}  

Start manticore

$ sudo systemctl start manticore  

Create 2 RT indexes

$ mysql -h 0 -P 9306 < ./manticore_create_indexes.sql  

(where manticore_create_indexes.sql is the following)

CREATE TABLE la8_items_all ( title text indexed, description text indexed, match_str string, type int, country_code string, catalog_id int, house_match_str string, scrapers multi, section int, status int, low_estimate_usd int, high_estimate_usd int, sold_hammer_usd int, value_usd int, value_low_usd int, value_high_usd int, deal_pcnt bigint, list_time int, pushout_time int, has_img int, featured_until int, struct_category string, struct_sub_category string, struct_name_brand string, struct_model string, struct_is_attributed int, struct_is_multiple int, struct_is_copy int, struct_label1 string, struct_label2 string, struct_label3 string, create_time int);  

CREATE TABLE la8_items_upcaft ( title text indexed, description text indexed, match_str string, type int, country_code string, catalog_id int, house_match_str string, scrapers multi, section int, status int, low_estimate_usd int, high_estimate_usd int, sold_hammer_usd int, value_usd int, value_low_usd int, value_high_usd int, deal_pcnt bigint, list_time int, pushout_time int, has_img int, featured_until int, struct_category string, struct_sub_category string, struct_name_brand string, struct_model string, struct_is_attributed int, struct_is_multiple int, struct_is_copy int, struct_label1 string, struct_label2 string, struct_label3 string, create_time int);  

The resulting directories are the following

$ sudo tree -pufi /var/lib/manticore  
/var/lib/manticore  
[drwx------ manticore]  /var/lib/manticore/binlog  
[-rw------- manticore]  /var/lib/manticore/binlog/binlog.001  
[-rw------- manticore]  /var/lib/manticore/binlog/binlog.lock  
[-rw------- manticore]  /var/lib/manticore/binlog/binlog.meta  
[drwxr-xr-x manticore]  /var/lib/manticore/data  
[drwx------ manticore]  /var/lib/manticore/la8_items_all  
[-rw------- manticore]  /var/lib/manticore/la8_items_all/la8_items_all.lock  
[-rw------- manticore]  /var/lib/manticore/la8_items_all/la8_items_all.meta  
[-rw------- manticore]  /var/lib/manticore/la8_items_all/la8_items_all.ram  
[-rw------- manticore]  /var/lib/manticore/la8_items_all/la8_items_all.settings  
[drwx------ manticore]  /var/lib/manticore/la8_items_upcaft  
[-rw------- manticore]  /var/lib/manticore/la8_items_upcaft/la8_items_upcaft.lock  
[-rw------- manticore]  /var/lib/manticore/la8_items_upcaft/la8_items_upcaft.meta  
[-rw------- manticore]  /var/lib/manticore/la8_items_upcaft/la8_items_upcaft.ram  
[-rw------- manticore]  /var/lib/manticore/la8_items_upcaft/la8_items_upcaft.settings  
[-rw------- manticore]  /var/lib/manticore/manticore.json  
[-rw------- manticore]  /var/lib/manticore/state.sql  

4 directories, 13 files  

We will fill the indexes with data using these 2 bash scripts.
They simulate my app. My PHP code is complex, but it must insert documents one by one, up to about 30 million documents, and it runs as 30 simultaneous processes (30 separate PHP CLI processes running on 1 server).

Start filling the indexes with data (5 is the number of simultaneous processes that will connect to Manticore, each inserting data)

$ ./fill_manti_starter.sh 5  
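The loader scripts themselves are not included in this thread. As a rough illustration only, a hypothetical stand-in that produces the same kind of load (several workers, each sending single-row INSERTs over its own connection) might look like the sketch below. The function names `make_insert` and `worker`, the placeholder column values, and the row counts are all assumptions, not the reporter's code.

```shell
# Hypothetical stand-in for the loader; NOT the original fill_manti_starter.sh.

# Print one single-row INSERT for the given table and document id.
make_insert() {
    printf "INSERT INTO %s (id, title, description, type) VALUES (%d, 'title %d', 'description %d', 1);\n" \
        "$1" "$2" "$2" "$2"
}

# Print COUNT single-row INSERTs for worker number W (0-based).
# Each worker gets its own id range so workers never collide.
worker() {
    w=$1; count=$2; i=0
    while [ "$i" -lt "$count" ]; do
        make_insert la8_items_all $(( w * count + i + 1 ))
        i=$(( i + 1 ))
    done
}

# To generate real load with 5 parallel connections (assumed usage):
#   for w in 0 1 2 3 4; do worker "$w" 100000 | mysql -h 0 -P 9306 & done; wait
```

Each worker pipes its statements into its own `mysql` client, so the server sees one INSERT at a time per connection, which is the access pattern described above.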

Wait till searchd gets stuck
Fairly quickly we will see (in htop) that the script that should be inserting the data drops to 0% CPU activity. The searchd server also has its CPUs at 0%.
Nothing reports an error and nothing responds, but everything is still running.
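One way to tell a hang apart from a slow or failing server is to run any client command under a time limit. The sketch below is a generic helper, not something from this report: `probe` is a hypothetical name, and it relies only on the coreutils `timeout` command, which exits with status 124 when the time limit is hit.

```shell
# probe SECONDS COMMAND [ARGS...]
# Runs COMMAND under a time limit and reports how it ended.
probe() {
    secs=$1; shift
    timeout "$secs" "$@" >/dev/null 2>&1
    rc=$?
    if [ "$rc" -eq 124 ]; then
        echo "hung"                 # timeout killed the command
    elif [ "$rc" -eq 0 ]; then
        echo "responsive"
    else
        echo "failed (exit $rc)"    # command returned an error on its own
    fi
}

# Assumed usage against the setup in this report:
#   probe 5 mysql -h 0 -P 9306 -e "SHOW STATUS"
```

In the state described above, such a probe would be expected to print `hung` rather than `failed`, since the server neither answers nor drops the connection.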

In the searchd.log and the directories

$ sudo cat /var/log/manticore/searchd.log  
[Tue Feb  7 13:54:09.914 2023] [743] watchdog: main process 744 forked ok  
[Tue Feb  7 13:54:09.921 2023] [744] starting daemon version '5.0.2 348514c86@220530 dev' ...  
[Tue Feb  7 13:54:09.921 2023] [744] listening on all interfaces for sphinx and http(s), port=9312  
[Tue Feb  7 13:54:09.921 2023] [744] listening on all interfaces for mysql, port=9306  
[Tue Feb  7 13:54:09.921 2023] [744] listening on all interfaces for sphinx and http(s), port=9308  
[Tue Feb  7 13:54:09.940 2023] [750] binlog: replaying log /var/lib/manticore/binlog/binlog.001  
[Tue Feb  7 13:54:09.941 2023] [750] WARNING: binlog: replay error at pos=44  
[Tue Feb  7 13:54:09.941 2023] [750] binlog: index la8_items_all: skipped at tid 0 and max binlog tid 0  
[Tue Feb  7 13:54:09.941 2023] [750] binlog: replay stats: 0 commits; 0 updates, 0 reconfigure; 0 pq-add; 0 pq-delete; 0 pq-add-delete, 1 indexes  
[Tue Feb  7 13:54:09.941 2023] [750] binlog: finished replaying /var/lib/manticore/binlog/binlog.001; 63.4 MB in 0.000 sec  
[Tue Feb  7 13:54:09.941 2023] [750] binlog: finished replaying total 1 in 0.001 sec  
[Tue Feb  7 13:54:09.942 2023] [744] accepting connections  
[Tue Feb  7 13:54:09.942 2023] [746] prereading 0 indexes  
[Tue Feb  7 13:54:09.942 2023] [746] prereaded 0 indexes in 0.000 sec  
[Tue Feb  7 13:54:58.988 2023] [746] rt: index la8_items_all: diskchunk 0(1), segments 32  saved in 1.101265 (1.102532) sec, RAM saved/new 44715847/58624 ratio 0.950000 (soft limit 127506841, conf limit 134217728)  
[Tue Feb  7 13:55:46.973 2023] [748] rt: index la8_items_all: diskchunk 1(2), segments 32  saved in 3.196693 (3.242457) sec, RAM saved/new 127433970/60456 ratio 0.950000 (soft limit 127506841, conf limit 134217728)  

The last 2 lines suggest to me that it tries to write the data to disk (and seems to succeed). The directories now look like this:

$ sudo tree -pufi /var/lib/manticore  
/var/lib/manticore  
[drwx------ manticore]  /var/lib/manticore/binlog  
[-rw------- manticore]  /var/lib/manticore/binlog/binlog.003  
[-rw------- manticore]  /var/lib/manticore/binlog/binlog.004  
[-rw------- manticore]  /var/lib/manticore/binlog/binlog.lock  
[-rw------- manticore]  /var/lib/manticore/binlog/binlog.meta  
[drwxr-xr-x manticore]  /var/lib/manticore/data  
[drwx------ manticore]  /var/lib/manticore/la8_items_all  
[-rw------- manticore]  /var/lib/manticore/la8_items_all/la8_items_all.0.spa  
[-rw------- manticore]  /var/lib/manticore/la8_items_all/la8_items_all.0.spb  
[-rw------- manticore]  /var/lib/manticore/la8_items_all/la8_items_all.0.spd  
[-rw------- manticore]  /var/lib/manticore/la8_items_all/la8_items_all.0.spe  
[-rw------- manticore]  /var/lib/manticore/la8_items_all/la8_items_all.0.sph  
[-rw------- manticore]  /var/lib/manticore/la8_items_all/la8_items_all.0.sphi  
[-rw------- manticore]  /var/lib/manticore/la8_items_all/la8_items_all.0.spi  
[-rw------- manticore]  /var/lib/manticore/la8_items_all/la8_items_all.0.spm  
[-rw------- manticore]  /var/lib/manticore/la8_items_all/la8_items_all.0.spp  
[-rw------- manticore]  /var/lib/manticore/la8_items_all/la8_items_all.0.spt  
[-rw------- manticore]  /var/lib/manticore/la8_items_all/la8_items_all.1.spa  
[-rw------- manticore]  /var/lib/manticore/la8_items_all/la8_items_all.1.spb  
[-rw------- manticore]  /var/lib/manticore/la8_items_all/la8_items_all.1.spd  
[-rw------- manticore]  /var/lib/manticore/la8_items_all/la8_items_all.1.spe  
[-rw------- manticore]  /var/lib/manticore/la8_items_all/la8_items_all.1.sph  
[-rw------- manticore]  /var/lib/manticore/la8_items_all/la8_items_all.1.sphi  
[-rw------- manticore]  /var/lib/manticore/la8_items_all/la8_items_all.1.spi  
[-rw------- manticore]  /var/lib/manticore/la8_items_all/la8_items_all.1.spm  
[-rw------- manticore]  /var/lib/manticore/la8_items_all/la8_items_all.1.spp  
[-rw------- manticore]  /var/lib/manticore/la8_items_all/la8_items_all.1.spt  
[-rw------- manticore]  /var/lib/manticore/la8_items_all/la8_items_all.lock  
[-rw------- manticore]  /var/lib/manticore/la8_items_all/la8_items_all.meta  
[-rw------- manticore]  /var/lib/manticore/la8_items_all/la8_items_all.settings  
[drwx------ manticore]  /var/lib/manticore/la8_items_upcaft  
[-rw------- manticore]  /var/lib/manticore/la8_items_upcaft/la8_items_upcaft.lock  
[-rw------- manticore]  /var/lib/manticore/la8_items_upcaft/la8_items_upcaft.meta  
[-rw------- manticore]  /var/lib/manticore/la8_items_upcaft/la8_items_upcaft.ram  
[-rw------- manticore]  /var/lib/manticore/la8_items_upcaft/la8_items_upcaft.settings  
[-rw------- manticore]  /var/lib/manticore/manticore.json  
[-rw------- manticore]  /var/lib/manticore/state.sql  

4 directories, 33 files  
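The numbers in the two `rt:` log lines can be cross-checked with a little arithmetic. The conf limit of 134217728 bytes is 128 MiB, and the soft limit printed in the log equals that value times the 0.95 ratio, rounded down. The sketch below reproduces this; `soft_limit` and `ram_saved` are hypothetical helper names, and reading the soft limit as "ratio times conf limit" is an inference from the logged numbers, not a statement about Manticore internals.

```shell
# soft_limit CONF_LIMIT_BYTES RATIO_PERCENT
# Integer approximation of conf limit * ratio.
soft_limit() {
    echo $(( $1 * $2 / 100 ))
}

# Read "rt:" log lines on stdin and print the "saved" byte count
# from the "RAM saved/new A/B" field.
ram_saved() {
    sed -n 's/.*RAM saved\/new \([0-9]*\)\/[0-9]*.*/\1/p'
}

# Assumed usage:
#   soft_limit 134217728 95                               # prints 127506841
#   sudo grep 'rt: index' /var/log/manticore/searchd.log | ram_saved
```

For the second flush, 127433970 bytes were saved against a soft limit of 127506841, so the RAM chunk was flushed only 72871 bytes short of that limit, which fits the observation that the hang appears once the RAM chunk fills up.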

Expected behavior

Continue adding the documents OR throw an error OR quit OR drop connections.

Describe the environment

Messages from log files

See the searchd.log output under "To Reproduce" above.

Additional context

none

tomatolog commented 1 year ago

Could you try the package from the dev repository?

Lot-Art commented 1 year ago

> Could you try the package from the dev repository?

I think I did get the dev package (by simply following the official install docs today). My version is Manticore 5.0.2 348514c86@220530 dev.

Lot-Art commented 1 year ago

> Could you try the package from the dev repository?

Do you mean building the bleeding-edge 6.0.0.0 from source? This?

tomatolog commented 1 year ago

Here is the documentation for the dev packages: https://manual.manticoresearch.com/Installation/Debian_and_Ubuntu#Development-packages

Lot-Art commented 1 year ago

Great, it works! Thank you!

I got 5.0.3 from the dev package (I expected 6.0). For what my 2 cents are worth: 5.0.3 could be placed into the official stable release package.

tomatolog commented 1 year ago

We pushed release 6.0, as version 5.0.3 from dev has too many changes to be just a 5.X.X release.