kroky closed this issue 7 months ago.
Chances are this has already been fixed in the dev version (https://mnt.cr/dev/nightly). Can you check whether you can still reproduce it there?
I tried with the latest build from today:
Manticore 6.2.13 e705636a4@24030111 dev (columnar 2.2.5 a5342a1@240217) (secondary 2.2.5 a5342a1@240217) (knn 2.2.5 a5342a1@240217)
and the same issue happens; it looks like infinite recursion in one of the threads.
Thank you, @kroky. We'll try to reproduce the issue on our end to debug and fix it.
I've taken a quick look, and it may well be an edge case caused by the schema having 1444 attributes. Although Manticore is capable of handling this many attributes, such a scenario is very uncommon and unfortunately has not been thoroughly tested.
I will also upload an SQL file containing all the SQL commands we execute prior to the server hanging.
I can't find it in the storage:
# sudo ls /mnt/s3.manticoresearch.com/issue-1893
gdb.txt index.tar.gz schema.txt show-threads.txt
Can you please provide it? A few sample data write queries would be very helpful for reproducing the issue.
Yes, our use-case is uncommon as we aggregate data from different sources into a huge index schema.
I provided a couple more files in the S3 storage:
Note that when I try to execute these through the mysql command-line client, searchd doesn't hang and all the queries are successfully executed. The only difference between our indexing and this mysql command-line client execution is that we use the PHP PDO library to connect to port 9306 and execute those queries via prepared statements in most cases.
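For context, our write path looks roughly like this minimal PDO sketch (the table and column names below are placeholders, not our real 1444-attribute schema):

```php
<?php
// Connect to Manticore over the MySQL protocol on port 9306.
$pdo = new PDO('mysql:host=127.0.0.1;port=9306', '', '', [
    PDO::ATTR_ERRMODE => PDO::ERRMODE_EXCEPTION,
]);

// Most of our writes go through prepared statements like this one
// (with the mysql driver, PDO emulates prepares on the client side by default).
$stmt = $pdo->prepare('REPLACE INTO tiki_main_example (id, title, views) VALUES (?, ?, ?)');
$stmt->execute([1, 'example document', 42]);
```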
Could you restart the daemon with the --logdebug option, cause the hang, stop the daemon, and upload searchd.log to S3?
While trying to reproduce the issue with the data you provided, I see these warnings in the daemon log:
WARNING: rt: table tiki_main_65e705bcdee87 failed to save disk chunk l4g111/tiki_main_65e705bcdee87/tiki_main_65e705bcdee87.0: error creating 'l4g111/tiki_main_65e705bcdee87/tiki_main_65e705bcdee87.0.spidx.983.tmp': Too many open files
WARNING: rt: table tiki_main_65e705bcdee87 failed to load disk chunk after RAM save: error creating 'l4g111/tiki_main_65e705bcdee87/tiki_main_65e705bcdee87.0.spidx.983.tmp': Too many open files
WARNING: rt: table tiki_main_65e705bcdee87 failed to save disk chunk l4g111/tiki_main_65e705bcdee87/tiki_main_65e705bcdee87.1: error creating 'l4g111/tiki_main_65e705bcdee87/tiki_main_65e705bcdee87.1.spidx.983.tmp': Too many open files
WARNING: rt: table tiki_main_65e705bcdee87 failed to load disk chunk after RAM save: error creating 'l4g111/tiki_main_65e705bcdee87/tiki_main_65e705bcdee87.1.spidx.983.tmp': Too many open files
If you see the same behavior, you need to increase the open file limit for the user who starts the daemon.
Yes, I saw the too many open files error. The limit was 1024, so I increased it to 1048576. Trying again results in the same hang with increasing memory usage. I uploaded the searchd.log with debug messages.
As per https://manual.manticoresearch.com/Server_settings/Searchd#max_open_files, the setting should be changed (1) for Manticore and (2) for the OS. This solved the issue I was having on one database (the same software but a different data set than @kroky's: a large database running Tiki master, soon to be v27).
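In other words, the limit has to be raised in two places, roughly like this (a sketch; the drop-in path and the chosen value are assumptions about a typical systemd setup):

```
# 1) Manticore side: searchd section of the config
searchd {
    max_open_files = max
}

# 2) OS side: e.g. a systemd drop-in for the service that runs searchd
# /etc/systemd/system/manticore.service.d/limits.conf
[Service]
LimitNOFILE=1048576
```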
I am unable to reproduce the hang with the SQL file you provided.
The only difference between our indexing and this mysql command-line client execution is that we use the PHP PDO library to connect to port 9306 and execute those queries via prepared statements in most cases.
Could you upload a PHP script that uses the PDO library as you described and hangs the daemon, or maybe a Dockerfile that reproduces the case in a container, so we can make sure the environment is the same?
I'll need to figure it out, as the software behind this is pretty big and depends on a complex DB to generate the SQL queries.
This solved the issue I was having on one database
But it doesn't solve it on a large database on Tiki 26.x
If you provide a reproducible case, I can investigate the issue and fix it. For now, I do not see a crash or any other cause of the issue you are describing.
I think that could be possible if SI generation hits the open FD limit, as described in https://github.com/manticoresoftware/manticoresearch/issues/1914. Setting the open file limit to a higher value, or disabling SI by issuing export LIB_SECONDARY= prior to daemon start, should fix that. However, the issue still persists and I still need a reproducible example to investigate it.
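For reference, the SI workaround amounts to something like this before starting the daemon (a sketch; the config path is an assumption about your setup):

```
# Disable SI for this run (empty value), then start the daemon
export LIB_SECONDARY=
searchd --config /etc/manticoresearch/manticore.conf
```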
Some related analysis has been added here: https://github.com/manticoresoftware/manticoresearch/issues/1914
@kroky said he increased the open file limit but the issue still exists, so it seems the related issue is not the cause of the bug described here.
@kroky, could you attempt another run, checking the open files limit via /proc/<pid>/limits for searchd? Please ensure that there is indeed no limit, or that it is set very high.
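For example, something like this should show the effective limit of the running daemon (assuming a single searchd process):

```
cat /proc/$(pidof searchd)/limits | grep -i 'open files'
```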
I am trying to prepare a docker image for you to test there.
Actually, having max_open_files = max in the searchd config with a sufficient system open files limit solved the issue, and the index is rebuilt successfully. I think we can close this one in favor of #1914, then.
Describe the bug: We have a specific use case in which Manticore hangs during indexing of a series of documents, grows its memory usage until all memory is used, and is killed by the OS.
To Reproduce: I will upload the schema, the actual database files at the time of the crash, the output of "show threads", and thread info from gdb to your S3 storage. I will also upload an SQL file containing all the SQL commands we execute prior to the server hanging.
Expected behavior: Indexing should continue normally and finish.
Describe the environment:
Messages from log files: searchd.log is empty at the time the server hangs. query.log:
Additional context: gdb has interesting output in one of the threads:
and the last line is repeated more than 2000 times... It seems the secondary index has a hard time writing some data to disk chunks. I also noticed that we have more than a thousand .tmp files in the index disk directory. It looks like an infinite loop or recursion.