manticoresoftware / manticoresearch

Easy to use open source fast database for search | Good alternative to Elasticsearch now | Drop-in replacement for E in the ELK soon
https://manticoresearch.com
GNU General Public License v3.0

Manticore Crashes with a query #2656

Closed. Sembiance closed this issue 3 days ago.

Sembiance commented 5 days ago

Bug Description:

When running a specific query, searchd crashes with the following error:

[Wed Oct 16 13:33:49.880 2024] [787833] watchdog: main process 787834 killed cleanly with SIGKILL, shutting down

Happens every time. Below is what's needed to reproduce the problem.

Query that causes the crash:

SELECT format FROM discmaster GROUP BY format LIMIT 50000 OPTION max_matches=10000000

searchd was started with this config:

searchd {
    auto_schema = 0
    collation_server = utf8_general_ci
    data_dir = /mnt/discmaster2/manticore
    engine = columnar
    listen = 127.0.0.1:17685:http
    log = /mnt/discmaster2/log/searchd.log
    max_packet_size = 128M
    pid_file = /mnt/discmaster2/pid/searchd.pid
    preopen_tables = 1
    read_buffer_docs = 1M
    read_buffer_hits = 1M
    threads = 60
    not_terms_only_allowed = 1
}

The database files can be downloaded from (WARNING! 363GB): https://sembiance.com/wip/manticore_data.tar

Let me know if there is any other info needed.

Manticore Search Version:

Manticore 6.3.6 593045790@24080214 (columnar 2.3.0 88a01c3@24052206) (secondary 2.3.0 88a01c3@24052206) (knn 2.3.0 88a01c3@24052206)

Operating System Version:

Ubuntu 22.04.4 LTS

Have you tried the latest development version?

No

Internal Checklist:

To be completed by the assignee. Check off tasks that have been completed or are not applicable.

- [ ] Implementation completed
- [ ] Tests developed
- [ ] Documentation updated
- [ ] Documentation reviewed
- [ ] [Changelog](https://docs.google.com/spreadsheets/d/1mz_3dRWKs86FjRF7EIZUziUDK_2Hvhd97G0pLpxo05s/edit?pli=1&gid=1102439133) updated

sanikolaev commented 5 days ago

Thanks for the report and for providing the table files. That's usually very helpful. I've started downloading it now.

tomatolog commented 5 days ago

Could you provide your searchd.log? The crash log is stored there, and we need it to check the crash stack.

klirichek commented 5 days ago

Do you have enough RAM? Are there any records in the log, syslog, or dmesg after the query? I see a combination of 10M max_matches, 60 threads, and quite a big table. Also, 'killed cleanly with SIGKILL' means that this is not a crash but a 'clean' exit. It looks like something like the OOM killer was at work. If so, it should leave other signs, such as log records.
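To make the "other signs" check concrete, here is a minimal Python sketch that scans kernel log text (for example, the output of `dmesg`) for typical OOM killer messages. The function names are hypothetical, and the two patterns are an assumption based on the message wording commonly seen in recent Linux kernels; older or differently configured kernels may word these lines differently.

```python
import re

# Assumed message wording from recent Linux kernels; adjust if yours differs.
INVOKED = re.compile(r"invoked oom-killer", re.IGNORECASE)
KILLED = re.compile(r"out of memory: killed process (\d+) \(([^)]+)\)", re.IGNORECASE)


def find_oom_lines(log_text):
    """Return every line that looks like an OOM killer message."""
    return [
        line.strip()
        for line in log_text.splitlines()
        if INVOKED.search(line) or KILLED.search(line)
    ]


def killed_pids(log_text, process_name="searchd"):
    """Return PIDs of processes with the given name that the OOM killer reported killing."""
    pids = []
    for line in log_text.splitlines():
        match = KILLED.search(line)
        if match and match.group(2) == process_name:
            pids.append(int(match.group(1)))
    return pids
```

One possible way to use it: capture the kernel log with `subprocess.run(["dmesg"], capture_output=True, text=True).stdout` (this may need root), pass the text to `killed_pids`, and compare the result with the main process PID from the watchdog line in searchd.log.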

Sembiance commented 5 days ago

> Could you provide your searchd.log? The crash log is stored there, and we need it to check the crash stack.

> Do you have enough RAM? Are there any records in the log, syslog, or dmesg after the query? I see a combination of 10M max_matches, 60 threads, and quite a big table. Also, 'killed cleanly with SIGKILL' means that this is not a crash but a 'clean' exit. It looks like something like the OOM killer was at work. If so, it should leave other signs, such as log records.

The server has 768GB of RAM and not much is being used elsewhere, but yeah, I agree that this feels more like an OOM killer situation than a crash, as that line from the log was the only thing I remember seeing. Sorry if that's what this turns out to be.

I am traveling all day today, but tomorrow after I get home I will perform the query again, this time monitoring RAM usage and I will provide any additional details I see in syslog/dmesg and the searchd log.
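For the RAM monitoring step, a minimal Python sketch along these lines could sample the resident set size of the searchd process while the query runs. It reads the `VmRSS` field from `/proc/<pid>/status`, which is standard on Linux; the function names, the sampling interval, and the number of samples are hypothetical choices, not anything Manticore provides.

```python
import re
import time
from pathlib import Path


def parse_vmrss_kb(status_text):
    """Extract VmRSS in kB from the text of /proc/<pid>/status, or None if absent."""
    match = re.search(r"^VmRSS:\s+(\d+)\s+kB", status_text, re.MULTILINE)
    return int(match.group(1)) if match else None


def watch_rss(pid, interval_s=1.0, samples=10):
    """Print the RSS of the given process a fixed number of times (Linux only)."""
    for _ in range(samples):
        status = Path(f"/proc/{pid}/status").read_text()
        rss_kb = parse_vmrss_kb(status)
        if rss_kb is None:
            print(f"pid {pid}: VmRSS not reported")
        else:
            print(f"pid {pid}: RSS {rss_kb / 1024 / 1024:.2f} GiB")
        time.sleep(interval_s)
```

The PID to watch would be the main searchd process (787834 in the log line above, as opposed to the watchdog 787833). Whether the configured `pid_file` holds the watchdog or the main process PID is something to verify before relying on it.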

sanikolaev commented 4 days ago

I started Manticore with the table in --force-preread mode, and it used about 50G of RSS, mostly due to the large dictionary.

root@dev2 /home/snikolaev/issue-2656/manticore/discmaster # ls -la *.spi|awk '{sum+=$5;} END{print sum/1024/1024/1024;}'
48.3511
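The same size check can be done without `awk`. The sketch below is a rough Python equivalent that sums the sizes of all `*.spi` files in a directory and converts the total to GiB; the helper names are hypothetical, and the table directory path is whatever applies on your machine.

```python
from pathlib import Path


def total_bytes(directory, pattern="*.spi"):
    """Sum the sizes in bytes of all regular files in `directory` matching `pattern`."""
    return sum(
        path.stat().st_size
        for path in Path(directory).glob(pattern)
        if path.is_file()
    )


def to_gib(size_bytes):
    """Convert bytes to GiB, matching the sum/1024/1024/1024 in the awk one-liner."""
    return size_bytes / 1024 ** 3
```

For example, `to_gib(total_bytes("."))` run inside the table directory should give the same figure as the `awk` command above, since both sum the `.spi` file sizes and divide by 1024 three times.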

I ran a few queries, and there were no crashes or OOM issues (the server has 128G of RAM). It's recommended to:

Sembiance commented 3 days ago

I can confirm that this was an out of RAM situation on my side.

My deepest apologies for the invalid bug submission.