manticoresoftware / manticoresearch

Easy to use open source fast database for search | Good alternative to Elasticsearch now | Drop-in replacement for E in the ELK soon
https://manticoresearch.com
GNU General Public License v3.0

mem_limit not being honoured #72

Open akissa opened 6 years ago

akissa commented 6 years ago

Describe the environment

2.6.2:

Centos 6:

Packaged from source:

Describe the problem

mem_limit is set to 256M but the indexer process is using over 3GB of RAM

Steps to reproduce:

searchd
{
    listen = /var/run/manticore/manticore.sock:sphinx
    listen = /var/run/manticore/mysql41.sock:mysql41
    log = /var/log/manticore/searchd.log
    query_log  = /var/log/manticore/query.log
    read_timeout = 5
    max_children = 30
    pid_file     = /var/run/manticore/searchd.pid
    seamless_rotate = 1
    preopen_indexes = 0
    unlink_old = 1
    workers = threads
    collation_server = utf8_general_ci
    rt_flush_period = 3600
    binlog_path = /var/lib/manticore
}

indexer
{
        mem_limit = 256M
        max_iops = 30
        max_iosize = 4M
}

common
{
    lemmatizer_base = /usr/share/manticore/dicts
}

source base
{
    type = odbc
    odbc_dsn = DSN=postgres-database-ssl
}

source messages : base
{
    sql_query_range = SELECT MIN(id), MAX(id) FROM messages WHERE ts < (SELECT maxts FROM indexer_counters WHERE tablename='messages' AND hostname='search.example.com')
    sql_range_step  = 10000
    sql_query_pre = INSERT INTO indexer_counters (tablename, hostname, maxts) \
                VALUES('messages', 'search.example.com', NOW()) \
                ON CONFLICT (tablename,hostname) DO UPDATE SET maxts=excluded.maxts
    sql_query = SELECT id, messageid, subject, CRC32(from_address) AS from_addr, CRC32(to_address) AS \
                to_addr, CRC32(from_domain) AS from_dom, CRC32(to_domain) AS to_dom, headers, \
                hostname, UNIX_TIMESTAMP(timestamp) AS timestamp, isquarantined FROM messages \
                WHERE ts < (SELECT maxts FROM indexer_counters WHERE tablename='messages' AND hostname='search.example.com') \
                AND id >= $start AND id <= $end
    sql_query_post = INSERT INTO indexer_counters (tablename, hostname, maxts) \
                VALUES('messages_tmp', 'search.example.com', \
                (SELECT maxts FROM indexer_counters WHERE tablename='messages' AND hostname='search.example.com')) \
                ON CONFLICT (tablename,hostname) DO UPDATE SET maxts=excluded.maxts
    sql_query_post_index = DELETE FROM indexer_counters WHERE tablename='messages' AND hostname='search.example.com'
    sql_query_post_index = UPDATE indexer_counters SET tablename='messages' WHERE tablename='messages_tmp' AND hostname='search.example.com'
    sql_query_post_index = DELETE FROM indexer_killlist WHERE ts < (SELECT maxts FROM indexer_counters WHERE tablename='messages' \
                AND hostname='search.example.com') \
                AND tablename='messages'
    sql_column_buffers = headers=64K
    sql_attr_uint = from_addr
    sql_attr_uint = to_addr
    sql_attr_uint = from_dom
    sql_attr_uint = to_dom
    sql_attr_timestamp = timestamp
    sql_attr_bool = isquarantined
}

index messages
{
        source = messages
        path = /var/lib/manticore/messages
        docinfo = extern
        morphology = stem_en
        min_infix_len = 3
        index_exact_words = 1
        ondisk_attrs = 1
}

When indexing, top shows the following:

top

top - 10:06:00 up 23 days, 16:42,  1 user,  load average: 2.24, 1.97, 2.35
Tasks: 146 total,   2 running, 144 sleeping,   0 stopped,   0 zombie
Cpu(s): 16.2%us,  1.0%sy,  0.0%ni, 82.8%id,  0.0%wa,  0.0%hi,  0.0%si,  0.0%st
Mem:   8193520k total,  7443064k used,   750456k free,    13480k buffers
Swap:  8208380k total,   770004k used,  7438376k free,  1952980k cached

  PID USER      PR  NI  VIRT  RES  SHR S %CPU %MEM    TIME+  COMMAND
19800 manticor  20   0 3594m 3.1g 3412 R 16.0 39.8   2:30.06 indexer

As you can see, resident memory usage is 3.1G, more than twelve times the configured mem_limit of 256M.

There are about 75k records in the table.
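
For anyone reproducing this, one rough way to compare the indexer's peak resident memory against mem_limit is to read VmHWM from /proc while the process is still running. This is only a sketch: the helper names peak_rss_kib and exceeds_limit are made up for illustration and are not part of Manticore, and it assumes a Linux /proc filesystem.

```shell
#!/bin/sh
# Hypothetical helpers (not part of Manticore) for checking peak RSS.

# Print the VmHWM value (peak resident set size, in KiB)
# from /proc/<pid>/status text supplied on stdin.
peak_rss_kib() {
    awk '/^VmHWM:/ { print $2 }'
}

# Succeed (exit 0) if a peak RSS in KiB exceeds a limit given in MiB.
# $1 = peak RSS in KiB, $2 = limit in MiB
exceeds_limit() {
    [ "$1" -gt $(( $2 * 1024 )) ]
}

# Usage, assuming a single indexer process is still running:
#   pid=$(pgrep -o indexer)
#   peak=$(peak_rss_kib < "/proc/$pid/status")
#   exceeds_limit "$peak" 256 && echo "peak ${peak} KiB is above 256 MiB"
```

Reading the value from stdin keeps the parsing separate from the process lookup, so the same helper also works on a saved copy of the status file.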

Messages from log files:

No errors.

akissa commented 6 years ago

I can confirm that the same issue is evident when using manticore-2.6.3-180328-cccb538-release-stemmer-rhel6-bin.rpm

It seems to be caused by infix indexing: when min_infix_len is disabled, memory and CPU usage do not skyrocket.

tomatolog commented 6 years ago

The infix builder, which is enabled by the min_infix_len option, is not constrained by the mem_limit option.

Could you provide the resulting sizes of the index files? (ls -lh /var/lib/manticore/messages*)

akissa commented 6 years ago

I have disabled infix indexing on the instance with 75k records and reindexed. The data below is from an instance with 46k records.

ls -lh /var/lib/manticore/messages*
-rw-r--r-- 1 manticore manticore 1.6M Apr 10 09:18 /var/lib/manticore/messages.spa
-rw-r--r-- 1 manticore manticore  32M Apr 10 09:18 /var/lib/manticore/messages.spd
-rw-r--r-- 1 manticore manticore 187K Apr 10 09:18 /var/lib/manticore/messages.spe
-rw-r--r-- 1 manticore manticore  561 Apr 10 09:18 /var/lib/manticore/messages.sph
-rw-r--r-- 1 manticore manticore  16M Apr 10 09:18 /var/lib/manticore/messages.spi
-rw-r--r-- 1 manticore manticore    0 Apr 10 09:18 /var/lib/manticore/messages.spk
-rw------- 1 manticore manticore    0 Apr 10 09:18 /var/lib/manticore/messages.spl
-rw-r--r-- 1 manticore manticore    0 Apr 10 09:18 /var/lib/manticore/messages.spm
-rw-r--r-- 1 manticore manticore  25M Apr 10 09:18 /var/lib/manticore/messages.spp
-rw-r--r-- 1 manticore manticore    1 Apr 10 09:18 /var/lib/manticore/messages.sps

akissa commented 6 years ago

Sorry, the above had infix indexing disabled. Below is the same instance (46k records) with infix indexing enabled.

ls -lh /var/lib/manticore/messages*
-rw-r--r-- 1 manticore manticore 1.6M Apr 10 09:32 /var/lib/manticore/messages.spa
-rw-r--r-- 1 manticore manticore  63M Apr 10 09:34 /var/lib/manticore/messages.spd
-rw-r--r-- 1 manticore manticore 375K Apr 10 09:34 /var/lib/manticore/messages.spe
-rw-r--r-- 1 manticore manticore  561 Apr 10 09:34 /var/lib/manticore/messages.sph
-rw-r--r-- 1 manticore manticore 155M Apr 10 09:34 /var/lib/manticore/messages.spi
-rw-r--r-- 1 manticore manticore    0 Apr 10 09:32 /var/lib/manticore/messages.spk
-rw------- 1 manticore manticore    0 Apr 10 09:34 /var/lib/manticore/messages.spl
-rw-r--r-- 1 manticore manticore    0 Apr 10 09:32 /var/lib/manticore/messages.spm
-rw-r--r-- 1 manticore manticore  49M Apr 10 09:34 /var/lib/manticore/messages.spp
-rw-r--r-- 1 manticore manticore    1 Apr 10 09:34 /var/lib/manticore/messages.sps

airolg commented 6 years ago

Added to backlog, thank you

githubmanticore commented 5 years ago

➤ Aleksey N. Vinogradov commented:

This first needs to be discussed. There are actually several places in index building where the memory limit is not honored. Moreover, it is honored only manually, in the places where we calculate the consumption ourselves; everything else underneath may still be greedy.

Reworking is possible, but because of the internal architecture this is not a 'bug' but a fairly substantial refactoring. With v.3 coming, it is not clear whether it is worth doing for v.2 (AFAIK the index build procedure has been substantially reworked in v.3 and may already have solved the issue). I just want to avoid wasting time on things that may be deprecated and abandoned within months. I don't know the actual plans for the v.3 release or the details of its indexing; maybe @glook can tell a more comprehensive story about the latter.

As a side note: we could try a kind of 'high-tech' cheat by making a tiny trampoline script/app that creates a memory-limited cgroup and puts the indexer worker there.
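
The trampoline idea could be sketched roughly as below. This is not an existing Manticore tool; it is an illustration that leans on systemd-run to create the cgroup. It assumes a systemd host with cgroup v2, where the property is MemoryMax= (on cgroup v1 it would be MemoryLimit=), and sufficient privileges to create a scope. The function name run_indexer_capped and the DRY_RUN switch are hypothetical.

```shell
#!/bin/sh
# Sketch of a "trampoline": run indexer inside a memory-capped cgroup.
# Assumptions: systemd with cgroup v2 (MemoryMax=); names are illustrative.

run_indexer_capped() {
    limit=$1   # hard cap for the whole process, e.g. 512M
    shift      # remaining arguments are passed to indexer unchanged
    if [ "${DRY_RUN:-0}" = "1" ]; then
        # Print the command instead of running it.
        echo systemd-run --scope -p "MemoryMax=$limit" indexer "$@"
    else
        systemd-run --scope -p "MemoryMax=$limit" indexer "$@"
    fi
}

# Usage:
#   DRY_RUN=1 run_indexer_capped 512M --rotate messages
```

Note that a cgroup cap does not make the indexer use less memory. If the limit cannot be met, the kernel's OOM killer terminates the process, so this acts as a guard for the rest of the host rather than a fix for mem_limit itself.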

stale[bot] commented 4 years ago

This issue has been automatically marked as stale because it has not had recent activity. It will be closed if no further activity occurs. Thank you for your contributions. Feel free to re-open the issue if it becomes relevant again.

manticoresearch commented 4 years ago

Still relevant.

jiru commented 3 years ago

Just adding my findings to the discussion. I hit a FATAL: out of memory error with the indexer despite having 100MB of available RAM and mem_limit set to 64MB. I am using plain indexes. I discovered that the excessive RAM usage was related to the fact that I was indexing multiple indexes in a single command.

With mem_limit set to 64MB, I got:

indexer --sighup-each --rotate cmn_main_index deu_main_index eng_main_index fra_main_index gos_main_index heb_main_index jpn_main_index lez_main_index lfn_main_index oci_main_index ood_main_index spa_main_index vie_main_index
Manticore 3.3.0 01fc8ad1@200204 release
Copyright (c) 2001-2016, Andrew Aksyonoff
Copyright (c) 2008-2016, Sphinx Technologies Inc (http://sphinxsearch.com)
Copyright (c) 2017-2020, Manticore Software LTD (http://manticoresearch.com)

using config file '/etc/manticoresearch/manticore.conf'...
indexing index 'cmn_main_index'...
collected 3 docs, 0.0 MB
creating lookup: 0.0 Kdocs, 100.0% done
creating histograms: 0.0 Kdocs, 100.0% done
sorted 0.0 Mhits, 100.0% done
FATAL: out of memory (unable to allocate 54525952 bytes)

Then I tried to lower mem_limit to 32MB and I got:

indexer --sighup-each --rotate cmn_main_index deu_main_index eng_main_index fra_main_index gos_main_index heb_main_index jpn_main_index lez_main_index lfn_main_index oci_main_index ood_main_index spa_main_index vie_main_index
Manticore 3.3.0 01fc8ad1@200204 release
Copyright (c) 2001-2016, Andrew Aksyonoff
Copyright (c) 2008-2016, Sphinx Technologies Inc (http://sphinxsearch.com)
Copyright (c) 2017-2020, Manticore Software LTD (http://manticoresearch.com)

using config file '/etc/manticoresearch/manticore.conf'...
indexing index 'cmn_main_index'...
collected 3 docs, 0.0 MB
creating lookup: 0.0 Kdocs, 100.0% done
creating histograms: 0.0 Kdocs, 100.0% done
sorted 0.0 Mhits, 100.0% done
total 3 docs, 168 bytes
total 0.300 sec, 559 bytes/sec, 9.98 docs/sec
rotating indices: successfully sent SIGHUP to searchd (pid=27285).
indexing index 'deu_main_index'...
collected 1 docs, 0.0 MB
creating lookup: 0.0 Kdocs, 100.0% done
creating histograms: 0.0 Kdocs, 100.0% done
sorted 0.0 Mhits, 100.0% done
total 1 docs, 15 bytes
total 0.367 sec, 40 bytes/sec, 2.72 docs/sec
rotating indices: successfully sent SIGHUP to searchd (pid=27285).
indexing index 'eng_main_index'...
collected 11 docs, 0.0 MB
creating lookup: 0.0 Kdocs, 100.0% done
creating histograms: 0.0 Kdocs, 100.0% done
sorted 0.0 Mhits, 100.0% done
FATAL: out of memory (unable to allocate 27262976 bytes)
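
One thing the two failures above have in common: converting the failed allocation sizes to MiB suggests that each request is the same fraction of mem_limit, so the allocation appears to scale with the setting rather than with the handful of documents being indexed. A quick shell check (the to_mib helper is only for illustration):

```shell
#!/bin/sh
# Convert a byte count to whole MiB (illustrative helper).
to_mib() {
    echo $(( $1 / 1048576 ))
}

to_mib 54525952   # failed allocation with mem_limit = 64MB
to_mib 27262976   # failed allocation with mem_limit = 32MB
```

This prints 52 and 26, that is, 13/16 of mem_limit in both runs.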

Eventually, I tried to index each index in a separate command and it worked:

for index in cmn_main_index deu_main_index eng_main_index fra_main_index \
    gos_main_index heb_main_index jpn_main_index lez_main_index lfn_main_index \
    oci_main_index ood_main_index spa_main_index vie_main_index; do
  indexer --sighup-each --rotate "$index" || break
done

tomatolog commented 3 years ago

Could you provide your config and the source index data that reproduce this case?

jiru commented 3 years ago

Here you are: indexer_mem.zip.