pat / thinking-sphinx

Sphinx/Manticore plugin for ActiveRecord/Rails
http://freelancing-gods.com/thinking-sphinx
MIT License

out of memory when indexing. #1174

Closed dvodvo closed 3 years ago

dvodvo commented 4 years ago

A cron job indexes the various classes of an application. The following issue is now occurring rather frequently:

using config file '/home/deploy/saimtasks/shared/config/development.sphinx.conf'...
indexing index 'articolo_core'...
collected 13765 docs, 0.5 MB
sorted 0.1 Mhits, 100.0% done
FATAL: out of memory (unable to allocate 109051904 bytes)

However when I manually call the command, it consistently runs smoothly:

using config file '/home/deploy/saimtasks/shared/config/development.sphinx.conf'...
indexing index 'articolo_core'...
collected 13765 docs, 0.5 MB
sorted 0.1 Mhits, 100.0% done
total 13765 docs, 539479 bytes
total 0.251 sec, 2141614 bytes/sec, 54644.04 docs/sec

Notice the allocation figure. It is immense, and wholly disproportionate to the actual size of the content being indexed.

Any idea what may be at work here?

pat commented 4 years ago

Hmm… this is an odd one. I wonder if different memory limits are applied via cron vs a logged-in shell session? I'm not a sysadmin, so that's a bit beyond what I can make solid recommendations on.
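For what it's worth, a quick way to compare the two environments (general Linux advice, nothing Sphinx-specific) is to dump the process limits from an interactive shell and from a temporary cron entry, then diff the two reports:

# from a logged-in shell:
ulimit -a > /tmp/limits_shell.txt

# temporary crontab entry (via crontab -e); ulimit is a shell builtin, so this runs under cron's /bin/sh:
* * * * * ulimit -a > /tmp/limits_cron.txt 2>&1

# once both files exist (formatting may differ slightly between shells):
diff /tmp/limits_shell.txt /tmp/limits_cron.txt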

And with regards to how much memory is being used - there's going to be more memory involved in querying and translating the data than in what's actually stored in the indices, but you're right, it's still enormously disproportionate.

I'm not sure if it's related, but which version of Sphinx are you using?

dvodvo commented 4 years ago

It's been installed for a while now: Sphinx 2.2.9-id64-release (rel22-r5006)

pat commented 4 years ago

I'm not sure if there are improvements related to memory usage, but I would definitely recommend upgrading to v2.2.11 if possible.

pat commented 4 years ago

The v3.2.1 release could also be considered, but I've found it's not great if you're using PostgreSQL as your database, especially with empty indices (and if you're using delta indices, then they're often going to be empty).
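For reference, delta indices are opt-in per index definition in Thinking Sphinx - a minimal sketch of what enabling one looks like (the model and column names here are placeholders, not taken from this app):

# app/indices/articolo_index.rb
ThinkingSphinx::Index.define :articolo, :with => :active_record, :delta => true do
  # index whichever columns are searched on; :titolo is illustrative only
  indexes titolo
end

With that in place, Thinking Sphinx reindexes just the small delta index as records change, and right after a full index run the delta is typically empty - which is the situation mentioned above.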

dvodvo commented 4 years ago

Which leaves me in a middling position: PostgreSQL, but no delta indices. All the classes are populated, so the indices cannot be empty. I will attempt first with 2.2.11, wait and observe, then do the same with 3.2.1.

dvodvo commented 4 years ago

Side note, potentially for updating this page (possibly to add an 'updating Sphinx' section, as there are many pointers out there that are rather misleading): https://freelancing-gods.com/thinking-sphinx/v4/installing_sphinx.html

I stumbled upon another route. Since

apt-cache madison sphinxsearch

only returned 2.2.9, I went the "compiling Sphinx manually" route. So, say the downloaded version of interest was:

sphinxsearch_2.2.11-release-1~xenial_amd64.deb

Simply running:

sudo dpkg -i sphinxsearch_2.2.11-release-1~xenial_amd64.deb

will update the package to the new version.

Running bundle exec rake ts:rebuild --silent subsequently reports the new version (2.2.11) and rebuilds all the indices.

dvodvo commented 4 years ago

After a week on 2.2.11, I hit the same error. Interestingly enough, it reports the same number of bytes that cannot be allocated. Any thoughts before bumping up to 3.2?

pat commented 4 years ago

v3.3.1 has actually been released in the past fortnight, so that could be worth a shot too…

And thanks for the notes about the documentation - I'm in the process of updating it slightly for the impending v5 release of Thinking Sphinx, so I'll review the installation instructions.

pat commented 4 years ago

Oh, one other thing I keep meaning to mention - are you using (or have you tried using) the mem_limit option? You can set it per-environment in config/thinking_sphinx.yml.
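Something like the following, assuming the standard settings file, and with the value purely as an illustration (it's passed through to Sphinx's indexer configuration):

# config/thinking_sphinx.yml
development:
  mem_limit: 64M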

pat commented 4 years ago

That doesn't address the fact that so much memory is being used for such a small amount of data, but still, it might help Sphinx keep itself under control.

dvodvo commented 4 years ago

Will attempt this now. It was in fact set at 128M (above the 109MB of the error message).

Halved that, as the indices are only 1MB each; no material difference in time taken when indexing manually. We shall see how the cron jobs fare...

pat commented 3 years ago

Closing this issue as it's been dormant for a few months - but certainly, if the problem is still occurring, please do comment and we can re-open and continue to investigate.