jdelStrother closed this issue 5 years ago
Second issue first: definitely sounds like something that should be fixed, probably via an environment variable. I'll look into it soon :)
As for the first: you should only be running the configure task on the Sphinx server - there's no value in having it occur on the client servers. However, if this is only a problem because the index task runs configure automatically, you can use INDEX_ONLY=true. That said, would it make sense to just run all the TS tasks only on your Sphinx server?
you should only be running the configure task on the Sphinx server - there's no value having it occur on the client servers. [....] would it make sense to just run all the TS tasks only on your Sphinx server?
The thing I'm trying to get away from is that we have a big monolithic Rails app with a lot of dependencies (both gems, and compiled libraries like ImageMagick). So right now our Sphinx server needs to have all those irrelevant dependencies installed just so that we can generate a sphinx config file. (Admittedly this approach I'm trying of generating the config file from a Rails server then shipping it over to the sphinx server, is also filled with drawbacks.)
Just pushed some commits to the develop branch which add two boolean settings (which can be turned on per-environment in config/thinking_sphinx.yml): skip_directory_creation and skip_running_check. This should remove the need for your monkey patches, but I'd appreciate confirmation after you've tested it! :)
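For reference, a minimal sketch of how those two settings might be enabled per-environment in config/thinking_sphinx.yml - the choice of the production environment here is illustrative:

```yaml
production:
  # Generate config for a remote searchd; skip local filesystem/daemon checks.
  skip_directory_creation: true
  skip_running_check: true
```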
Yep, both work great, thanks!
Excellent! And that means all the Docker stuff's working well, without your Rails app on the Sphinx server?
Yep, my docker-searchd container seems to be working fine. It's basically just using the macbre/sphinxsearch image with my config file mounted into it.
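A minimal docker-compose sketch of that kind of setup might look like the following - the image tag, port, and the in-container config path are assumptions based on the macbre/sphinxsearch image's conventions, not details from this thread:

```yaml
services:
  searchd:
    image: macbre/sphinxsearch        # tag left unpinned for illustration
    ports:
      - "9306:9306"                   # SphinxQL port Thinking Sphinx connects to
    volumes:
      # Mount the locally generated config into the container (path assumed)
      - ./config/production.sphinx.conf:/opt/sphinx/conf/sphinx.conf:ro
```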
Is there a release scheduled for this feature, @pat? This looks like it could solve some of the issues that have been making me put off de-monolithifying a project I've been working on. Thanks!
There's no release just yet - there's a couple of outstanding issues I want to tackle first - but it's on my radar. With a bit of luck I'll have something out early next week.
Awesome - that sounds great. Thanks heaps!
These settings are now part of the newly released v4.3.0.
Awesome! Thanks, Pat!
@jdelStrother / @xtrasimplicity did anyone experience significant performance issues on index creation?
We're trying the setup mentioned above with @macbre's Sphinx container and the referenced Thinking Sphinx settings skip_running_check & skip_directory_creation, but rake ts:rebuild takes ages.
Does anyone have an idea what the reason could be, or are there any tweaks or other suggestions?
Thank you in advance!
@alexanderadam, We haven't had any major performance issues, but our database is quite small and a few minutes at startup isn't a huge issue for us as searching is only a tiny part of our application's functionality.
You could try increasing the size of /dev/shm from 64MB to something a bit higher, but I'm not sure if that will have any performance benefits for TS.
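If the container is run via Docker Compose, one way to raise that limit is the shm_size option; the 256mb value below is just an example to tune:

```yaml
services:
  searchd:
    image: macbre/sphinxsearch
    shm_size: "256mb"   # Docker's default /dev/shm is 64MB
```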
@alexanderadam Our rebuilds are pretty slow by default, on a database with something like 5 million documents. We've monkeypatched it with an alternative approach:
```ruby
class ThinkingSphinx::RealTime::Populator
  def populate(&block)
    instrument "start_populating"
    # Optional cap on the number of batches, set via the RT_BATCH_LIMIT env var
    limit = ENV["RT_BATCH_LIMIT"]
    count = 0
    scope.find_in_batches(batch_size: batch_size) do |instances|
      break if limit && (count += 1) > limit.to_i
      transcriber.copy(*instances)
      instrument "populated", instances: instances
    end
    instrument "finish_populating"
  end
end
```
When we need to rebuild a sphinx index, we'll run, e.g.:

```shell
bin/rake ts:rebuild INDEX_FILTER=posts_rt_core RT_BATCH_LIMIT=1000
```
just to get sphinx back up-and-running with a few documents, and then incrementally populate it with something like this:
```ruby
Post.find_each do |post|
  ThinkingSphinx::RealTime.callback_for(:post).after_save(post)
end
```
Hi there, I'm currently trying to get thinking-sphinx working with searchd in a docker container, though I think a lot of the same issues apply if you were running searchd on a separate server to your Rails servers. I was hoping to discuss either workarounds that people are using for these cases, or work that we could do on thinking-sphinx to improve that workflow.
There are two main pain points I've been hitting:

1. Config generation seems pretty insistent on calling mkdir_p for various directories, which isn't very useful if you're trying to generate configuration for a remote machine.
2. It seems like we ought to be able to call rake ts:index from a local Rails server and have it populate our realtime indexes on a remote server. However, TS also tries to check that searchd is running (via the pid file) and tries to rotate the index after it's done populating.

In my hacky experimentation I've been working around these with this rake file:
with this docker-compose:
Any thoughts/plans on separating out some of the TS code that only works if you're running Rails & Sphinx side-by-side? Or am I doing it all wrong?
(Previous docker discussions at https://github.com/pat/thinking-sphinx/issues/1010)