Closed raviada closed 8 years ago
I'm afraid I've not used Docker at all, so I'm not much help. This has been discussed a bit though in issue #975 - so that's probably a good place to start?
Hi @raviada - I'm still looking for a proper solution to this for production / scalable environment. So far I was only able to run sphinx inside the same container as our rails app, and this works. But I don't think it's production-ready, because in production, you'd ideally have more than one container with rails, so you end up with multiple instances of sphinx, which can easily go out-of-sync with each other.
Here's our Dockerfile for development:

```dockerfile
FROM ruby:2.3

RUN apt-get update -qq && apt-get install -y build-essential cmake libpq-dev imagemagick qt5-default libqt5webkit5-dev libmysqlclient-dev libodbc1

ENV RAILS_ENV development
ENV app /app

RUN mkdir $app
WORKDIR $app

RUN wget http://sphinxsearch.com/files/sphinx-2.2.9-release.tar.gz
RUN tar -zxvf sphinx-2.2.9-release.tar.gz
RUN cd sphinx-2.2.9-release && ./configure --with-pgsql --with-mysql && make && make install

ADD Gemfile $app/Gemfile
ADD Gemfile.lock $app/Gemfile.lock
ADD config/database.yml.docker-template $app/config/database.yml
RUN bundle install
ADD . $app

CMD rails s -b 0.0.0.0
```
@mateusz-useo might have been more successful than I was... I'm pretty sure it's do-able, but far from trivial unfortunately. Sadly, it might be easier to switch to something like ElasticSearch than to wrestle with getting sphinx/TS working in a dockerized environment.
So, my understanding of Docker has increased slightly since this issue was first logged - though it's still minimal. But what I'd recommend is that the Sphinx container has a copy of the Rails app in it, so you can issue the TS rake tasks to it, but it doesn't have a web server running and is only for Sphinx.
Thus, you'll have a single container for Sphinx, and then as many web containers as you like.
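For illustration, the topology suggested above (one dedicated Sphinx container holding a copy of the app, plus as many web containers as needed) might look roughly like this in docker-compose terms. Service names, the log path, and the `tail -f` keep-alive trick are assumptions, not taken from this thread's exact setup:

```yaml
# docker-compose sketch - names and paths are hypothetical
web:
  build: .
  command: rails s -b 0.0.0.0
  # scale this service out to as many containers as you like

sphinx:
  build: .   # same app image, so the TS rake tasks are available here
  # searchd daemonizes, so tail the log to keep the container alive
  command: sh -c "rake ts:start && tail -f log/development.searchd.log"
  expose:
    - "9306"  # mysql41 protocol
```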
Thanks @pat
I'm assuming running queries from your web rails containers to the sphinx container can be done over-the-wire using the mysql41 protocol? (that's the same as having a remote sphinx instance, but I've never done it, so not too sure about the fine-details).
What if you want to reindex e.g. from a background job though? (that's what I ended up doing in #1048 for example).
mysql41 is just TCP, so it'll be done over the specified port without any issues 👍
As for re-indexing from a background job, I guess you'll want the background worker and Sphinx in the same container? Or at least, a worker for Sphinx-specific jobs in the Sphinx container (and then other background worker containers perform all non-Sphinx jobs).
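Since mysql41 is just the MySQL wire protocol, any stock MySQL client can query a remote Sphinx container. A minimal sketch, assuming the container is reachable as `sphinx` on port 9306 (both are assumptions), with a hand-rolled escaping helper for Sphinx's extended match syntax (this is not a Thinking Sphinx API, just an illustration):

```ruby
# Build a SphinxQL query, escaping the characters that Sphinx's
# extended match syntax treats specially.
def sphinxql_select(index, term)
  escaped = term.gsub(/([\\()|\-!@~"&\/^$=])/) { |c| "\\#{c}" }
  "SELECT * FROM #{index} WHERE MATCH('#{escaped}')"
end

# With the mysql2 gem, running this against a remote Sphinx container
# would look something like:
#
#   client = Mysql2::Client.new(host: "sphinx", port: 9306)
#   client.query(sphinxql_select("article_core", "docker"))

puts sphinxql_select("article_core", "docker")
```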
I'll have to play around with it and see... any pointers on using a remote sphinx instance?
Regarding the re-indexing, that's just one example. The problem with this setup is that you always need to be aware of what calls the underlying `searchd` or `indexer` binaries, and need to make sure it runs in the right container. It sounds very limiting and awkward, I have to say.
A remote Sphinx instance is going to be similar to a remote database - you'll need to consider where the files are located (and have them backed up / re-used between instance boots).
As for reliance on the binaries - essentially it boils down to three key aspects that they're required for:
Thanks @pat. I think I understand the what, but not so much the how...

> A remote Sphinx instance is going to be similar to a remote database - you'll need to consider where the files are located (and have them backed up / re-used between instance boots).

Which files are you referring to? (yaml files?). And yes, it's similar. But as a client to a remote database, you typically don't need to trigger any re-indexing, and all operations are available over-the-wire (apart from rather rare DBA-type maintenance). With Sphinx/TS it's quite common to need to trigger those operations, and when the instance is remote it's suddenly far from trivial...
I wish there was some way to trigger those jobs over the mysql41 API itself, or if TS had some kind of a REST API that can be accessed remotely for those operations.
How would you carry out those operations that require binaries to run with a setup that has a dedicated Sphinx box connected by rails "clients" ?
Two different thoughts on this: firstly, to keep as many operations as possible happening via SphinxQL commands over the mysql41 protocol, you could consider switching to real-time indices. Real-time indices can only be created/updated via SphinxQL commands, so it removes all need for the `indexer` binary. Depending on how you're using Sphinx, of course, there could be other challenges from this switch, but I think it's worth investigating.
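As a sketch of what the real-time approach looks like in Thinking Sphinx (model, field, and attribute names here are made up for illustration), the index is defined with `with: :real_time` and kept up to date via a model callback rather than by running `indexer`:

```ruby
# app/indices/article_index.rb - hypothetical model and fields
ThinkingSphinx::Index.define :article, with: :real_time do
  indexes title, sortable: true
  indexes content

  has author_id,    type: :integer
  has published_at, type: :timestamp
end

# app/models/article.rb - the callback that pushes changes to Sphinx
# over SphinxQL whenever a record is saved:
#
#   after_save ThinkingSphinx::RealTime.callback_for(:article)
```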
As for the files - when I was playing a little with Docker earlier this year, I saw recommendations of using PostgreSQL from within a Docker container, but linking it to my host machine's file system to ensure the database files were persisted across boots. I'm not sure if this is the way to do things normally (beyond development environments), but this is what I was thinking of with my last message.
The files in question for Sphinx would be the configuration file, the index files, logs, and perhaps the binlog files as well (all things that are configured via `config/thinking_sphinx.yml`). The binlog files are only useful between boots if the daemon crashes, so perhaps they're not so important in this scenario.
Thanks again @pat.
Real-time indices look interesting. What's the performance impact of them, in your experience? Also, I'm a bit hesitant about using callbacks - isn't this akin to managing model caching / sweepers? Cache invalidation is one of those sticky problems that always bites you when you least expect it.
Just curious - Is it technically possible to trigger those binary calls via SphinxQL over mysql41? or is it entirely impossible and the protocol doesn't support these types of "commands"?
As for files - as far as I can tell, Sphinx can be pretty stateless. Any files / configs, as well as search indexes are generated and then effectively cached. So if you have a running sphinx container, you load it once, and then either keep it running for as long as you need, or replace it with a new container. The new container will have to re-compile the configs and load the index etc, but once running, it's ready. So in that sense it's not really like PG or any other database. You don't lose any real data when you load your sphinx container "from scratch".
I use real-time indices in all of my current projects where TS is being used, and don't notice any performance hits for the most part. Yes, the callbacks aren't ideal - I'm on board with your concerns there! - but it removes the need for deltas.
However, the initial indexing (which is done via the `ts:generate` or `ts:regenerate` tasks) is certainly slower, because every record is instantiated from within Rails, rather than via SQL queries. With this in mind, it's why I'd actually look at storing the Sphinx files in your container between boots - granted, this isn't such a big problem when developing locally, provided you don't have a huge amount of data.

Even with the callbacks, I'd still recommend having a scheduled cron job running `ts:generate` daily, to catch any data updates that haven't fired the callbacks.
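If the app happens to use the whenever gem for scheduling (an assumption, not something established in this thread), the daily `ts:generate` safety net could be expressed as:

```ruby
# config/schedule.rb (whenever gem) - the 4:30am time is arbitrary
every 1.day, at: '4:30 am' do
  rake 'ts:generate'
end
```

A plain crontab entry invoking `bundle exec rake ts:generate` from the app directory would work just as well.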
From a quick scan of the SphinxQL docs, it doesn't look like there's anything in there for invoking `indexer` or `searchd`: http://sphinxsearch.com/docs/current.html#sphinxql-reference - so I'm afraid you can't avoid the dependency on the binaries completely (though as mentioned previously, `indexer` is no longer needed when using real-time indices).
That's good to know, @pat. (especially appreciate your being realistic about the trade-offs here).
I'm currently deliberating between adding a dedicated sidekiq process (and queue) inside the sphinx container, using it as a way to remotely trigger re-index operations by simply launching an async job, or using real-time indices. Both options have some pros and cons, so we might just flip a coin and hope for the best ;-)
It's not something we're in a rush to implement, but when we do, I'd be sure to keep you posted. Maybe share some of our configs, recipes etc.
Thanks again for being so responsive and open, @pat.
Appreciate the feedback, and it's great to know my comments are appreciated :) Any notes from your experiences down either path would be great - good luck with putting it all together!
I'm curious to know what you managed to get working. I'm working on the same problem, and for now I've picked the Sphinx-plus-delta-worker design using a delayed job queue.
Hi @webgem-jpl. I didn't post an update since we never released this in production. But we did create a solution that seems to work. The solution was to:

- Create a `Dockerfile` similar to the one we use for rails, but with Sphinx code on top. This container will have access to rails, sidekiq as well as Sphinx binaries etc.
- Run a dedicated sidekiq worker inside this container, listening on a `sphinx` queue, which will perform re-index operations when necessary.
- Push any re-index jobs to the `sphinx` queue. This ensures that the running job has access to sphinx.

Our sphinx Dockerfile was essentially the same I pasted above, but without the `CMD rails s -b 0.0.0.0` directive.
Here's the launch script for the sphinx container:

```bash
#!/bin/bash
bundle check || bundle install

# this makes sure sphinx is running and listening for queries
bundle exec rake ts:rebuild

# this launches sidekiq on the `sphinx` queue
bundle exec sidekiq -C config/sidekiq.yml -q sphinx
```
Hope this helps. I think it's a reasonable solution, but obviously has some limitations and probably isn't ideal in docker / unix philosophy terms...
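The queue-pinning idea above can be sketched as a plain Ruby class. The class name, the exact rake task, and the separation of `command` from `perform` are all illustrative choices, not code from the actual setup; in a real app the Sidekiq lines in the comment would be uncommented:

```ruby
# Sketch of a job pinned to the `sphinx` queue, so it only ever runs on
# the sidekiq worker living inside the Sphinx container.
class SphinxReindexJob
  # In the real app you'd add:
  #   include Sidekiq::Job
  #   sidekiq_options queue: :sphinx

  # Expose the shell command separately so it can be inspected without
  # actually shelling out.
  def command
    "bundle exec rake ts:index"
  end

  def perform
    system(command) or raise "ts:index failed"
  end
end

puts SphinxReindexJob.new.command
```

Because the worker in the Sphinx container is the only one started with `-q sphinx`, enqueuing this job from any web container guarantees the re-index runs where the Sphinx binaries live.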
@gingerlime Please forgive me if this is a stupid question, but with that particular model/setup, how do you handle inter-container communication?
We have a dokku environment in production at the moment and don't have the time to migrate all of our apps away from Herokuish buildpacks, so I've written a custom sphinx buildpack and have launched two containers via a Procfile as follows:
```
thinking_sphinx: bundle exec rake ts:index && bundle exec rake ts:restart && bundle exec rake ts:periodically_reindex
web: bundle exec rackup -s puma -p $PORT -E $RACK_ENV
```
I've then set up a shared volume to store the indices, so that they're accessible from each container. 9 times out of 10 this works, but will sporadically give errors stating that Sphinx can't connect to the MySQL server.
In this setup, as with yours, each container has sphinx and thinking-sphinx installed, and the `thinking_sphinx` container starts the Sphinx daemon and periodically re-indexes using a custom rake task (which just re-executes the `ts:index` rake task, sleeps for 90 seconds and then repeats). I'm using real-time indices.

For reference, I've set the `config/thinking-sphinx.yml` file to:
```yaml
<%= Rails.env %>:
  mysql41: <%= ENV['SPHINX_PORT'] %>
  indices_location: <%= ENV['SPHINX_INDICES_LOCATION'] %>
  configuration_file: <%= ENV['SPHINX_CONFIGURATION_FILE_PATH'] %>
  log: <%= ENV['SPHINX_LOG_FILE_PATH'] %>
  query_log: <%= ENV['SPHINX_QUERY_LOG_FILE'] %>
  pid_file: <%= ENV['SPHINX_PID_FILE'] %>
```
Any insight as to how you set up the web containers to reference/communicate with the Sphinx container would be greatly appreciated.
Thanks!
Hi @xtrasimplicity, this seems rather specific, and I'm mostly guessing here... but:

Other than that, the main thing to check is that you publish the sphinx container via docker and give it a name (e.g. `sphinx`), and then make sure you're accessing it using this name from other containers.
Thanks for the prompt response - it's much appreciated. I ended up upgrading to Sphinx v3.0.1, which solved the MySQL connection errors.
The plan is to only have one Sphinx container running at any time, with the web containers connecting to the single Sphinx container. Whilst each web container would also have Sphinx installed, as I'm using herokuish buildpacks, my intention would be that the local versions aren't used.
I've set the `address` attribute in my thinking-sphinx YAML file to the container's name, but get a `FATAL: no AF_INET address found for: thinking_sphinx` error on deployment. I suspect my issues are caused by a lack of understanding of inter-container communication in dokku (or, most likely, docker), so I'll read up a bit more before I continue.
For the moment, I've got it running with a single web instance without any issues, so I'll move on to other things and come back to this next week.
Thanks once again for your suggestions! :)
just off the top of my head, some suggestions:

- are you using `thinking_sphinx` as the container name instead? it might work, but seems safer to avoid. I'm not familiar with dokku, but with docker-compose it seems to work for us (we just use `sphinx` as the name);
- add a `depends_on` between your web and sphinx containers (web depends on sphinx)

I'd like to refresh this topic. What I ended up with is installing the Sphinx binary inside the running container. I don't know how to share the configuration. My example is quite simple - there is only one instance of the rails application.
I'd appreciate any help.
If anyone is interested, I got this setup working too: running sphinx in a separate docker container, which includes the app code and a slightly different thinking_sphinx config. The main app container containing the sphinx client needs to have the connection options set to address the sphinx container, e.g.:

```yaml
connection_options:
  host: "sphinx"
  port: "9306"
```

It is important that the thinking_sphinx config inside the container does not have these options. So when I build the sphinx image, I overwrite the config file with a custom one made for the container, omitting connection_options.
Let me know if you have questions on how to set it all up.
@ncri I'm interested in your docker-compose.yml file and both Dockerfiles.
Thanks :]
These are the relevant parts of the docker-compose.yml (I omitted volumes and dependencies - also, at the moment there is no volume for the sphinx indexes; they simply sit in the sphinx container):

```yaml
app:
  build:
    context: .
    dockerfile: Dockerfile.dev
  command: sh start_server.sh

sphinx:
  build:
    context: .
    dockerfile: DockerfileSphinx.dev
  command: sh start_sphinx.sh
```
Dockerfile.dev:

```dockerfile
FROM starefossen/ruby-node:2-4

RUN apt-get update -qq && \
    apt-get install -y nano build-essential libpq-dev && \
    npm cache clean -f && \
    npm install -g n && \
    n stable && \
    gem install bundler

WORKDIR /usr/src/app
COPY Gemfile Gemfile.lock ./
COPY components ./components
RUN bundle install
EXPOSE 3000
COPY . .
```
DockerfileSphinx.dev (code partly copied from: https://github.com/macbre/docker-sphinxsearch/blob/master/Dockerfile):

```dockerfile
FROM starefossen/ruby-node:2-4

ENV SPHINX_VERSION 3.0.3-facc3fb

RUN apt-get update -qq && apt-get install -y \
    mysql-client unixodbc libpq5 wget

RUN apt-get install -y nano build-essential libpq-dev && \
    npm cache clean -f && \
    npm install -g n && \
    n stable && \
    gem install bundler

# set timezone
# @see http://unix.stackexchange.com/a/76711
RUN cp /usr/share/zoneinfo/CET /etc/localtime && dpkg-reconfigure --frontend noninteractive tzdata

# set up and expose directories
RUN mkdir -pv /opt/sphinx/log /opt/sphinx/index

# http://sphinxsearch.com/files/sphinx-3.0.3-facc3fb-linux-amd64.tar.gz
RUN wget http://sphinxsearch.com/files/sphinx-${SPHINX_VERSION}-linux-amd64.tar.gz -O /tmp/sphinxsearch.tar.gz
RUN cd /opt/sphinx && tar -xf /tmp/sphinxsearch.tar.gz
RUN rm /tmp/sphinxsearch.tar.gz

# point to sphinx binaries
ENV PATH "${PATH}:/opt/sphinx/sphinx-3.0.3/bin"
RUN indexer -v

WORKDIR /usr/src/app
COPY Gemfile Gemfile.lock ./
COPY components ./components
RUN bundle install
COPY . .
COPY ./config/thinking_sphinx_search_container.yml ./config/thinking_sphinx.yml
EXPOSE 9306
```
start_sphinx.sh:

```bash
rake ts:start
tail -f log/development.searchd.query.log -f log/development.searchd.log
```
Hi, I'm looking to finish my Sphinx setup with Rails and Docker, but am unable to find a complete working project. Would anyone mind sharing a working repo or gists? Thanks.
All of mine are closed source, but I'll try to create an MCVE when I get a moment.
On Mon, 23 Dec 2019, 13:34 Peter Dirickson, notifications@github.com wrote:

> Hi, I'm looking to finish my Sphinx setup with Rails and Docker, but am unable to find a complete working project. Would anyone mind sharing a working repo or gists? Thanks.
> Hi, I'm looking to finish my Sphinx setup with Rails and Docker, but am unable to find a complete working project. Would anyone mind sharing a working repo or gists? Thanks.
Hi @piclez, here's a gist: https://gist.github.com/xtrasimplicity/662d7bc33d6875bbd0a454110a289496
Note: I use real-time indices. I've also stripped this from a closed-source app, so there may be a few small things missing. Feel free to ping me if you have any issues. :)
We've been using this in production for about a year and it's been great!
Hi @xtrasimplicity, I have used your setup, but whenever I run `docker-compose up`, the sphinx container exits because it is not able to connect to the mysql server, with this error:

```
Mysql2::Error::ConnectionError: Can't connect to local MySQL server through socket '/run/mysqld/mysqld.sock' (2)
```
Please let me know if you have an idea about the cause for this.
@jerome313, make sure that your Sphinx container is configured to point at your DB container for SQL. That error means that it is trying to connect to the local socket, and fails because there's no MySQL server running inside the Sphinx container, bound to a socket at that path.
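In database.yml terms, the fix is to set an explicit `host` pointing at the database container so the client uses TCP rather than the local socket. The service name `db` and the credentials here are assumptions for illustration:

```yaml
# config/database.yml sketch - "db" is a hypothetical container/service name
development:
  adapter: mysql2
  host: db          # not localhost - localhost implies the local socket
  port: 3306
  database: app_development
  username: root
```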
I've updated my example Gist with a slightly newer approach.
Hello Pat, I am trying to set up our development and deployments with docker containers. I am using Docker.app on a mac, and I configured the rails, mysql and sphinx containers to work together in docker-compose. They all came up fine, with rails talking to mysql etc. How do I configure thinking-sphinx to create indexes on the sphinx container by connecting to the mysql container? I did everything to run in one box before; now these containers act like different machines, so how do I do that? Please share your experiences working with docker - a docker-compose file would be a big help. Thanks in advance.