pat / thinking-sphinx

Sphinx/Manticore plugin for ActiveRecord/Rails
http://freelancing-gods.com/thinking-sphinx
MIT License

SSL error in server thread after invoking Thinking Sphinx #1087

Closed by jasim 6 years ago

jasim commented 6 years ago

All calls to OpenSSL routines are failing after a thread (puma)/process (unicorn) executes a Sphinx query.

This error only occurs when the server is started using systemd; it works well if the daemon is started from bash. We have tried moving the Sphinx process to a different machine, and the error persists.

We're using thinking-sphinx (3.3.0) and mysql2 (0.3.21) gem versions.

Here are two example errors:

We're on Ubuntu 16.04.4 LTS. Here are our OpenSSL and Ruby versions:

$ openssl version

  OpenSSL 1.0.2g  1 Mar 2016

$ ruby -v -ropenssl -rfiddle -e 'puts Fiddle::Function.new(Fiddle.dlopen(nil)["SSLeay_version"], [Fiddle::TYPE_INT], Fiddle::TYPE_VOIDP).call(0)'

  ruby 2.3.6p384 (2017-12-14 revision 61254) [x86_64-linux]
  OpenSSL 1.0.2g  1 Mar 2016

$ ruby -r rbconfig -e 'puts RbConfig::CONFIG["configure_args"]'

 '--prefix=/usr' 'LDFLAGS=-L/usr/lib ' 'CPPFLAGS=-I/usr/include '

$ locate -r /lib/.*libssl.*so

  /lib/x86_64-linux-gnu/libssl.so.1.0.0
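As a complement to the Fiddle one-liner above, the version Ruby's openssl extension was compiled against can be compared with the version of the library actually loaded at runtime. This is only a sketch: `openssl_versions` is a hypothetical helper name, and it relies on the `OpenSSL::OPENSSL_VERSION` and `OpenSSL::OPENSSL_LIBRARY_VERSION` constants from Ruby's standard openssl library.

```ruby
require "openssl"

# Hypothetical helper: reports the OpenSSL version the Ruby extension was
# built against and the version of the library loaded into this process.
def openssl_versions
  {
    compiled_against:  OpenSSL::OPENSSL_VERSION,
    loaded_at_runtime: OpenSSL::OPENSSL_LIBRARY_VERSION
  }
end

openssl_versions.each { |label, version| puts "#{label}: #{version}" }
```

If the two strings differ, that may point to a mismatch between build-time and runtime libraries, though it does not prove one.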

We've also asked about this, with respect to systemd, in a related StackOverflow question.

This server uses Google Cloud's Ubuntu image and the error does not occur in a different environment (DigitalOcean's Ubuntu droplet).

Is there anything we can investigate to understand the root cause of this issue? Thanks.

pat commented 6 years ago

Wow, this is certainly a frustrating situation for you! Excellent debugging thus far just to figure out some of the parts of what's causing it.

I'm at a bit of a loss as to the cause. Thinking Sphinx itself makes no use of OpenSSL… perhaps it's the invocation of mysql2 code that might be a part of it, when TS uses that for search queries? Are you using mysql2 elsewhere in your app (i.e. are you using MySQL as your database)? Or is TS the only reason you've got mysql2 in your Gemfile?

pat commented 6 years ago

Also, are you in a position to upgrade the mysql2 gem to their latest release (0.4.10) and see if that makes any difference?

gingerlime commented 6 years ago

Thanks for responding so quickly @pat. I'm working with @jasim on this one. We had the same hunch, but it's really hard to tell. (We've been looking at this weird thing for the last 3 days or so.)

We did try the latest Thinking Sphinx (3.4.2) and mysql2 gem versions, and it happens with those as well.

Anything else you think we should be looking at/into?

EDIT: we're not using MySQL anywhere else. We use PostgreSQL as our main database, and Redis for session store and caching.

pat commented 6 years ago

If it's possible, it'd be neat if you could see if a request that purely invoked mysql2 to make a call to Sphinx has the same impact - thus narrowing it down to whether TS is more directly involved or not. The following code should do the trick (alter the host, port, and query as needed):

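require "mysql2"

# Connect to Sphinx over its MySQL-compatible protocol, bypassing Thinking Sphinx.
client = Mysql2::Client.new(
  :flags           => Mysql2::Client::MULTI_STATEMENTS,
  :connect_timeout => 5,
  :host            => "127.0.0.1",
  :port            => 9306
)
# Run a query against a Sphinx index (article_core is an example index name).
results = client.query("SELECT * FROM article_core").to_a

gingerlime commented 6 years ago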

That's a good idea. Thanks a lot @pat. We'll give it a try and let you know

gingerlime commented 6 years ago

Bingo! Yes, it does cause an error (I tested it on another instance with the older versions of TS and mysql, but I'll try to upgrade and test again. I imagine results would be the same).
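A test like the one described above could be structured roughly as follows. This is a sketch under assumptions, not the code that was actually run: `openssl_working?` and `openssl_state_around` are hypothetical helper names, and the specific OpenSSL routines exercised are arbitrary choices, since the original errors are not shown here.

```ruby
require "openssl"

# Hypothetical helper: exercises a few OpenSSL routines and reports whether
# they still work in the current process.
def openssl_working?
  OpenSSL::Digest.new("SHA256").digest("probe")  # digest routines
  OpenSSL::Random.random_bytes(16)               # random number generator
  OpenSSL::SSL::SSLContext.new.set_params        # TLS context setup
  true
rescue StandardError => e
  warn "OpenSSL check failed: #{e.class}: #{e.message}"
  false
end

# Hypothetical helper: checks OpenSSL before and after running the given block.
def openssl_state_around
  before = openssl_working?
  yield
  { before: before, after: openssl_working? }
end

# Usage (assumes a `client` created as in the mysql2 snippet earlier):
#   p openssl_state_around { client.query("SELECT * FROM article_core").to_a }
```

A result of `before: true, after: false` would suggest that the wrapped call is what breaks OpenSSL in that thread or process.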

Any thoughts on what to do next? (Other than reporting it on the mysql2 gem, I suppose...)

pat commented 6 years ago

Glad to have that extra bit of clarity… though I kinda wish it was TS, because at least we'd be dealing with only Ruby code rather than a gem with C extensions.

Definitely create an issue on the mysql2 repo, because someone there may have some ideas. Beyond that… could it be possible that mysql2's referencing a different version of OpenSSL? As part of your build process, is the gem installed after OpenSSL, or the other way around?
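One way to look into the "different version of OpenSSL" idea on Linux is to list which libssl and libcrypto shared objects are mapped into the running process. The sketch below parses the text of `/proc/self/maps`; `loaded_ssl_libraries` is a hypothetical helper name, and the approach assumes a Linux `/proc` filesystem.

```ruby
# Hypothetical helper: extracts the distinct libssl/libcrypto paths from the
# contents of a /proc/<pid>/maps file (the path is the last field of a line).
def loaded_ssl_libraries(maps_text)
  maps_text.each_line
           .map { |line| line.split.last.to_s }
           .select { |path| path =~ %r{/lib(?:ssl|crypto)[^/]*\.so[^/]*\z} }
           .uniq
           .sort
end

# Only meaningful on Linux; load the gems of interest (e.g. openssl, mysql2)
# before inspecting the current process.
if File.exist?("/proc/self/maps")
  puts loaded_ssl_libraries(File.read("/proc/self/maps"))
end
```

More than one libssl or libcrypto version in the output would be a hint, not proof, that two OpenSSL builds are competing in the same process.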

pat commented 6 years ago

There's still the question of why it's different depending on how the server is booted up… maybe it's a question of what systemd does after the server is started, and whether any of those later activities would influence behaviour…
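One concrete difference worth ruling out is the process environment: a service started by systemd typically gets a much smaller environment than one started from a login shell, and variables such as `PATH` or `LD_LIBRARY_PATH` can affect which shared libraries are found. The sketch below compares two environment dumps; `parse_env_dump` and `env_differences` are hypothetical helper names, and whether the environment is actually the cause here is an assumption.

```ruby
# Hypothetical helper: parses "KEY=value" entries separated by newlines (as
# printed by `env`) or NUL bytes (as in /proc/<pid>/environ) into a Hash.
def parse_env_dump(text)
  text.split(/[\0\n]/)
      .map { |entry| entry.split("=", 2) }
      .select { |pair| pair.size == 2 }
      .to_h
end

# Hypothetical helper: returns { "KEY" => [value_in_a, value_in_b] } for every
# variable whose value differs between the two environments.
def env_differences(env_a, env_b)
  (env_a.keys | env_b.keys).sort.each_with_object({}) do |key, diff|
    diff[key] = [env_a[key], env_b[key]] unless env_a[key] == env_b[key]
  end
end

# Usage (file names are placeholders for dumps taken from each process):
#   shell   = parse_env_dump(File.read("env_from_bash.txt"))
#   systemd = parse_env_dump(File.read("env_from_systemd.txt"))
#   env_differences(shell, systemd).each { |k, (a, b)| puts "#{k}: #{a.inspect} vs #{b.inspect}" }
```

On Linux, the environment of a running process can be read from `/proc/<pid>/environ`, which is why the parser also accepts NUL-separated input.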

gingerlime commented 6 years ago

Yes, it's very odd. Not only does it depend on the initialization process, it also behaves differently on Google Cloud and DigitalOcean... It's totally bizarre.

gingerlime commented 6 years ago

> is the gem installed after OpenSSL, or the other way around?

We actually didn't explicitly install the openssl gem, but even after installing it (with/without adding :require, before or after mysql2), it still fails.

jasim commented 6 years ago

Closing this issue - the error disappeared on its own. We think this had something to do with the VMs. Certainly bizarre, but we aren't able to reproduce the error anymore. Thanks a lot for the help @pat.

pat commented 6 years ago

I'm glad to know things are working, though the fact it just disappeared is very odd! Fingers crossed it doesn't return.