pat / riddle

Ruby Client API for Sphinx
MIT License
136 stars 67 forks source link

Zero results (occasionally) #17

Closed robbyrussell closed 13 years ago

robbyrussell commented 13 years ago

We moved from acts_as_sphinx last month to ThinkingSphinx and since then have been trying to track down an intermittent problem that we can't seem to get to the bottom of.

Basically, every so often... we'll get zero results back from sphinx queries that we know have a lot of results. However, when we look at the searchd.query.log on the sphinx server we see it showing that it's supposed to be returning a number of results.

Here.. a quick breakdown...

[Thu Mar 17 16:06:29.037 2011] 0.001 sec [all/3/ext 212 (3000,1000)] [tour_variation_core] tour
[Thu Mar 17 16:06:29.550 2011] 0.001 sec [all/3/ext 212 (3000,1000)] [tour_variation_core] tour
[Thu Mar 17 16:06:52.796 2011] 0.001 sec [all/3/ext 212 (3000,1000)] [tour_variation_core] tour
[Thu Mar 17 16:06:53.321 2011] 0.001 sec [all/3/ext 212 (3000,1000)] [tour_variation_core] tour

However, during this period... our app had one instance return zero results. (yet we're seeing 212 each time on the sphinx server).

Upon researching the problem, I found this thread which sounds like the closest behavior to what we're seeing.

Given that Riddle is our new client, I'm wondering if anybody else is seeing similar problems (on a fairly high-traffic site). I haven't dug too deep into the internals... but something is definitely happening after sphinx returns results and by the time our thinking-sphinx search gets it's collection back.

Any thoughts and/or suggestions on how to better diagnosis the problem on our end? Right now, inspecting the results from thinking-sphinx is just returning an empty array... so not too valuable at this point.

Also, we can't really reproduce this locally... only seems to be happening in production, which is on:

ruby 1.8.6 (2008-08-11 patchlevel 287) [x86_64-linux]

Thanks in advance for any assistance you can provide us!

Cheers, Robby

robbyrussell commented 13 years ago

Also... versions of riddle/thinking-sphinx

riddle (1.2.2)
thinking-sphinx (1.4.3)
pat commented 13 years ago

Hi Robby

Does the raw Sphinx results hold any clues? TourVariation.search('tour').results

robbyrussell commented 13 years ago

Thanks for the suggestion. I'll turn that on in my logging and see what I uncover!

robbyrussell commented 13 years ago

Okay, here is what shows up in one of those zero results scenarios.

SEARCH: riddle results: {:attribute_names=>["sphinx_internal_id", "class_crc", "sphinx_deleted", "tour_name_sort", "tour_id_sort", "selling_company_id", "departure_id", "brochure_id", "brochure_variation_id"], :matches=>[], :status=>0, :attributes=>{"brochure_variation_id"=>1073741825, "brochure_id"=>1, "tour_name_sort"=>3, "selling_company_id"=>1, "sphinx_deleted"=>1, "class_crc"=>1, "tour_id_sort"=>3, "departure_id"=>1073741825, "sphinx_internal_id"=>1}, :total_found=>40, :total=>40, :time=>0.001, :words=>{"operationalregionaustralia"=>{:docs=>326, :hits=>326}}, :fields=>["tour_variation_name", "tour_name", "tour_sightseeing", "tour_intro", "tour_transportation", "tour_travels_to", "tour_tags_cache", "tour_operating_product_code", "tour_type_name", "places_cache", "optionals_cache", "highlights_cache", "tour_code", "tour_id"]}

Not entirely sure how to parse this... (total found and such).. the array itself was empty though.

Any suggestions?

pat commented 13 years ago

Hmm, Sphinx isn't making this easy... supposedly 40 results, but nothing returned... and no error or warning, either. Is there anything in the searchd log hinting of any problems?

eddorre commented 13 years ago

Hi Pat,

I'm working with Robby on this issue.

I'm curious about a value that was in the hash that Robby posted. The key/value pair is: "sphinx_deleted"=>1. From what I have observed, in working data sets this key/value pair is "sphinx_deleted"=>0.

From the code that I have read in the Thinking Sphinx library, it looks like sphinx_deleted gets set to 1 by the delete_in_index method call which in turn can be triggered from the toggle_deleted method and happens when a model object is destroyed.

Alternatively, it can also be called from the delete_from_core method from the Deltas module, but we're not using Deltas for that model, so I do not suspect that this is an issue.

I'll have to do an inventory of the app code, but I don't believe that those type of objects, TourVariation, are destroyed very often, if at all (we mostly just archive old data).

Assuming that those objects are not being destroyed, what is the possible cause of sphinx_deleted being set to 1 and might that have any bearing on this issue?

Thanks for any help that you can give us.

Cheers,

Carlos

pat commented 13 years ago

Hi Carlos

On my phone, so please excuse the brevity. The list of attributes in that hash is just stating what data types each attribute is (integers are 1, and so on). It's not returning any data for any search results, so I wouldn't read into that hash too much.

Pat

On Mar 18, 2011, at 7:36 PM, eddorre reply@reply.github.com wrote:

Hi Pat,

I'm working with Robby on this issue.

I'm curious about a value that was in the hash that Robby posted. The key/value pair is: "sphinx_deleted"=>1. From what I have observed, in working data sets this key/value pair is "sphinx_deleted"=>0.

From the code that I have read in the Thinking Sphinx library, it looks like sphinx_deleted gets set to 1 by the delete_in_index method call which in turn can be triggered from the toggle_deleted method and happens when a model object is destroyed.

Alternatively, it can also be called from the delete_from_core method from the Deltas module, but we're not using Deltas for that model, so I do not suspect that this is an issue.

I'll have to do an inventory of the app code, but I don't believe that those type of objects, TourVariation, are destroyed very often, if at all (we mostly just archive old data).

Assuming that those objects are not being destroyed, what is the possible cause of sphinx_deleted being set to 1 and might that have any bearing on this issue?

Thanks for any help that you can give us.

Cheers,

Carlos

Reply to this email directly or view it on GitHub: https://github.com/freelancing-god/riddle/issues/17#comment_891654

pat commented 13 years ago

Robby, Carlos - did you ever get to the bottom of this issue?

robbyrussell commented 13 years ago

Not yet, we're having the sphinx folks take a look for us in the next week or so.

On Mon, May 9, 2011 at 02:58, freelancing-god < reply@reply.github.com>wrote:

Robby, Carlos - did you ever get to the bottom of this issue?

Reply to this email directly or view it on GitHub: https://github.com/freelancing-god/riddle/issues/17#comment_1120760

pat commented 13 years ago

Ah, good to know. If there's any questions I can help with, please do send them my way.

On 09/05/2011, at 11:06 AM, robbyrussell wrote:

Not yet, we're having the sphinx folks take a look for us in the next week or so.

On Mon, May 9, 2011 at 02:58, freelancing-god < reply@reply.github.com>wrote:

Robby, Carlos - did you ever get to the bottom of this issue?

Reply to this email directly or view it on GitHub: https://github.com/freelancing-god/riddle/issues/17#comment_1120760

Reply to this email directly or view it on GitHub: https://github.com/freelancing-god/riddle/issues/17#comment_1120777

robbyrussell commented 13 years ago

Pat,

Okay... if you look back at the original post... does (3000,1000) mean... give me the up to 1000 records starting at an offset of 3000?

I've just noticed that all of our failures are showing 3000,1000 and I'm also seeing 0,1000 in our logs for non-failures. Hmm... if that is the offset, I'm curious as to where that might be getting dynamically set as we don't set 3000 anywhere in our app that I can track down.

Thoughts?

robbyrussell commented 13 years ago

...scratches head as to why this number would randomly change in production.

robbyrussell commented 13 years ago

The only thing I can conclude is that perhaps...

current_page is randomly going astray... or offset is.

pat commented 13 years ago

The only place offset hits any slightly complex logic (that is, anything but get/set/send-to-searchd), is this line in thinking_sphinx/search.rb:

@options[:offset] || ((current_page - 1) * per_page)

And yeah, that's not really suspicious. current_page and per_page aren't doing anything crazy either. And I'm guessing your app wouldn't be randomly screwing this app, so we come back to Sphinx being at fault... and scanning through the searchd.cpp code, nothing jumps out as particularly dodgy either (not that my C++ skills are anything close to decent).

Also - an offset larger than the max_matches value should raise an error: 'offset out of bounds'.

I don't suppose you can share the code in your controller related to the search call in your app?

robbyrussell commented 13 years ago

This is what our code looked like...

  tour_variations = TourVariation.search(query,
                        :with => { :selling_company_id => selling_company.id },
                        :per_page => 1000,
                        :max_matches => 30000,
                        :retry_stale => 5,
                        :sort_mode => :extended,
                        :order => '@weight DESC, tour_name ASC')

I just began passing :offset => 0 to this and it's not occuring anymore that I can see. We don't have any paging with this feature of our app... so I don't know how it's setting an offset like this.

pat commented 13 years ago

Well, if a manual :offset fixes the problem, that's great :) Could also add a manual :page => 1 just in case, but I guess let's wait and see if the problem crops up again.

robbyrussell commented 13 years ago

What I'm concerned about is that perhaps something is changing the value of current_page. It seems to start popping up (randomly) after the app has been running a while (several hours sometimes).... could be a memory related bug or something? Just a thought...

pat commented 13 years ago

Possibly - though each search is its own object, so would either be a Ruby or Sphinx bug?

That said - has the problem returned at all?

robbyrussell commented 13 years ago

It hasn't appeared again since we set the offset.

On 24 May 2011 09:00, freelancing-god reply@reply.github.com wrote:

Possibly - though each search is its own object, so would either be a Ruby or Sphinx bug?

That said - has the problem returned at all?

Reply to this email directly or view it on GitHub: https://github.com/freelancing-god/riddle/issues/17#comment_1226475

pat commented 13 years ago

Well, that's good to know. I'll close this ticket, then... though I'd love to know what the true cause of it is.