meilisearch / meilisearch-rails

Meilisearch integration for Ruby on Rails
https://www.meilisearch.com
MIT License
295 stars 47 forks source link

Return all index items when using basic implementation, eager loaded associations and pagy_search method / limit is ignored #281

Closed Menelle closed 10 months ago

Menelle commented 1 year ago

We are using Rails 7 here and the pagy gem to paginate results. The implementation is basic, similar to the one mentioned in the docs. We were experiencing poor performances, investigation showed that our whole index is returned by meilisearch when no filter is passed, even with the limit set. Did we do something wrong?

Description Rails 7 + meilisearch + pagy Simple search using the pagy_search method. Some associations are eager loaded through :includes

Expected behavior Returns only the number of items specified by the :limit options

Current behavior Query with pagy_search returns the whole index

Screenshots or Logs

     # inside controller
     product_filters = ["type=AppProduct AND release_at_timestamp<=#{Date.current.strftime('%s')}"]

    options =
      { filter: product_filters, limit: 240 }.tap { |options|
        options[:sort] = ["release_at_timestamp:desc"] if params[:q].to_s.blank?
      }

    q = Product.includes(:category_secondary, :gender, :brand).pagy_search(params[:q], **options)

    puts "------------------- PAGY SEARCH RESULTS #{q.first.size} ---------------------"
    # => HERE THE FULL INDEX IS RETURNED (+6000 items)

    @pagy, @products = pagy_meilisearch(q, items: 24)
    ...

    # inside model
      include MeiliSearch::Rails
      extend Pagy::Meilisearch
      ActiveRecord_Relation.include Pagy::Meilisearch
      # scope :meilisearch_import, -> { eager_load(:category_secondary, :gender, :brand) } # we tried with and without scope

     meilisearch

Environment (please complete the following information):

brunoocasali commented 1 year ago

Hi @Menelle how are you?!

I've tried to reproduce your issue but I was not able to, in the meilisearch-rails you can find a playground that uses Kaminari and another model with pagy already configured.

Can you try to reproduce it using them?

Also, the pagy is only affected by the items: 24 not by the limit: ... during the pagy_search source

Menelle commented 1 year ago

Hello @brunoocasali, thank you for your message.

It seems that the issue occurs prior to the pagination (pagy_meilisearch call inside the controller) In our tests, the pagy_search call is actually returning the full index. this is the line?

We observe a different response from meili when we eager load associations. I think that's what's causing our issue as with the includes, the limit seems ignored.

without associations:

# controller
q = Product.pagy_search(params[:q], **options)
puts "------------------- RESPONSE #{q.inspect} ---------------------"
=> [Product(id: integer, brand_id: integer, ...), nil, {:filter=>["type=AppProduct AND release_at_timestamp<=1693958400"], :limit=>240, :sort=>["release_at_timestamp:desc"]}]

with associations loaded with includes:

# model
ActiveRecord_Relation.include Pagy::Meilisearch
# controller
q = Product.includes(:category_secondary, :gender, :brand).pagy_search(params[:q], **options)
puts "------------------- RESPONSE #{q.inspect} ---------------------"
=> [#<ActiveRecord::Relation [#<AppProduct id: 230, brand_id: 1, ...]>, nil, {:filter=>["type=AppProduct AND release_at_timestamp<=1693958400"], :limit=>240, :sort=>["release_at_timestamp:desc"]}]
puts "------------------- ITEMS RETURNED #{q.first.size} ---------------------"
=> "------------------- ITEMS RETURNED 6102 ---------------------"
# here Full index is returned, +6000 items, causing the bloat

Is this the expected behavior? We were expecting almost an instant response from this call as it is basically just building an array waiting to be executed by the next line. But the Self is executing the query which, at this point, has no conditions thus returns the whole table.

When we run the query using the ms_search method, we've got 240 items returned as expected (arg :limit)

    q = Product.includes(:category_secondary, :gender, :brand).ms_search(params[:q], **options)
    puts "------------------- PAGY SEARCH RESULTS #{q.inspect} ---------------------"
    => ------------------- PAGY SEARCH RESULTS [#<AppProduct id: 1886, brand_id: 2, ...] ---------------------
    puts "------------------- PAGY SEARCH RESULTS #{q.size} ---------------------"
    => ------------------- PAGY SEARCH RESULTS 240---------------------

From what we see in the codebase, the pagy_search method should behave like ms_search, and accept the limit arg. Just to be sure, we tried to call pagy_search with items: ... in place of limit: ..., and it raised MeiliSearch::ApiError (400 Bad Request - Unknown field `items`: expected one of `q`, `offset`, `limit`, `page`, `hitsPerPage`, `attributesToRetrieve`, `attributesToCrop`, `cropLength`, `attributesToHighlight`, `showMatchesPosition`, `filter`, `sort`, `facets`, `highlightPreTag`, `highlightPostTag`, `cropMarker`, `matchingStrategy as expected.

I hope your message was understood correctly. Let me know if we can provide more info. I will look at the playground you mentioned. Thank you.

brunoocasali commented 1 year ago

Hi @Menelle thanks for your answer!

I'm still trying to understand what is happening but I suppose you controller should look like this:

def index
  product_filters = ["type=AppProduct AND release_at_timestamp<=#{Date.current.strftime('%s')}"]

  options = { filter: product_filters }.tap { |options|
    options[:sort] = ["release_at_timestamp:desc"] if params[:q].to_s.blank?
  }

  hits = Product.includes(:category_secondary, :gender, :brand).pagy_search(params[:q], **options)
  @pagy, @products = pagy_meilisearch(hits, items: 240)
end

With no limit at all... Pagy uses the exhaustive pagination from Meilisearch behind the scenes.

If it does not work by doing that, I would suggest removing the includes and then adding the relationships again, one by one.

I'm eager to hear your feedback.

Menelle commented 11 months ago

Hello @brunoocasali The limit and eager loading seems to work. The benchmark/tap we used to monitor memory and perf played the query prior to pagy, resulting in the full index being fetched... So that's on us.

However, that was not the cause of the performance issue, it just not helped the debugging.

We removed our Frankfurt index from meilisearch and recreated it in San-Francisco. Our web servers are hosted in Oregon. Doing that, we observed a X10 in perf, from 650ms to 70ms.

brunoocasali commented 10 months ago

Amazing to know that! So I guess this issue can be closed right?! Let me know if this is a mistake! :)

Thanks for using Meilisearch!