meilisearch / meilisearch-rails

Meilisearch integration for Ruby on Rails
https://www.meilisearch.com
MIT License

Skip indexCreation and settingsUpdate whenever adding new record in production #280

Closed (mech closed this issue 9 months ago)

mech commented 1 year ago

Whenever I call ms_index!, I notice that three tasks are always enqueued. This seems unnecessary and may even be detrimental for indexes with millions of records.

# uid: indexUid - status / type
78571: Movie - succeeded / documentAdditionOrUpdate
78570: Movie - succeeded / settingsUpdate (===> Takes a long time to finish depending on million records)
78569: Movie - failed / indexCreation

It appears that indexCreation always fails, and settingsUpdate then always takes a long time (depending on how many records you have).

Is there a way to not do indexCreation + settingsUpdate and just do documentAdditionOrUpdate in a more intelligent manner?

And could we be allowed to run settingsUpdate manually, at least in production, where the settings seldom change?

brunoocasali commented 1 year ago

I saw your comment here https://github.com/meilisearch/meilisearch-rails/issues/198#issuecomment-1694348140

Can you show me your code related to this issue?

mech commented 1 year ago

I found that I can stop settingsUpdate from being called repeatedly by setting check_settings to false:

meilisearch check_settings: false do
end

To be honest, I have no idea why, in my production Rails app, every request that queries or updates Meilisearch enqueues another settingsUpdate task. Each one typically runs for a long time (10 minutes if the index has a million records) before the actual documentAdditionOrUpdate can succeed.

The code is nothing fancy, just a normal ms_index! call.

Looking at the library code, it appears that the settings should be cached and compared with the previous settings to detect changes, but somehow that is not happening.
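
The "compare with the previous settings before updating" idea can be sketched as follows. This is only an illustration of the caching behaviour described above, not the gem's actual implementation: SettingsGuard and FakeIndex are hypothetical names, and FakeIndex merely counts how often update_settings is called.

```ruby
# Hypothetical sketch: SettingsGuard and FakeIndex are illustrative names,
# not classes from meilisearch-rails.
class SettingsGuard
  def initialize(index)
    @index = index
    @last_sent = nil
  end

  # Sends the settings only when they differ from what was sent last time.
  # Returns true if an update was sent, false if it was skipped.
  def sync(settings)
    return false if settings == @last_sent

    @index.update_settings(settings)
    # Deep copy so later mutation of the caller's hash cannot hide a change.
    @last_sent = Marshal.load(Marshal.dump(settings))
    true
  end
end

# Stand-in for a real index object; it only counts update calls.
class FakeIndex
  attr_reader :calls

  def initialize
    @calls = 0
  end

  def update_settings(_settings)
    @calls += 1
  end
end

index = FakeIndex.new
guard = SettingsGuard.new(index)
settings = { "searchableAttributes" => ["title"] }

guard.sync(settings) # first call sends the settings
guard.sync(settings) # identical settings, so nothing is sent
puts index.calls
```

With a guard like this, repeated indexing calls with unchanged settings would enqueue a single settingsUpdate rather than one per call.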

brunoocasali commented 1 year ago

Yeah, you're probably facing a bug indeed. I will prioritize it and try to work on it ASAP, but unfortunately I can't guarantee any deadline (we're short on time).

So if you want it fixed sooner, I'd recommend opening a PR yourself. I'll be glad to review it.

brunoocasali commented 1 year ago

So, @mech, I'm trying to reproduce your case in the test suite, but it needs to be fixed. I made some improvements in some areas, but I couldn't reproduce the same issue you're facing, so I will only push something once I have a real breaking test case to fix.

Can you tell me if you're calling ms_index! directly? (Because this method invocation is usually tied to the AR callbacks, you don't need to trigger it yourself.)

Also, please provide me with your complete model configuration and an example of how you're using the gem. It will significantly help me. Details such as whether you're using Sidekiq jobs to process the data also help.

Otherwise, I'm pretty stuck on the investigation :|

I really appreciate any help you can provide.

fulfilnet commented 11 months ago

I have the same issue here.

I'm just starting to use Meilisearch, so I can share my code here.

Code

Search module:

module Meilisearchable
  module Order
    extend ActiveSupport::Concern

    INDEX_ATTRS = [
      :id,
      :store_id,
      :merchant_id,
      :order_status,
      :tracking_number,
      :package_id,
      :short_order_id,
      :internal_order_tag,
      :platform_order_id,
      :courier_name,
      :warehouse_id,
      :order_items_skus,
      :order_items_allocate_data,
      :order_items_products_sku,
      :order_items_products_bundle_settings,
      :order_items_products_barcode,
      :store_platform_id,
      :input_type
    ].freeze

    included do
      include MeiliSearch::Rails

      meilisearch enqueue: :trigger_sidekiq_job, force_utf8_encoding: true, primary_key: :id do
        attributes INDEX_ATTRS
        attribute :created_at do
          created_at.to_i
        end
        attribute :packed_at do
          packed_at.to_i
        end
        attribute :platform_order_created_at do
          platform_order_created_at.to_i
        end

        displayed_attributes INDEX_ATTRS
        searchable_attributes [:order_items_skus, :order_items_allocate_data, :order_items_products_sku, :order_items_products_bundle_settings, :order_items_products_barcode, :platform_order_id, :short_order_id, :courier_name, :tracking_number]
        filterable_attributes [:warehouse_id, :store_id, :merchant_id, :order_status, :tracking_number, :short_order_id, :created_at, :packed_at, :internal_order_tag, :platform_order_created_at, :store_platform_id, :id, :input_type]
        sortable_attributes [:created_at, :platform_order_created_at, :packed_at, :order_status]

        pagination max_total_hits: 10000
      end
    end

    def order_items_skus
      order_items.map(&:sku)
    end

    def order_items_allocate_data
      order_items.map(&:allocate_data)
    end

    def order_items_products_sku
      order_items.map(&:product).map(&:sku)
    rescue
      []
    end

    def order_items_products_bundle_settings
      order_items.map(&:product).map(&:bundle_settings)
    rescue
      []
    end

    def order_items_products_barcode
      order_items.map(&:product).map(&:barcode)
    rescue
      []
    end

    def store_platform_id
      store&.platform_id
    end

    class_methods do
      def trigger_sidekiq_job(record, remove)
        MeilisearchIndexes::OrderWorker.perform_async(record.id, remove)
      end
    end
  end
end

Model:

class Order < ApplicationRecord
  include Meilisearchable::Order
  ...
end

Worker:

module MeilisearchIndexes
  class OrderWorker
    include Sidekiq::Worker
    sidekiq_options queue: "search_index"

    def perform(id, remove)
      if remove
        remove_from_index(id)
      else
        add_to_index(id)
      end
    end

    private

    def add_to_index(id)
      order = Order.find(id)
      order.index!
    end

    def remove_from_index(id)
      # The record has likely already been removed from your database so we cannot
      # use ActiveRecord#find to load it.
      # We access the underlying Meilisearch index object.
      Order.index.delete_document(id)
    end
  end
end

What I did that triggered the issue:

start_date = Order.order(:created_at).first.created_at.to_date.beginning_of_month
end_date = Time.current.to_date.end_of_month

current_date = start_date

while current_date <= end_date
  puts current_date

  Order.where(created_at: current_date..current_date.end_of_month).find_each do |order|
    MeilisearchIndexes::OrderWorker.perform_async(order.id, false)
  end

  current_date = current_date.next_month.beginning_of_month
end

How I run the Meilisearch server (based on https://github.com/meilisearch/meilisearch-kubernetes):

apiVersion: apps/v1
kind: StatefulSet
metadata:
  name: meilisearch
spec:
  replicas: 1
  selector:
    matchLabels:
      app: meilisearch
  template:
    metadata:
      labels:
        app: meilisearch
    spec:
      volumes:
        - name: tmp
          emptyDir: {}
      containers:
        - name: meilisearch
          image: getmeili/meilisearch:v1.3
          securityContext:
            allowPrivilegeEscalation: false
            capabilities:
              drop:
              - ALL
            readOnlyRootFilesystem: true
          volumeMounts:
            - name: tmp
              mountPath: /tmp
            - name: meilisearch-data-pvc
              mountPath: /meili_data
          envFrom:
          - configMapRef:
              name: fulfilnet-api-env
          ports:
            - name: http
              containerPort: 7700
              protocol: TCP
          startupProbe:
            httpGet:
              path: /health
              port: http
            periodSeconds: 1
            initialDelaySeconds: 1
            failureThreshold: 60
          livenessProbe:
            httpGet:
              path: /health
              port: http
            periodSeconds: 10
            initialDelaySeconds: 0
          readinessProbe:
            httpGet:
              path: /health
              port: http
            periodSeconds: 10
            initialDelaySeconds: 0
          resources:
            {}
  volumeClaimTemplates:
  - metadata:
      name: meilisearch-data-pvc
    spec:
      accessModes: ["ReadWriteOnce"]
      resources:
        requests:
          storage: 20Gi

This is what I see for the task that is currently processing:

{"results"=>
  [{"uid"=>751107,
    "indexUid"=>"Order_production",
    "status"=>"processing",
    "type"=>"settingsUpdate",
    "canceledBy"=>nil,
    "details"=>
     {"displayedAttributes"=>
       ["id",
        "store_id",
        "merchant_id",
        "order_status",
        "tracking_number",
        "package_id",
        "short_order_id",
        "internal_order_tag",
        "platform_order_id",
        "courier_name",
        "warehouse_id",
        "order_items_skus",
        "order_items_allocate_data",
        "order_items_products_sku",
        "order_items_products_bundle_settings",
        "order_items_products_barcode",
        "store_platform_id",
        "input_type"],
      "searchableAttributes"=>
       ["order_items_skus",
        "order_items_allocate_data",
        "order_items_products_sku",
        "order_items_products_bundle_settings",
        "order_items_products_barcode",
        "platform_order_id",
        "short_order_id",
        "courier_name",
        "tracking_number"],
      "filterableAttributes"=>
       ["created_at",
        "id",
        "input_type",
        "internal_order_tag",
        "merchant_id",
        "order_status",
        "packed_at",
        "platform_order_created_at",
        "short_order_id",
        "store_id",
        "store_platform_id",
        "tracking_number",
        "warehouse_id"],
      "sortableAttributes"=>["created_at", "order_status", "packed_at", "platform_order_created_at"],
      "pagination"=>{"maxTotalHits"=>10000}},
    "error"=>nil,
    "duration"=>nil,
    "enqueuedAt"=>"2023-09-25T02:22:43.746756545Z",
    "startedAt"=>"2023-09-25T07:01:39.137600417Z",
    "finishedAt"=>nil}],
 "total"=>1,
 "limit"=>20,
 "from"=>751107,
 "next"=>nil}

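As a side calculation on the task above, the gap between enqueuedAt and startedAt shows how long this settingsUpdate waited in the queue before it started processing. A minimal sketch using only the two timestamps from the output (the variable names are illustrative):

```ruby
require "time"

# Timestamps copied from the task output above.
enqueued_at = Time.iso8601("2023-09-25T02:22:43.746756545Z")
started_at  = Time.iso8601("2023-09-25T07:01:39.137600417Z")

# Time subtraction yields seconds as a Float; round to whole seconds.
wait_seconds = (started_at - enqueued_at).round
hours, rest = wait_seconds.divmod(3600)
minutes, seconds = rest.divmod(60)

puts format("%dh %dm %ds", hours, minutes, seconds) # => 4h 38m 55s
```

So this task sat in the queue for more than four and a half hours before it started, and it had not finished when the output was captured.
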
Compare that with what I get when I run Order.index.get_settings:

[10] pry(main)> Order.index.get_settings
=> {"displayedAttributes"=>
  ["id",
   "store_id",
   "merchant_id",
   "order_status",
   "tracking_number",
   "package_id",
   "short_order_id",
   "internal_order_tag",
   "platform_order_id",
   "courier_name",
   "warehouse_id",
   "order_items_skus",
   "order_items_allocate_data",
   "order_items_products_sku",
   "order_items_products_bundle_settings",
   "order_items_products_barcode",
   "store_platform_id",
   "input_type"],
 "searchableAttributes"=>
  ["order_items_skus",
   "order_items_allocate_data",
   "order_items_products_sku",
   "order_items_products_bundle_settings",
   "order_items_products_barcode",
   "platform_order_id",
   "short_order_id",
   "courier_name",
   "tracking_number"],
 "filterableAttributes"=>
  ["created_at",
   "id",
   "input_type",
   "internal_order_tag",
   "merchant_id",
   "order_status",
   "packed_at",
   "platform_order_created_at",
   "short_order_id",
   "store_id",
   "store_platform_id",
   "tracking_number",
   "warehouse_id"],
 "sortableAttributes"=>["created_at", "order_status", "packed_at", "platform_order_created_at"],
 "rankingRules"=>["words", "typo", "proximity", "attribute", "sort", "exactness"],
 "stopWords"=>[],
 "synonyms"=>{},
 "distinctAttribute"=>nil,
 "typoTolerance"=>{"enabled"=>true, "minWordSizeForTypos"=>{"oneTypo"=>5, "twoTypos"=>9}, "disableOnWords"=>[], "disableOnAttributes"=>[]},
 "faceting"=>{"maxValuesPerFacet"=>100, "sortFacetValuesBy"=>{"*"=>"alpha"}},
 "pagination"=>{"maxTotalHits"=>10000}}

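One thing visible in the two outputs above is that the server returns filterableAttributes and sortableAttributes in alphabetical order, while the model declares them in a different order and as symbols. Whether this is what triggers the repeated settingsUpdate is not confirmed in this thread. The sketch below only shows how a naive comparison and an order-insensitive one can disagree on the same data. The helper names normalize and settings_changed? are hypothetical, and treating those two keys as unordered is an assumption.

```ruby
# Assumption: element order is irrelevant for these two settings.
UNORDERED_KEYS = %w[filterableAttributes sortableAttributes].freeze

# Stringifies keys and array elements, and sorts the unordered settings.
def normalize(settings)
  settings.each_with_object({}) do |(key, value), out|
    key = key.to_s
    value = value.map(&:to_s) if value.is_a?(Array)
    value = value.sort if UNORDERED_KEYS.include?(key)
    out[key] = value
  end
end

# Compares only the keys the model configures; server defaults are ignored.
def settings_changed?(configured, remote)
  wanted = normalize(configured)
  current = normalize(remote).slice(*wanted.keys)
  wanted != current
end

# Declared order from the model above.
configured = {
  sortableAttributes: [:created_at, :platform_order_created_at, :packed_at, :order_status]
}
# Order returned by Order.index.get_settings above.
remote = {
  "sortableAttributes" => ["created_at", "order_status", "packed_at", "platform_order_created_at"],
  "rankingRules" => ["words", "typo", "proximity", "attribute", "sort", "exactness"]
}

naive = configured[:sortableAttributes].map(&:to_s) != remote["sortableAttributes"]
puts naive                                  # => true (order differs)
puts settings_changed?(configured, remote)  # => false (same set of attributes)
```

searchableAttributes is deliberately left out of UNORDERED_KEYS, because its order affects attribute ranking.
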
I don't know why it keeps creating settingsUpdate tasks. (I tried deleting the tasks before, but they still get enqueued from somewhere.)

By the way, I'm not sure whether this is related or not:

Every time I redeploy the backend code, the index creation job fails, due to:

[image]

(I tried adding primary_key: :id, but it still did not work, so my workaround is to set it here: Rails.application.config.after_initialize do)

mech commented 11 months ago

For now, I have disabled the automatic settingsUpdate and run it manually whenever I know the settings have changed 😬

class ApplicationRecord < ActiveRecord::Base
  def self.update_ms_settings!
    index_settings = meilisearch_settings.to_settings

    MeiliSearch::Rails.client.index(index.uid).update_settings(index_settings)
  end
end

# Automatic settings check always disabled
class User < ApplicationRecord
  include MeiliSearch::Rails

  meilisearch check_settings: false do
    # xxx
  end
end

If my settings change, I have no choice but to put production into maintenance mode and run User.update_ms_settings! manually, rather than letting the library handle it every time 😥.

fulfilnet commented 11 months ago

Thanks @mech, I also implemented the same workaround as you!

ellnix commented 11 months ago

Hi @fulfilnet @mech

I spent some time trying to figure this issue out, and I believe I did. Please see the pull request above. If you would like, you can test this solution before the PR is merged; only about 5 lines are edited. I would appreciate the feedback.