rs-pro / mongoid-elasticsearch

DEPRECATED - Consider using SearchKick https://github.com/ankane/searchkick
MIT License

calling index_all very slow if model collection is large #13

Open warmwind opened 9 years ago

warmwind commented 9 years ago

We have a collection called Entry with more than 10,000,000 records. Calling Entry.es.index_all on it is very slow.

The code in Mongoid::Elasticsearch::Es is:

      def index_all(step_size = INDEX_STEP)
        index.reset
        q = klass.order_by(_id: 1)
        steps = (q.count / step_size) + 1
        steps.times do |step|
          docs = q.skip(step * step_size).limit(step_size)
          docs = docs.map do |obj|
            if obj.es_index?
              { index: {data: obj.as_indexed_json}.merge(_id: obj.id.to_s) }
            else
              nil
            end
          end.reject { |obj| obj.nil? }
          next if docs.empty?
          client.bulk({body: docs}.merge(type_options))
          if block_given?
            yield steps, step
          end
        end
      end

First, calling count on a large collection is slow. Second, skip gets slower as the offset grows, because MongoDB has to walk past every skipped record before it returns any results.

I think we could improve performance here by caching the start id for each step and querying records by id range. The code would be:

docs = q.where(:id.gte => start_id).limit(step_size)

What do you think? Or is there another way to index such a large collection?
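To make the id-range idea concrete, here is a minimal, database-free sketch. It is not Mongoid code: `Doc`, `fetch_after`, and `each_batch_by_id` are hypothetical names, and a sorted in-memory array stands in for the collection, with a binary search playing the role of the `_id` index lookup. Each batch starts right after the last id already seen, so no records are skipped over and no upfront count is needed.

```ruby
# Hypothetical stand-in for a document in a collection sorted by id.
Doc = Struct.new(:id, :body)

# Returns up to step_size docs whose id is greater than last_id,
# or the first step_size docs when last_id is nil.
def fetch_after(sorted_docs, last_id, step_size)
  start = if last_id.nil?
            0
          else
            # Binary search stands in for the index lookup on _id.
            sorted_docs.bsearch_index { |d| d.id > last_id } || sorted_docs.size
          end
  sorted_docs[start, step_size] || []
end

# Yields each batch in id order until the collection is exhausted.
def each_batch_by_id(sorted_docs, step_size)
  return enum_for(:each_batch_by_id, sorted_docs, step_size) unless block_given?

  last_id = nil
  loop do
    batch = fetch_after(sorted_docs, last_id, step_size)
    break if batch.empty? # nothing left, so stop instead of reading batch.last
    yield batch
    last_id = batch.last.id
  end
end

docs = (1..10).map { |i| Doc.new(i * 3, "entry #{i}") }
batches = each_batch_by_id(docs, 4).map { |b| b.map(&:id) }
```

The loop ends on the first empty batch, so a collection whose size is an exact multiple of the step size does not trigger a lookup on a nil record.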

warmwind commented 9 years ago

The code could be changed to:

      def index_all(step_size = INDEX_STEP)
        index.reset
        q = klass.asc(:id)
        steps = (q.count / step_size) + 1
        last_id = nil
        steps.times do |step|
          if last_id
            docs = q.gt(id: last_id).limit(step_size).to_a
          else
            docs = q.limit(step_size).to_a
          end
          break if docs.empty? # no records left, e.g. count is an exact multiple of step_size
          last_id = docs.last.id
          docs = docs.map do |obj|
            if obj.es_index?
              {index: {data: obj.as_indexed_json}.merge(_id: obj.id.to_s)}
            else
              nil
            end
          end.reject { |obj| obj.nil? }
          next if docs.empty?
          client.bulk({body: docs}.merge(type_options))
          if block_given?
            yield steps, step
          end
        end
      end

I tested this in my local environment. If you think it is OK, I can submit a pull request.

glebtv commented 9 years ago

Looks good to me. Pull request would be welcome. Thank you!

jnpoyser commented 9 years ago

+1 for the pull request ;-)

netwire88 commented 9 years ago

Is this needed because Mongoid doesn't have a similar method to ActiveRecord's find_in_batches?
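For comparison, the batching half of find_in_batches can be approximated with plain Enumerable#each_slice. The sketch below assumes the query result can be iterated lazily, one document at a time; the `cursor` enumerator is a hypothetical stand-in for that, and `bulk_calls` only records what a real indexer would send to client.bulk. It addresses how records are grouped, not how they are fetched, so it does not by itself remove the cost of count or skip.

```ruby
# Hypothetical stand-in for a database cursor: yields documents one at a time.
cursor = Enumerator.new do |y|
  1.upto(7) { |i| y << { id: i } }
end

# Group the stream into bulk-sized batches without loading everything at once.
bulk_calls = []
cursor.each_slice(3) do |batch|
  # A real indexer would call client.bulk here; this only records the ids.
  bulk_calls << batch.map { |doc| doc[:id] }
end
```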