Open warmwind opened 9 years ago
The code could be changed to:

```ruby
def index_all(step_size = INDEX_STEP)
  index.reset
  q = klass.asc(:id)
  steps = (q.count / step_size) + 1
  last_id = nil
  steps.times do |step|
    docs = if last_id
             q.gt(id: last_id).limit(step_size).to_a
           else
             q.limit(step_size).to_a
           end
    break if docs.empty? # guard: docs.last would be nil on an empty batch
    last_id = docs.last.id
    docs = docs.map do |obj|
      if obj.es_index?
        { index: { data: obj.as_indexed_json }.merge(_id: obj.id.to_s) }
      else
        nil
      end
    end.reject { |obj| obj.nil? }
    next if docs.empty?
    client.bulk({ body: docs }.merge(type_options))
    yield steps, step if block_given?
  end
end
```
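For illustration, the bulk-payload construction inside the loop above can be exercised in plain Ruby. The `Doc` struct below is a hypothetical stand-in for a Mongoid document; `es_index?` and `as_indexed_json` mirror the model methods used by `index_all`:

```ruby
# Stand-in for a Mongoid document; es_index? and as_indexed_json
# mirror the methods the snippet above calls on each record.
Doc = Struct.new(:id, :title) do
  def es_index?
    !title.nil? # pretend documents without a title are skipped
  end

  def as_indexed_json
    { title: title }
  end
end

docs = [Doc.new(1, "a"), Doc.new(2, nil), Doc.new(3, "c")]

# Same shape as the body built inside index_all:
body = docs.map do |obj|
  next unless obj.es_index?
  { index: { data: obj.as_indexed_json }.merge(_id: obj.id.to_s) }
end.compact

# body => [{index: {data: {title: "a"}, _id: "1"}},
#          {index: {data: {title: "c"}, _id: "3"}}]
```

Each element becomes one `index` action in the Elasticsearch bulk request body, with the Mongoid `_id` carried over as the document id.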
I tested this in my local environment. If you think it is OK, I can submit a pull request.
Looks good to me. Pull request would be welcome. Thank you!
+1 for the pull request ;-)
Is this needed because Mongoid doesn't have a method similar to ActiveRecord's `find_in_batches`?
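The keyset idea behind `find_in_batches` (page by `id > last_id` rather than by offset) can be sketched in plain Ruby. The in-memory array and the `fetch_batch` lambda below are stand-ins for the collection and for `q.gt(id: last_id).limit(n)`:

```ruby
# Simulated collection, sorted by id, as klass.asc(:id) would return it.
records = (1..10).map { |i| { id: i, name: "rec#{i}" } }

# Stand-in for q.gt(id: last_id).limit(n): the next n records whose
# id is strictly greater than last_id.
fetch_batch = lambda do |last_id, n|
  records.select { |r| last_id.nil? || r[:id] > last_id }.first(n)
end

batches = []
last_id = nil
loop do
  batch = fetch_batch.call(last_id, 3)
  break if batch.empty?
  batches << batch.map { |r| r[:id] }
  last_id = batch.last[:id]
end

# batches => [[1, 2, 3], [4, 5, 6], [7, 8, 9], [10]]
```

Because `id` is indexed and the query sorts ascending, each batch starts with an index seek instead of walking past all previously returned documents.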
We have a collection called `Entry` which has 10,000,000+ records. When calling `Entry.es.index_all`, it is very, very slow :( The code in `Mongoid::Elasticsearch::Es` has two problems. First, calling `count` on a large collection is slow. Second, `skip` is very slow when skipping past many records. I think that to improve performance here, we could remember the last id of each step and query records by id range, as in the code above. What do you think? Or is there another way to index such a large collection?
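To see why offset paging degrades, one can count how many documents the server must walk under each scheme. This uses a rough cost model (an assumption, not a MongoDB measurement): `skip(k)` still scans the `k` skipped documents, while an indexed id-range query seeks directly to the start of each batch:

```ruby
total = 1_000_000 # hypothetical collection size
step  = 10_000    # batch size
pages = total / step

# Offset paging: page p scans (p * step) skipped docs plus step returned docs.
skip_scanned = (0...pages).sum { |p| p * step + step }

# Id-range paging: each page seeks by index and reads only `step` docs.
range_scanned = pages * step

# skip_scanned  => 50_500_000  (O(n^2 / step) total work)
# range_scanned => 1_000_000   (O(n) total work)
```

Under this model, offset paging touches roughly 50x more documents for a million-record collection, and the gap grows linearly with collection size, which matches the slowdown described above.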