bnewbold opened 4 years ago
Great point.
While reading https://www.elastic.co/guide/en/elasticsearch/reference/current/mapping-routing-field.html#_making_a_routing_value_required, I came to think that a custom routing value would not simplify things: routing values then have to be passed at index time, at query time, and so on.
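For reference, the `_routing` field documentation linked above describes shard selection roughly as below. Here $\operatorname{hash}$ is Elasticsearch's internal routing hash, `num_routing_shards` comes from the `index.number_of_routing_shards` setting, `num_primary_shards` from `index.number_of_shards`, and `_routing` defaults to the document's `_id`:

```math
\text{routing\_factor} = \frac{\text{num\_routing\_shards}}{\text{num\_primary\_shards}}, \qquad
\text{shard\_num} = \left\lfloor \frac{\operatorname{hash}(\_routing) \bmod \text{num\_routing\_shards}}{\text{routing\_factor}} \right\rfloor
```

So a client can only predict a document's shard if it reproduces that hash and knows the index's shard settings.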
The way I see it, this could be done with a per-shard cache (option 4), kept in memory (or even in temp files, if there are many shards).
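A minimal sketch of such an in-memory per-shard cache, written in Python for illustration. `ShardBuffer`, `shard_for`, and the `flush` callback are hypothetical names, not esbulk API. `shard_for` uses CRC32 purely as a stand-in for Elasticsearch's routing hash (which is Murmur3-based), so its shard numbers will not match a real cluster; an actual implementation would have to reproduce the server's hash and read the index's shard settings.

```python
import zlib
from collections import defaultdict
from typing import Callable, Dict, List


def shard_for(routing_key: str, num_shards: int) -> int:
    # Hypothetical stand-in hash: CRC32, NOT Elasticsearch's Murmur3-based
    # routing hash, so results will not match real shard placement.
    return zlib.crc32(routing_key.encode("utf-8")) % num_shards


class ShardBuffer:
    """Hypothetical per-shard in-memory cache (not part of esbulk).

    Documents are grouped by target shard; once a shard's buffer holds
    batch_size documents, that batch is passed to flush, so every batch
    targets exactly one shard.
    """

    def __init__(self, num_shards: int, batch_size: int,
                 flush: Callable[[int, List[str]], None]) -> None:
        self.num_shards = num_shards
        self.batch_size = batch_size
        self.flush = flush
        self.buffers: Dict[int, List[str]] = defaultdict(list)

    def add(self, routing_key: str, doc: str) -> None:
        shard = shard_for(routing_key, self.num_shards)
        self.buffers[shard].append(doc)
        if len(self.buffers[shard]) >= self.batch_size:
            self.flush(shard, self.buffers[shard])
            self.buffers[shard] = []  # start a fresh batch for this shard

    def close(self) -> None:
        # Flush leftover partial batches in shard order.
        for shard in sorted(self.buffers):
            if self.buffers[shard]:
                self.flush(shard, self.buffers[shard])
        self.buffers.clear()
```

The `flush` callback is where a single-shard bulk request would be sent. With many shards, the per-shard lists could be spilled to temp files instead of held in memory, at the cost of extra I/O.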
This Elasticsearch blog post implies that sending bulk batches in which all documents go to the same shard improves indexing performance: https://www.elastic.co/blog/how-kenna-security-speeds-up-elasticsearch-indexing-at-scale-part-1
The feature request for esbulk would be to somehow automate this speed-up, without users needing to re-sort or partition documents themselves. Some unstructured thoughts about this:
- `_routing` field in documents, or fall back to `_id` or a key field if set
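One reading of the thought above, as a small key-selection step sketched in Python. Assumptions: input is one JSON document per line, `_routing` and `_id` may appear as top-level fields in that input (as Elasticsearch metadata fields they would presumably have to be moved into the bulk action line rather than sent in the document body), and `key_field` stands in for a user-configured key field. The precedence (`_routing`, then the key field if set, then `_id`) is an interpretation, and `routing_key` is a hypothetical name.

```python
import json
from typing import Optional


def routing_key(line: str, key_field: Optional[str] = None) -> Optional[str]:
    """Return the value that decides a document's target shard.

    Hypothetical helper: an explicit "_routing" field wins, then the
    configured key field (if set), then "_id". Returns None when none is
    present, i.e. the shard cannot be predicted client-side.
    """
    doc = json.loads(line)
    candidates = ["_routing"]
    if key_field is not None:
        candidates.append(key_field)
    candidates.append("_id")
    for field in candidates:
        value = doc.get(field)
        if value is not None:
            return str(value)
    return None
```

Documents for which this returns `None` could simply bypass the per-shard grouping and be sent in ordinary mixed batches.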