mikemccand / luceneserver

A high performance "thin wrapper" HTTP REST server on top of Apache Lucene
Apache License 2.0
137 stars 35 forks source link

Finish "spillover shard" default implementation #13

Open mikemccand opened 3 years ago

mikemccand commented 3 years ago

I think I started this, long ago ... and the idea was not very well thought out, so let's try to figure out design, here:

The idea was to (by default?) handle time series indices well by simply appending the incoming document stream to a single shard (maybe sorted by the timestamp field), and once that shard was "big enough" (by some default yet configurable criteria), at that point switching to another shard, etc., rather than e.g. hash sharding to 5 shards concurrently by default.

Downside is indexing throughput is only what one node can handle, and so for cases where that is really a bottleneck, we can allow hash-sharding (off by default) out to N shards for better cross machine concurrency/indexing throughput. Another downside is less concurrency while searching, but, we should add cross-CPU concurrency for single queries on single nodes, which helps some.