Open mikemccand opened 5 years ago
I don't think this would solve any problem? A Collector can only run from a single thread anyway, and every collector could have a CollectorManager, provided there is a way to merge the results they produce?
[Legacy Jira: Adrien Grand (@jpountz) on Sep 04 2019]
Yeah, I agree.
My only gripe is that in case a collector is not really reducible or has some semantic constraints against concurrency, we do not provide any defense against getting into an unknown state.
Maybe it is not an engine problem but more of a user issue – but I wanted to raise this point and see if we have any thoughts about this.
[Legacy Jira: Atri Sharma (@atris) on Sep 04 2019]
Do we have examples of collectors in Lucene today that are single-threaded? The core collectors, at least TopFieldCollector and TopDocsCollector, seem to be OK since IndexSearcher makes a CollectorManager that uses TopDocs.merge in the end.
So maybe as long as a CollectorManager is available that implies it is thread safe?
[Legacy Jira: Michael McCandless (@mikemccand) on Sep 05 2019]
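For illustration, the pattern discussed above (one collector per thread, with the per-thread results merged at the end) can be sketched without Lucene on the classpath. SimpleCollector, SimpleCollectorManager, CountingCollector, CountingManager, and SliceSearchSketch are hypothetical stand-ins, not Lucene's own interfaces, and the real IndexSearcher differs in detail:

```java
import java.util.ArrayList;
import java.util.Collection;
import java.util.List;

// Simplified stand-ins for Lucene's Collector and CollectorManager (assumptions, not the real API).
interface SimpleCollector {
    void collect(int doc);
}

interface SimpleCollectorManager<C extends SimpleCollector, T> {
    C newCollector();                    // one collector per slice, used by exactly one thread
    T reduce(Collection<C> collectors);  // merge the per-slice results
}

// No internal synchronization: safe only because each instance stays on one thread.
final class CountingCollector implements SimpleCollector {
    int count;

    @Override
    public void collect(int doc) {
        count++;
    }
}

final class CountingManager implements SimpleCollectorManager<CountingCollector, Integer> {
    @Override
    public CountingCollector newCollector() {
        return new CountingCollector();
    }

    @Override
    public Integer reduce(Collection<CountingCollector> collectors) {
        int total = 0;
        for (CountingCollector c : collectors) {
            total += c.count;
        }
        return total;
    }
}

final class SliceSearchSketch {
    // Runs each slice on its own thread with its own collector, then reduces the results.
    static <C extends SimpleCollector, T> T search(int[][] slices, SimpleCollectorManager<C, T> manager) {
        List<C> collectors = new ArrayList<>();
        List<Thread> threads = new ArrayList<>();
        for (int[] slice : slices) {
            C collector = manager.newCollector();
            collectors.add(collector);
            Thread t = new Thread(() -> {
                for (int doc : slice) {
                    collector.collect(doc);
                }
            });
            threads.add(t);
            t.start();
        }
        try {
            for (Thread t : threads) {
                t.join(); // join makes each thread's writes visible before reduce reads them
            }
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException("interrupted while waiting for slices", e);
        }
        return manager.reduce(collectors);
    }
}
```

In this sketch the collector itself is not thread safe; concurrency is safe only because collectors are never shared and reduce runs after all threads finish. Nothing stops a manager from returning a collector whose results cannot be merged meaningfully, which is the gap this issue is about.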
There is an implied assumption today that all we need to run a query concurrently is a CollectorManager implementation. While that is true, there might be some corner cases where a Collector's semantics do not allow it to be executed concurrently (think of ES's aggregates). If a user manages to write a CollectorManager with a Collector that is not really concurrency-friendly, we could end up in an undefined state.
This Jira is more of a rhetorical discussion, to explore whether we should allow Collectors to implement an API that simply returns a boolean signifying whether a Collector is parallel ready. The default would be true, unless a Collector explicitly overrides it?
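A minimal sketch of what such a flag might look like, assuming a default method on the collector interface. FlaggedCollector, isParallelReady, OrderSensitiveCollector, PlainCollector, and ConcurrencyGuard are hypothetical names for illustration only; Lucene's Collector has no such method:

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical stand-in for Collector carrying the proposed flag (not Lucene's API).
interface FlaggedCollector {
    void collect(int doc);

    // Proposed default: collectors are assumed parallel ready unless they opt out.
    default boolean isParallelReady() {
        return true;
    }
}

// Example of a collector whose semantics depend on seeing documents in one global order.
final class OrderSensitiveCollector implements FlaggedCollector {
    final List<Integer> seen = new ArrayList<>();

    @Override
    public void collect(int doc) {
        seen.add(doc);
    }

    @Override
    public boolean isParallelReady() {
        return false; // explicit opt-out
    }
}

// Inherits the default and is therefore treated as parallel ready.
final class PlainCollector implements FlaggedCollector {
    int count;

    @Override
    public void collect(int doc) {
        count++;
    }
}

final class ConcurrencyGuard {
    // A searcher could consult the flag before fanning out across slices.
    static boolean canRunConcurrently(FlaggedCollector collector, int sliceCount) {
        return sliceCount > 1 && collector.isParallelReady();
    }
}
```

With a check like this, a searcher could fall back to single-threaded execution (or fail fast) when a collector opts out, rather than ending up in an undefined state.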
Legacy Jira details
LUCENE-8963 by Atri Sharma (@atris) on Sep 04 2019, updated Sep 05 2019