Allow Collectors To "Publish" If They Can Be Used In Concurrent Search [LUCENE-8963]

mikemccand / stargazers-migration-test

Testing Lucene's Jira -> GitHub issues migration


Allow Collectors To "Publish" If They Can Be Used In Concurrent Search [LUCENE-8963] #960

Open mikemccand opened 5 years ago

mikemccand commented 5 years ago

There is an implied assumption today that all we need to run a query concurrently is a CollectorManager implementation. While that is generally true, there may be corner cases where a Collector's semantics do not allow it to be executed concurrently (think of Elasticsearch's aggregations). If a user manages to write a CollectorManager around a Collector that is not really concurrency-friendly, we could end up in an undefined state.

This Jira is more of an open-ended discussion, to explore whether we should let Collectors implement an API that simply returns a boolean signifying whether the Collector is parallel-ready. Perhaps the default would be true, with a Collector explicitly overriding it to opt out?
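To make the proposal concrete, here is a minimal Java sketch of how such a flag might be consulted. It does not use Lucene's real classes: `SliceCollector`, `SliceCollectorManager`, `GuardedSearch`, `isParallelReady()` and the example collectors are all hypothetical names, and a "slice" is modeled as a plain `int[]` of doc IDs. The searcher asks the first collector whether it is parallel-ready and, if not, falls back to feeding every slice to that one collector on the calling thread.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Future;

// Hypothetical, simplified collector abstraction (not Lucene's Collector interface).
interface SliceCollector {
  void collect(int doc);

  // The proposed flag, under a hypothetical name: defaults to true; a collector
  // whose semantics forbid concurrent execution overrides it to return false.
  default boolean isParallelReady() {
    return true;
  }
}

// Hypothetical stand-in for a CollectorManager: newCollector() per slice, reduce() at the end.
interface SliceCollectorManager<C extends SliceCollector, T> {
  C newCollector();
  T reduce(List<C> collectors);
}

final class GuardedSearch {
  // If the first collector is not parallel-ready, it sees every slice sequentially on
  // the calling thread. Otherwise each slice gets its own collector on the executor,
  // and results are reduced only after all tasks have finished.
  static <C extends SliceCollector, T> T search(
      List<int[]> slices, SliceCollectorManager<C, T> manager, ExecutorService executor)
      throws Exception {
    C first = manager.newCollector();
    List<C> collectors = new ArrayList<>();
    collectors.add(first);
    if (!first.isParallelReady()) {
      for (int[] slice : slices) {
        for (int doc : slice) first.collect(doc);
      }
      return manager.reduce(collectors);
    }
    List<Future<?>> futures = new ArrayList<>();
    for (int i = 0; i < slices.size(); i++) {
      C collector = (i == 0) ? first : manager.newCollector();
      if (i > 0) collectors.add(collector);
      int[] slice = slices.get(i);
      futures.add(executor.submit(() -> { for (int doc : slice) collector.collect(doc); }));
    }
    for (Future<?> f : futures) f.get(); // wait for every slice before reducing
    return manager.reduce(collectors);
  }
}

// Opts out: as written it records one global visit sequence, and its manager's
// reduce() does not know how to combine several partial per-slice sequences.
final class VisitOrderCollector implements SliceCollector {
  final List<Integer> seen = new ArrayList<>();
  public void collect(int doc) { seen.add(doc); }
  @Override public boolean isParallelReady() { return false; }
}

final class VisitOrderManager implements SliceCollectorManager<VisitOrderCollector, List<Integer>> {
  public VisitOrderCollector newCollector() { return new VisitOrderCollector(); }
  public List<Integer> reduce(List<VisitOrderCollector> collectors) {
    return collectors.get(0).seen; // only one collector exists on the sequential path
  }
}

// A hit count is trivially mergeable, so this collector keeps the default (true).
final class CountCollector implements SliceCollector {
  int count;
  public void collect(int doc) { count++; }
}

final class CountManager implements SliceCollectorManager<CountCollector, Integer> {
  public CountCollector newCollector() { return new CountCollector(); }
  public Integer reduce(List<CountCollector> collectors) {
    int total = 0;
    for (CountCollector c : collectors) total += c.count;
    return total;
  }
}
```

This puts the check on the collector rather than on the manager, following the wording of the proposal; checking only the first collector assumes every collector a given manager hands out answers the same way.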

Legacy Jira details

LUCENE-8963 by Atri Sharma (@atris) on Sep 04 2019, updated Sep 05 2019

mikemccand commented 5 years ago

I don't think this would solve any problem? Collectors only ever run on a single thread anyway, and any collector could have a CollectorManager, provided there is a way to merge the results it produces.
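The pattern described here can be sketched as follows: each collector is confined to one task and keeps unsynchronized state, and concurrency comes entirely from the manager creating one collector per slice and merging their results. This is a self-contained Java approximation, not Lucene code; `ValueCollector`, `ValueCollectorManager`, `BucketCounter` (a hypothetical aggregation-style counter) and `ConcurrentRunner` are illustrative names, and each document is reduced to a single string bucket key.

```java
import java.util.ArrayList;
import java.util.Collection;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Future;

// Hypothetical single-threaded collector abstraction (not Lucene's Collector).
interface ValueCollector {
  void collect(String value);
}

// Hypothetical mirror of the CollectorManager idea: one collector per slice plus a merge step.
interface ValueCollectorManager<C extends ValueCollector, T> {
  C newCollector();
  T reduce(Collection<C> collectors);
}

// Aggregation-style collector: document count per bucket key. A plain HashMap with
// no locking is fine, because each instance is only ever used by one thread.
final class BucketCounter implements ValueCollector {
  final Map<String, Integer> counts = new HashMap<>();
  public void collect(String value) { counts.merge(value, 1, Integer::sum); }
}

final class BucketCounterManager
    implements ValueCollectorManager<BucketCounter, Map<String, Integer>> {
  public BucketCounter newCollector() { return new BucketCounter(); }

  // Per-slice counts merge by summation, which is what makes this manager possible.
  public Map<String, Integer> reduce(Collection<BucketCounter> collectors) {
    Map<String, Integer> total = new HashMap<>();
    for (BucketCounter c : collectors) {
      c.counts.forEach((key, n) -> total.merge(key, n, Integer::sum));
    }
    return total;
  }
}

final class ConcurrentRunner {
  // Each slice gets a fresh collector confined to one task; results are merged only
  // after every task has finished (Future.get() provides the happens-before edge).
  static <C extends ValueCollector, T> T run(
      List<List<String>> slices, ValueCollectorManager<C, T> manager, ExecutorService executor)
      throws Exception {
    List<C> collectors = new ArrayList<>();
    List<Future<?>> futures = new ArrayList<>();
    for (List<String> slice : slices) {
      C collector = manager.newCollector();
      collectors.add(collector);
      futures.add(executor.submit(() -> { for (String v : slice) collector.collect(v); }));
    }
    for (Future<?> f : futures) f.get();
    return manager.reduce(collectors);
  }
}
```

The only requirement on `BucketCounter` is that its per-slice results can be combined in `reduce()`; nothing about the collector itself needs to be thread-safe.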

[Legacy Jira: Adrien Grand (@jpountz) on Sep 04 2019]

mikemccand commented 5 years ago

Yeah, I agree.

My only gripe is that if a collector is not really reducible, or has semantic constraints against concurrency, we provide no defense against ending up in an unknown state.

Maybe this is a user issue rather than an engine problem, but I wanted to raise the point and see whether anyone has thoughts on it.

[Legacy Jira: Atri Sharma (@atris) on Sep 04 2019]

mikemccand commented 5 years ago

Do we have examples of collectors in Lucene today that can only run single-threaded? The core collectors, at least TopFieldCollector and TopDocsCollector, seem to be OK, since IndexSearcher creates a CollectorManager that combines the per-slice results with TopDocs.merge.
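Why the top-hits collectors are safe can be illustrated with a small sketch of the reduce step: the global top k hits are always contained in the union of each slice's top k, so merging the per-slice lists and re-selecting k loses nothing. The code below is a simplified, hypothetical model (`Hit` and `TopK` are illustrative names, not Lucene's `ScoreDoc` or `TopDocs`), and its tie-breaking by lower doc ID is an assumption that need not match what `TopDocs.merge` actually does.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;
import java.util.PriorityQueue;

// Simplified stand-in for a scored hit (hypothetical; not Lucene's ScoreDoc).
final class Hit {
  final int doc;
  final float score;
  Hit(int doc, float score) { this.doc = doc; this.score = score; }
}

final class TopK {
  // Higher score first; ties broken by lower doc ID (an assumed tie-break).
  static final Comparator<Hit> BEST_FIRST =
      Comparator.comparingDouble((Hit h) -> h.score).reversed().thenComparingInt(h -> h.doc);

  // Per-slice collection on a single thread: a heap ordered worst-first, capped at k.
  static List<Hit> collect(List<Hit> slice, int k) {
    PriorityQueue<Hit> heap = new PriorityQueue<>(BEST_FIRST.reversed());
    for (Hit h : slice) {
      heap.offer(h);
      if (heap.size() > k) heap.poll(); // evict the current worst hit
    }
    List<Hit> top = new ArrayList<>(heap);
    top.sort(BEST_FIRST);
    return top;
  }

  // Reduce step in the spirit of TopDocs.merge: combine the per-slice top-k lists
  // and keep the k best overall.
  static List<Hit> merge(List<List<Hit>> perSlice, int k) {
    List<Hit> all = new ArrayList<>();
    for (List<Hit> top : perSlice) all.addAll(top);
    all.sort(BEST_FIRST);
    return new ArrayList<>(all.subList(0, Math.min(k, all.size())));
  }
}
```

Each `collect` call touches one slice on one thread; only `merge` ever sees results from more than one slice.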

So maybe the mere availability of a CollectorManager implies that the collector is safe for concurrent search?

[Legacy Jira: Michael McCandless (@mikemccand) on Sep 05 2019]