Thanks for the feedback! The good news is that v2 of Lucandra, which is about to be pushed, handles these issues, specifically the MatchAllDocs delete.
Are you opposed to using Solr?
Great! Looking forward to v2 of Lucandra.
We're migrating our portal application to use Cassandra as its backend, and we've been using Lucene ever since it was created. We haven't looked much at Solr, since Lucene does the job and we have already implemented all the functions and UIs we need.
We will certainly check out Solr at some point in the future.
========= SHORT =========
Problems with the Lucandra IndexWriter:
Request for improvements:
========= LONG =========
We have an increasing number of indexes that contain lots of small documents. Most of the fields contain arbitrary/unknown values; some contain a known set of values. The documents are based on data stored in Cassandra.
Sometimes an index must be "synced" with the data currently stored in Cassandra. Simply updating/re-indexing the index with the data currently stored in Cassandra is not enough. Sure, many documents will be more "up to date", but the index will still contain obsolete/dirty data (data that no longer exists in Cassandra).
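To make the stale-data problem concrete, here is a minimal in-memory sketch. It assumes both the index and the Cassandra rows can be modelled as plain maps from document id to content; the class `IndexSync` and its methods are hypothetical and use no Lucene, Lucandra, or Cassandra API. Updating only the rows that still exist leaves the deleted document "b" in the index, while clearing first removes it.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical model: "index" and Cassandra "rows" are maps from doc id to content.
// No Lucene, Lucandra or Cassandra classes are involved.
public class IndexSync {

    // Re-index every row that currently exists in Cassandra.
    // Documents whose rows were deleted from Cassandra are never touched.
    public static void updateOnly(Map<String, String> index, Map<String, String> rows) {
        index.putAll(rows);
    }

    // Clear the whole index first, then re-index from Cassandra.
    public static void clearAndReindex(Map<String, String> index, Map<String, String> rows) {
        index.clear();
        index.putAll(rows);
    }

    public static void main(String[] args) {
        // Current Cassandra state: row "a" was updated, row "b" was deleted.
        Map<String, String> rows = new HashMap<>();
        rows.put("a", "v2");

        Map<String, String> index1 = new HashMap<>();
        index1.put("a", "v1");
        index1.put("b", "v1");
        updateOnly(index1, rows);
        System.out.println("update only, stale b present: " + index1.containsKey("b")); // true

        Map<String, String> index2 = new HashMap<>();
        index2.put("a", "v1");
        index2.put("b", "v1");
        clearAndReindex(index2, rows);
        System.out.println("clear + reindex, stale b present: " + index2.containsKey("b")); // false
    }
}
```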
In most of our cases, the preferred solution is to completely clear the index of all documents and then re-index it using the data currently stored in Cassandra. Lucene provides at least two ways to delete all documents from an index using a Lucene IndexWriter:
None of them are supported when using the Lucandra IndexWriter.
To delete all documents using Lucandra, we first presumed it could be done like this:
However, we found out that this only works if the index contains at most 1000 documents. The deleteDocuments(Query) method executes a search to find the documents to be removed (IndexWriter#271), and the search result contains at most 1000 hits.
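The effect of that cap can be modelled without Lucandra at all. The sketch below is hypothetical: `CappedDelete`, `MAX_HITS` and the list-based "index" are illustrative names, and it only assumes what is described above, namely that each delete-by-query call searches, gets at most 1000 hits, and deletes just those. With 2500 documents a single call leaves 1500 behind. One possible workaround, assuming a later search sees the earlier deletions, is to repeat the call until it deletes nothing; this is a sketch, not Lucandra's API.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical model of a delete-by-query whose internal search is capped.
// Names are illustrative only; this is not the Lucandra IndexWriter API.
public class CappedDelete {

    // Mirrors the 1000-hit limit of the search result described above.
    public static final int MAX_HITS = 1000;

    // "Match all" delete: the search returns at most MAX_HITS documents,
    // and only those hits are removed. Returns the number deleted.
    public static int deleteMatchAll(List<Integer> index) {
        int hits = Math.min(index.size(), MAX_HITS);
        index.subList(0, hits).clear();
        return hits;
    }

    // Workaround sketch: keep deleting until a call finds no more hits.
    // Assumes each new search sees the deletions made by earlier calls.
    public static int deleteAllRepeatedly(List<Integer> index) {
        int total = 0;
        int deleted;
        do {
            deleted = deleteMatchAll(index);
            total += deleted;
        } while (deleted > 0);
        return total;
    }

    // Builds an "index" of n document ids.
    public static List<Integer> makeIndex(int n) {
        List<Integer> docs = new ArrayList<>();
        for (int i = 0; i < n; i++) {
            docs.add(i);
        }
        return docs;
    }

    public static void main(String[] args) {
        List<Integer> index = makeIndex(2500);
        deleteMatchAll(index);
        System.out.println("left after one call: " + index.size()); // 1500
        deleteAllRepeatedly(index);
        System.out.println("left after loop: " + index.size());     // 0
    }
}
```

The loop trades one call for several round trips (three for 2500 documents), which is why a native delete-all, such as the MatchAllDocs delete mentioned for Lucandra v2, is preferable.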
To delete all documents "for real" using Lucandra we have to:
In our opinion, the Lucandra IndexWriter (and the Lucandra IndexSearcher) have some important issues that need to be addressed. See SHORT above for the suggested improvements.