Open JohannesLichtenberger opened 1 year ago
I can pitch in here.
We have to check whether we can implement our own store (I think Lucene calls it a Directory) and the fields. Our main data structure is a keyed trie that indexes 64-bit nodeKeys <=> nodes, and it would be great if we could store the full-text index in our persistent structure as well. I haven't checked Lucene yet, though.
We make use of Lucene in eXist-db for the Full Text index. There are definitely advantages and disadvantages to using Lucene.
On the one hand Lucene is very mature and flexible whilst offering decent performance. If you want to implement something like the W3C XQuery Full Text extensions, it will have almost everything you need baked in. Also, you can allow users to choose or code their own Analyzers for pretty much any language or purpose which is neat.
On the other hand, if you need transactional consistency, as far as I am aware there is no good way to involve Lucene in the transactions against your own indexes. I enquired some time ago, so perhaps things have changed more recently, but previously there was no way to control Lucene transactions directly, so you could not do a 2PC approach.
Hi Adam, isn't the single writer supposed to implement the two phase commit interface https://lucene.apache.org/core/7_4_0/core/org/apache/lucene/index/TwoPhaseCommit.html ?
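To make the 2PC question concrete, here is a minimal sketch of how a coordinator could drive several participants through prepare/commit/rollback. `TwoPhaseResource` is a local stand-in that only mirrors the three-method shape of Lucene's `TwoPhaseCommit`; it is not the Lucene interface itself, and `TwoPhaseCoordinator` and `RecordingResource` are hypothetical names made up for illustration. Whether Lucene's writer can actually be enlisted this way alongside our own indexes is exactly the open question.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.util.ArrayList;
import java.util.List;

/** Local stand-in with the same method shape as Lucene's TwoPhaseCommit (assumption: not the real type). */
interface TwoPhaseResource {
    long prepareCommit() throws IOException;
    long commit() throws IOException;
    void rollback() throws IOException;
}

/** Hypothetical coordinator: prepare every participant, then commit all, or roll back all. */
final class TwoPhaseCoordinator {
    static boolean commitAll(List<TwoPhaseResource> resources) {
        try {
            // Phase 1: every participant must be able to promise a commit.
            for (TwoPhaseResource r : resources) {
                r.prepareCommit();
            }
        } catch (IOException | RuntimeException e) {
            // Any failure in phase 1 aborts the whole transaction.
            for (TwoPhaseResource r : resources) {
                try {
                    r.rollback();
                } catch (IOException | RuntimeException ignored) {
                    // Best effort: keep rolling back the remaining participants.
                }
            }
            return false;
        }
        try {
            // Phase 2: make the prepared state durable everywhere.
            for (TwoPhaseResource r : resources) {
                r.commit();
            }
        } catch (IOException e) {
            // A real system needs a recovery protocol here; this sketch only reports the failure.
            throw new UncheckedIOException(e);
        }
        return true;
    }
}

/** In-memory participant that records the calls it receives (demo only). */
final class RecordingResource implements TwoPhaseResource {
    private final String name;
    private final boolean failOnPrepare;
    private final List<String> log;

    RecordingResource(String name, boolean failOnPrepare, List<String> log) {
        this.name = name;
        this.failOnPrepare = failOnPrepare;
        this.log = log;
    }

    @Override
    public long prepareCommit() throws IOException {
        log.add(name + ":prepare");
        if (failOnPrepare) {
            throw new IOException(name + " cannot prepare");
        }
        return 0L;
    }

    @Override
    public long commit() {
        log.add(name + ":commit");
        return 0L;
    }

    @Override
    public void rollback() {
        log.add(name + ":rollback");
    }
}

final class TwoPhaseCommitDemo {
    public static void main(String[] args) {
        List<String> log = new ArrayList<>();
        boolean ok = TwoPhaseCoordinator.commitAll(List.of(
                new RecordingResource("trie", false, log),
                new RecordingResource("fulltext", false, log)));
        System.out.println(ok + " " + log);
    }
}
```

The point of the sketch is only the ordering: nothing is committed until every participant has prepared, and a failed prepare rolls everyone back.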
I had a quick look, and I think we'd need to implement a custom `Directory`... but I'm not sure if we can somehow store the `Document`s in another subtree (in a trie) as we do with the other indexes. Thus, it would be automatically versioned, which is what we need after all. AFAICS, the documents are written in `DocumentsWriter`, which is sadly not an interface, and instances are created directly in `IndexWriter`. Thus, I'm not sure if it's even possible to change the index structure in which Lucene stores the documents besides the actual `Directory` to store to/read from!?
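As a rough illustration of the "automatically versioned" idea: a Lucene `Directory` is essentially a flat set of named files, and segment files are written once and never modified, so a versioned store mostly has to remember which file names exist in which revision. The sketch below is an assumption-heavy toy, not a `Directory` implementation: `VersionedFileStore` and all of its methods are hypothetical names, and it copies the whole file map on every commit instead of sharing structure the way a trie would.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

/** Hypothetical toy store: each commit freezes the current set of named files as a new revision. */
final class VersionedFileStore {
    // Immutable snapshots, indexed by revision number.
    private final List<Map<String, byte[]>> revisions = new ArrayList<>();
    // Uncommitted state of the single writer.
    private final Map<String, byte[]> working = new TreeMap<>();

    /** Adds or replaces a file in the uncommitted working state. */
    void write(String name, byte[] content) {
        working.put(name, content.clone());
    }

    /** Removes a file from the working state; earlier revisions still see it. */
    void delete(String name) {
        working.remove(name);
    }

    /** Freezes the working state and returns the new revision number. */
    int commit() {
        revisions.add(Collections.unmodifiableMap(new TreeMap<>(working)));
        return revisions.size() - 1;
    }

    /** Returns a copy of the file content in the given revision, or null if absent. */
    byte[] read(int revision, String name) {
        byte[] content = revisions.get(revision).get(name);
        return content == null ? null : content.clone();
    }

    /** Lists the file names visible in the given revision, in sorted order. */
    List<String> listAll(int revision) {
        return new ArrayList<>(revisions.get(revision).keySet());
    }
}

final class VersionedFileStoreDemo {
    public static void main(String[] args) {
        VersionedFileStore store = new VersionedFileStore();
        store.write("_0.si", new byte[] {1, 2});
        int first = store.commit();
        store.delete("_0.si");
        store.write("_1.si", new byte[] {3});
        int second = store.commit();
        System.out.println(store.listAll(first) + " " + store.listAll(second));
    }
}
```

A real implementation would key the file contents into the existing trie so that unchanged files are shared between revisions rather than copied.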
We need a way to do full-text search on text nodes. Including Lucene is probably a feasible way to get that.