Open GeeWee opened 1 year ago
Tantivy already has a single-threaded index writer, though I don't remember its exact name. Check https://github.com/izihawa/summa, which already implements WASM bindings for Tantivy.
Hmm, Summa seems to use SingleSegmentIndexWriter, which for some reason doesn't seem to work for my use case.
Sorry, I missed your point about SSIW in the first post. It will be hard to debug without a stack trace, but the checklist is:
Thanks for your thoughts! I was unable to procure a stack trace, but after cloning tantivy and adding breakpoints everywhere I've managed to figure out the SSIW problem.
My problem was that my IndexSettings had docstore_compress_dedicated_thread=true (as is the default), and I had not realized that. After changing that to false, hooray - it works!
Now for the next issue: SSIW doesn't allow adding documents to an existing Index, as it overwrites the meta properties of the index to contain only the segment it writes. This means adding documents and calling finalize will overwrite any other segments in the index.
However, if I create my own commit method inside SSIW that looks like the snippet below, then it seems to work and adds documents successfully without overwriting existing documents.
    pub fn commit(self) -> crate::Result<Index> {
        // Read max_doc first: finalize() consumes the segment writer.
        let max_doc = self.segment_writer.max_doc();
        self.segment_writer.finalize()?;
        let segment: Segment = self.segment.with_max_doc(max_doc);
        let index = segment.index();
        // Start from the segments already listed in the index meta,
        // then append the newly written one instead of replacing them.
        let mut segments = index.searchable_segment_metas()?;
        segments.push(segment.meta().clone());
        let index_meta = IndexMeta {
            index_settings: index.settings().clone(),
            segments,
            schema: index.schema(),
            opstamp: 0,
            payload: None,
        };
        // Persist the merged meta and flush the directory.
        save_metas(&index_meta, index.directory())?;
        index.directory().sync_directory()?;
        Ok(segment.index().clone())
    }
It is essentially the same as the finalize method, except it carries over the segments already listed in the index meta. If you would accept this method in a PR, I would be very happy to provide one, but I'm still new to the internals of tantivy, so I'm not sure I'm "doing it right".
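The difference between the two behaviours can be illustrated with a toy model that uses only the standard library. Everything here is a hypothetical stand-in: `SegmentMeta`, `IndexMeta`, `finalize`, `commit`, and `num_docs` are simplified names for illustration, not tantivy's actual types or signatures, and the model only tracks the segment list.

```rust
// Toy model only: hypothetical stand-ins, not tantivy's own types.
#[derive(Clone, Debug, PartialEq)]
struct SegmentMeta {
    max_doc: u32,
}

#[derive(Debug)]
struct IndexMeta {
    segments: Vec<SegmentMeta>,
}

/// Models the existing behaviour: the saved meta lists only the new segment,
/// so previously listed segments are no longer visible.
fn finalize(_existing: &IndexMeta, new_segment: SegmentMeta) -> IndexMeta {
    IndexMeta { segments: vec![new_segment] }
}

/// Models the proposed behaviour: existing segments are carried over
/// and the new segment is appended.
fn commit(existing: &IndexMeta, new_segment: SegmentMeta) -> IndexMeta {
    let mut segments = existing.segments.clone();
    segments.push(new_segment);
    IndexMeta { segments }
}

/// Total number of documents across all segments listed in the meta.
fn num_docs(meta: &IndexMeta) -> u32 {
    meta.segments.iter().map(|s| s.max_doc).sum()
}
```

With a pre-built segment of 100 documents and a new segment of 5, the `finalize` model leaves 5 documents listed, while the `commit` model leaves 105.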
Is your feature request related to a problem? Please describe.
Continuing my work on getting tantivy to work server-side with WASM (related issues: #1751, #541), I would like to index dynamically added documents. In essence, I have a large set of documents I can pre-index in a build phase; however, each user also has some documents that are loaded dynamically from a database.
I would imagine that normally I could do something like this:
And then search the index. I realize that this might leak documents from one tenant to another, but as this index is rebuilt in-memory on each request and dropped after, this isn't a large concern.
However, as WASM is single-threaded I'm unable to actually get this to work as it seems all the IndexWriters require a thread pool of some sort.
I have tried both index.writer() and index.writer_with_num_threads with both 1 and 0 threads. I've even delved into the undocumented SingleSegmentIndexWriter. Even though it seems to suggest it's only for creating an index with a single segment and not adding to an existing index, I figured I would give it a go. However, trying to instantiate it gives me the following IoError
Which I think might be threadpool-related, but I am unable to get a stacktrace to confirm.
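One way to test the thread hypothesis without a stack trace is to probe whether the target can spawn threads at all. The sketch below uses only the standard library; `can_spawn_threads` is a hypothetical helper name, and it rests on an assumption not confirmed in this thread: that thread creation fails with an I/O error on single-threaded WASM targets.

```rust
use std::thread;

/// Hypothetical helper: returns true if this target can spawn and join a thread.
fn can_spawn_threads() -> bool {
    // Builder::spawn reports failure as an io::Result,
    // whereas thread::spawn would panic on failure.
    match thread::Builder::new().spawn(|| ()) {
        Ok(handle) => handle.join().is_ok(),
        Err(_) => false,
    }
}
```

If this returns false in the WASM build, any code path that starts a worker thread would be expected to fail there, which would be consistent with the fix of disabling the dedicated docstore compression thread.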
Describe the solution you'd like
I think in essence I'm asking if there's any way to accomplish what I want to do.