I indexed the same set of documents with both `BufferedWriter` and `AsyncWriter`, and the search results from the `AsyncWriter` index are very poor, if not incorrect.
My code for the `AsyncWriter` indexer looks like this:
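```python
import logging
from multiprocessing import Pool
from typing import Dict
from whoosh.index import Index, create_in
from whoosh.writing import AsyncWriter
logger = logging.getLogger(__name__)

def add_document(data: Dict[str, str]) -> None:
    # One AsyncWriter per document; leaving the with-block calls commit()
    with AsyncWriter(shared_ix) as writer:
        writer.add_document(id=str(data['id']), path=data['path'], content=data['content'])
    logger.info('added %s', data['path'])

def init_pool(ix: Index) -> None:
    global shared_ix  # set once in each worker process
    shared_ix = ix

# ...define schema...
ix = create_in(index_dir, schema)
with Pool(initializer=init_pool, initargs=(ix,)) as pool:
    pool.map(add_document, doc_set_list)
```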
There are no errors or warnings during indexing with `AsyncWriter`, but the resulting index folder is about 8 MB smaller than the one built with `BufferedWriter`.
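A hedged diagnostic sketch, not part of the original code: one way to check whether the smaller `AsyncWriter` index is actually missing documents (rather than just laid out differently on disk) is to compare the ids read back from each index against the input ids. The helper name `diff_indexed_ids` is hypothetical, and it assumes every stored-field dict carries the document's `id`, which only holds if the elided schema declares `id` as a stored field.

```python
from collections import Counter
from typing import Dict, Iterable, List, Mapping


def diff_indexed_ids(expected_ids: Iterable[str],
                     stored_docs: Iterable[Mapping[str, object]]) -> Dict[str, List[str]]:
    """Report ids that are absent from an index, or present more than once.

    Assumption: each item in `stored_docs` is a dict of stored fields that
    includes an 'id' key (i.e. the schema stores the id field).
    """
    # Count each stored id so duplicates show up as well as gaps
    counts = Counter(str(doc['id']) for doc in stored_docs if 'id' in doc)
    expected = set(expected_ids)
    missing = sorted(expected - set(counts))
    duplicated = sorted(doc_id for doc_id, n in counts.items() if n > 1)
    return {'missing': missing, 'duplicated': duplicated}
```

Against a real index, the stored fields could come from `open_dir(index_dir).searcher().all_stored_fields()` and the expected ids from `[str(d['id']) for d in doc_set_list]`; `Searcher.doc_count()` on each index is a quicker first check.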
I understand the documentation describes it as a sample implementation. Is it still good enough for local development and evaluation?
Thanks