whoosh-community / whoosh

Whoosh is a fast, featureful full-text indexing and searching library implemented in pure Python.

Is AsyncWriter reliable to use? #578

Open davidshen84 opened 1 year ago

davidshen84 commented 1 year ago

Hi,

I indexed the same set of documents using both BufferedWriter and AsyncWriter, and I found that the search results from the AsyncWriter index are very poor, if not incorrect.

My indexing code with AsyncWriter looks like this:


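import logging
from multiprocessing import Pool
from typing import Dict

from whoosh.index import Index, create_in
from whoosh.writing import AsyncWriter

logger = logging.getLogger(__name__)

def add_document(data: Dict[str, str]) -> None:
    # Each call opens its own AsyncWriter on the index shared by the pool.
    with AsyncWriter(shared_ix) as writer:
        writer.add_document(id=str(data['id']), path=data['path'], content=data['content'])
        logger.info('added %s', data['path'])

def init_pool(ix: Index) -> None:
    # Runs once in each worker process to publish the index as a global.
    global shared_ix
    shared_ix = ix

# ...define schema...
ix = create_in(index_dir, schema)
with Pool(initializer=init_pool, initargs=(ix,)) as pool:
    pool.map(add_document, doc_set_list)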

There are no errors or warnings during indexing with AsyncWriter, but the resulting index folder is about 8 MB smaller than the one produced with BufferedWriter.
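A folder-size difference alone does not prove that documents are missing, since segment layout can differ between runs, so comparing document counts is probably a more direct check. Below is a minimal sketch of both measurements. The helper names `dir_size_bytes` and `doc_count` and the directory names in the usage comment are hypothetical, not part of Whoosh; only `open_dir` and `doc_count()` are Whoosh calls.

```python
import os


def dir_size_bytes(path: str) -> int:
    """Total size in bytes of all regular files under `path`."""
    total = 0
    for root, _dirs, files in os.walk(path):
        for name in files:
            total += os.path.getsize(os.path.join(root, name))
    return total


def doc_count(index_dir: str) -> int:
    """Number of undeleted documents in the Whoosh index at `index_dir`."""
    # Imported lazily so the size helper also works without Whoosh installed.
    from whoosh.index import open_dir
    return open_dir(index_dir).doc_count()


# Hypothetical usage, assuming the two indexes live in these folders:
# for name in ("index_buffered", "index_async"):
#     print(name, dir_size_bytes(name), doc_count(name))
```

If the two document counts differ, some documents were never committed; if they match, the size gap may come only from how the segments were written.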

I understand the documentation says it is a sample implementation. Is it reliable enough for local development and evaluation?

Thanks