quickwit-oss / tantivy-py

Python bindings for Tantivy
MIT License

Get a list of indexed documents. #216

Closed: amrit073 closed this issue 3 months ago

amrit073 commented 3 months ago

Hi! Is there a plan to add the feature to list all documents that are indexed?

This would be very useful for implementing incremental indexing of new documents.
We could check whether a document is already present in the index before passing its contents for indexing.

adamreichold commented 3 months ago

You should be able to search using the AllQuery or, more efficiently, access Searcher::segment_readers and use SegmentReader::alive_bitset to iterate over all available document IDs.

cjrh commented 3 months ago

@amrit073 If that answers your question feel free to close the issue 😄

amrit073 commented 3 months ago

Thanks! Will try to dive into this, maybe try to expose its python bindings.

adamreichold commented 3 months ago

I am sorry for not noticing that this was in the tantivy-py repository instead of the tantivy one. But do note that you can already access the all-query by just parsing * as the query. And yes, segment readers and alive bitsets are not available from Python.
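
For illustration, the `*` approach could look roughly like the sketch below. It assumes the documented tantivy-py calls (`Index.parse_query`, `Index.searcher`, `Searcher.search`, `Searcher.doc`, `Document.to_dict`). The helper names `list_all_documents` and `build_demo_index` and the `path` field are made up for this example and are not part of the library.

```python
def list_all_documents(index):
    """Return every stored document in `index` as a plain dict.

    Hypothetical helper: `index` is expected to be a tantivy.Index.
    Only fields declared with stored=True come back in the result.
    """
    index.reload()                      # pick up the most recent commit
    searcher = index.searcher()
    total = searcher.num_docs
    if total == 0:
        return []                       # avoid asking for a zero-sized result page
    query = index.parse_query("*")      # "*" is parsed as the all-query
    hits = searcher.search(query, limit=total).hits
    # Each hit is a (score, doc_address) pair; fetch the stored document.
    return [searcher.doc(address).to_dict() for _score, address in hits]


def build_demo_index():
    """Build a tiny in-memory index (assumed schema, for demonstration only)."""
    import tantivy  # imported lazily so list_all_documents has no hard dependency

    builder = tantivy.SchemaBuilder()
    builder.add_text_field("path", stored=True)   # hypothetical field name
    index = tantivy.Index(builder.build())
    writer = index.writer()
    for path in ("a.txt", "b.txt"):
        writer.add_document(tantivy.Document(path=path))
    writer.commit()
    return index
```

With tantivy installed, `list_all_documents(build_demo_index())` should return one dict per indexed file. For incremental indexing, the set of already-indexed `path` values can then be compared against the files on disk. This loads every stored document into memory, so it is only reasonable for small indexes; for large ones, paging with the `offset` argument of `search` is probably the better choice.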