quantleaf / probly-search

A lightweight full-text search library written in Rust that provides full control over the scoring calculations
MIT License

Question about removed_docs argument to query #10

Closed: tmpfs closed this issue 2 years ago

tmpfs commented 2 years ago

Thanks for the library! I spent quite a bit of time researching the available libraries for my project, and this one strikes the perfect balance for my needs. In particular, the WebAssembly support is critical!

It seems that if I vacuum the index, there is no need to pass removed_docs as the last argument to query?

Am I right in thinking that the removed_docs argument exists to support the case where documents have been removed but not yet vacuumed from the index, so the query should ignore them?

marcus-pousette commented 2 years ago

Thanks for your question! Great that you find it useful!

You have understood it correctly. You only need to pass it if you have not vacuumed the index. The reason is that vacuum is a scanning operation (it goes through the index to purge the removed documents), while passing the removed_docs HashSet acts like a filter applied at query time. A simple policy is to vacuum when the removed_docs set has grown too large, or every X minutes (depending on your use case).
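To make the filter-versus-vacuum trade-off concrete, here is a toy model in Rust. It is a hypothetical sketch, not probly-search's actual API: ToyIndex, add, query, vacuum and maybe_vacuum are names invented for illustration, and the size threshold is an arbitrary assumption. Removed ids stay in the posting lists and are skipped on every query until vacuum scans all lists once and drops them, after which the removed set can be emptied.

```rust
use std::collections::{HashMap, HashSet};

/// Toy inverted index: term -> ids of the documents containing it.
/// Hypothetical illustration only; this is NOT probly-search's API.
#[derive(Default)]
struct ToyIndex {
    postings: HashMap<String, Vec<usize>>,
}

impl ToyIndex {
    fn add(&mut self, id: usize, text: &str) {
        for term in text.split_whitespace() {
            let ids = self.postings.entry(term.to_lowercase()).or_default();
            if !ids.contains(&id) {
                ids.push(id);
            }
        }
    }

    /// Filter path: removed ids remain in the postings and are skipped
    /// on every query, costing one set lookup per candidate.
    fn query(&self, term: &str, removed: &HashSet<usize>) -> Vec<usize> {
        self.postings
            .get(&term.to_lowercase())
            .map(|ids| {
                ids.iter()
                    .copied()
                    .filter(|id| !removed.contains(id))
                    .collect::<Vec<usize>>()
            })
            .unwrap_or_default()
    }

    /// Vacuum path: scan every posting list once, drop removed ids for good,
    /// then clear the set so later queries no longer need to filter.
    fn vacuum(&mut self, removed: &mut HashSet<usize>) {
        for ids in self.postings.values_mut() {
            ids.retain(|id| !removed.contains(id));
        }
        self.postings.retain(|_, ids| !ids.is_empty());
        removed.clear();
    }
}

/// Size-based policy (hypothetical threshold): vacuum only once the
/// removed set has reached `max_removed` entries. Returns true if it vacuumed.
fn maybe_vacuum(index: &mut ToyIndex, removed: &mut HashSet<usize>, max_removed: usize) -> bool {
    if removed.len() >= max_removed {
        index.vacuum(removed);
        true
    } else {
        false
    }
}

fn main() {
    let mut index = ToyIndex::default();
    index.add(1, "rust search");
    index.add(2, "rust wasm");
    index.add(3, "wasm search");

    let mut removed = HashSet::new();
    removed.insert(2); // "deleted", but still present in the postings

    println!("{:?}", index.query("rust", &removed)); // [1]
    println!("{}", maybe_vacuum(&mut index, &mut removed, 2)); // false: 1 < 2

    removed.insert(3);
    println!("{}", maybe_vacuum(&mut index, &mut removed, 2)); // true: postings purged

    // After vacuuming, an empty removed set gives the same results.
    println!("{:?}", index.query("search", &removed)); // [1]
}
```

A time-based policy ("every X minutes") would work the same way, swapping the size check for an elapsed-time check.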

tmpfs commented 2 years ago

Thanks for the prompt reply, much appreciated 🙏