quickwit-oss / tantivy

Tantivy is a full-text search engine library inspired by Apache Lucene and written in Rust
MIT License

Addition of a migration guide from Lucene to Tantivy #2377

Open beneyal opened 4 months ago

beneyal commented 4 months ago

Hello,

As a side-project for work, I would like to port our current application, which is written in Scala and uses Lucene 9.5, to Rust and Tantivy. However, I don't know if all the features I need from Lucene exist in Tantivy. It would be great to have a side-by-side migration guide from Lucene to Tantivy specifying how to "map" different Lucene constructs to Tantivy and which ones are not (yet?) available.

For example, we're using SpanQuery subclasses in Lucene, and I could find neither an equivalent construct in Tantivy nor a tutorial on how to write something similar using existing Tantivy constructs. I believe knowing what isn't available is just as important as knowing what is 😃

Thanks for reading, and I look forward to seeing how this project evolves! 🚀

PSeitz commented 4 months ago

What kind of SpanQuery do you use?

We support phrase queries with slop, but there is no low-level building block for spans. https://docs.rs/tantivy/latest/tantivy/query/struct.PhraseQuery.html#method.set_slop
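
To make "phrase query with slop" concrete, here is a minimal plain-Rust sketch of the idea. It is not Tantivy's implementation and does not use the Tantivy API. It assumes a simplified model in which slop is a shared budget of extra tokens allowed between consecutive phrase terms, and it handles in-order matches only (no transpositions). The function name `matches_with_slop` is hypothetical.

```rust
/// Returns true if `phrase` occurs in `tokens` in order, with at most `slop`
/// extra tokens in total between consecutive phrase terms.
///
/// Simplified model for illustration only: in-order matches, no transpositions.
fn matches_with_slop(tokens: &[&str], phrase: &[&str], slop: usize) -> bool {
    // Try to place the remaining phrase terms after position `prev`,
    // with `budget` gap tokens still available.
    fn extend(tokens: &[&str], phrase: &[&str], prev: usize, budget: usize) -> bool {
        let Some((first, rest)) = phrase.split_first() else {
            return true; // every term was placed within the budget
        };
        // The next term may sit 0..=budget tokens after the previous one.
        (0..=budget).any(|gap| {
            let pos = prev + 1 + gap;
            tokens.get(pos) == Some(first) && extend(tokens, rest, pos, budget - gap)
        })
    }

    let Some((first, rest)) = phrase.split_first() else {
        return false; // an empty phrase matches nothing in this sketch
    };
    // Try every occurrence of the first term as a starting point.
    tokens
        .iter()
        .enumerate()
        .any(|(pos, tok)| tok == first && extend(tokens, rest, pos, slop))
}
```

Under this model, the phrase `["a", "b", "c"]` with slop 1 matches the token sequences `a x b c` and `a b x c`, but not `a x b x c`, which needs a budget of 2. A SpanNearQuery-style "ordered, within N positions" constraint is loosely similar, while nesting and combining spans (as SpanOrQuery or SpanContainingQuery do) has no counterpart in this sketch.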

beneyal commented 4 months ago

Many of them 😅

A few that I see in the code are SpanNearQuery, SpanTermQuery, SpanMultiTermQueryWrapper, SpanContainingQuery, and SpanOrQuery, plus a custom implementation that inherits from SpanQuery. But it's not just SpanQuery; that was only an example. We have other custom classes that inherit from Lucene classes, which is why I said a mapping would be helpful. We plan to open source that project soon (🤞), and then I'll be able to show precisely what we're doing there.