Open xiaofan-luan opened 1 year ago
Hi, thanks for your interest! These are very good questions, but not very easy to answer. Let me do my best.
I've been wanting to release a stable version for a while now, but this is not a trivial effort. There are several reasons for that. For one, the current API is not in the state I would like it to be in, and given that I'm planning some improvements in that regard, trying to make it stable would be counterproductive. Of course, we could just release a stable version as is, and work on 2.0. But then we arrive at another problem: I have no confidence that I (or any other contributor) would have enough spare time to keep development going and also maintain the stable version in its current state.
Unfortunately, we are a very small bunch of contributors, and none of us works on this full-time, or even half-time; I try to keep it going in my spare time, because I see some interest in using it, but unless we develop a larger community, and more people get involved in development, the progress will remain slow.
There's a lot of great code here, but from the beginning it wasn't designed to be a commercial product, but an academic tool, and that's why flexibility and performance were always priority, but not necessarily ease of use. I would like to change that so that we have both; but there's much work to be done there, and that's why it's kept in this pre-release state, because lots of things are changing quickly. I've been thinking of maybe splitting the project to a separate library/libraries and then another one to work on top of that to provide more of an out-of-the-box experience, but I haven't really thought this through yet.
So the bottom line is, as things stand, there's absolutely nothing I could promise.
When it comes to supporting the queries that Lucene does, you are correct. We currently support only simple conjunctive and disjunctive queries. Some of the queries offered by Lucene and Tantivy, such as, say, a AND b AND (c OR d) and similar would probably not be very difficult to implement, though I have yet to look into it more closely. A bigger problem would be any queries using a positional index, because we don't currently have a positional index. But it's doable.
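To illustrate why a nested query like a AND b AND (c OR d) is mostly a matter of composing the two operations that already exist, here is a minimal sketch in Python. It is not PISA's API or implementation: the toy `index`, the `postings` helper, and the `intersect`/`union` functions are all hypothetical names made up for this example, and posting lists are assumed to be plain sorted lists of document IDs with no compression, scoring, or skipping.

```python
from typing import Dict, List

# Toy inverted index (hypothetical data): term -> sorted list of document IDs.
index: Dict[str, List[int]] = {
    "a": [1, 2, 3, 5, 8],
    "b": [2, 3, 5, 7, 8],
    "c": [3, 9],
    "d": [4, 8],
}


def postings(term: str) -> List[int]:
    """Return the posting list for a term, or an empty list if unseen."""
    return index.get(term, [])


def intersect(a: List[int], b: List[int]) -> List[int]:
    """Conjunction (AND): linear merge keeping IDs present in both lists."""
    out: List[int] = []
    i = j = 0
    while i < len(a) and j < len(b):
        if a[i] == b[j]:
            out.append(a[i])
            i += 1
            j += 1
        elif a[i] < b[j]:
            i += 1
        else:
            j += 1
    return out


def union(a: List[int], b: List[int]) -> List[int]:
    """Disjunction (OR): linear merge keeping IDs present in either list."""
    out: List[int] = []
    i = j = 0
    while i < len(a) and j < len(b):
        if a[i] == b[j]:
            out.append(a[i])
            i += 1
            j += 1
        elif a[i] < b[j]:
            out.append(a[i])
            i += 1
        else:
            out.append(b[j])
            j += 1
    # At most one of these tails is non-empty.
    out.extend(a[i:])
    out.extend(b[j:])
    return out


# a AND b AND (c OR d), evaluated bottom-up.
result = intersect(
    intersect(postings("a"), postings("b")),
    union(postings("c"), postings("d")),
)
print(result)  # [3, 8]
```

This only covers unranked Boolean matching. Phrase or proximity queries would additionally need term positions stored per document, which is the positional index mentioned above.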
There's not much in terms of technical documents. The docs provide some high-level information as to how different components work together, CLI usage, etc.: https://pisa.readthedocs.io/en/latest/
I could also suggest having a look at the GitHub Issues if you're interested in the status of certain initiatives.
But most importantly, I encourage you to join our Slack and ask any specific questions you might have; I'll try to answer the best I can.
Of course, any help or contribution is very welcome, regardless of whether you decide to use PISA or not :) Feel free to let me know here or on Slack if you have more questions, or if you would want to talk about our plans, your plans, and anything in between. I'd love to find out more about your inverted index initiative.
Many thanks, @elshize, for the detailed response. I will do some investigation over the next 1-2 months. If it meets our requirements, maybe we can help pull some of the SOTA implementations out and integrate them into a production-ready package for people who want to put PISA into their real-world use cases.
Describe the solution you'd like
Hi team, I'm from the Milvus community (https://github.com/milvus-io/milvus), an open-source vector database. I came across PISA by chance, and I think it would be very helpful if Milvus could support an inverted index and serve hybrid queries together with dense vector search.
I have a few questions before I start my evaluation:
Thanks to the team, you are building a great product.