quickwit-oss / tantivy-py

Python bindings for Tantivy
MIT License
245 stars 62 forks source link

Expose Tantivy's DisjunctionMaxQuery #244

Closed aecio closed 2 months ago

aecio commented 2 months ago

Added the Query.disjunction_max_query method and corresponding tests.

Expected usage:

query1 = Query.term_query(index.schema, "title", "sea") 
query2 = Query.term_query(index.schema, "title", "mice") 
query = Query.disjunction_max_query([query1, query2], tie_breaker=0.5)

result = index.searcher().search(query, 10)

See also issue #20.

alex-au-922 commented 2 months ago

Seems our function signatures for both in Rust and in Python (BooleanQuery and DistjunctionMaxQuery) are quite different. Would you mind reviewing my commit as well #243 so that we might come up with a more consistent function signature?

aecio commented 2 months ago

It looks like the main difference is the use of PyList vs. Vec for the subqueries parameter. I can try to change it to use Vec instead if you think it's better. I assume PyO3 checks the types automatically in that case, so it's not necessary to check the types manually in the Rust implementation? I'll rebase and try to make these changes tomorrow.

alex-au-922 commented 2 months ago

I have no preference actually, perhaps @cjrh may provide us some feedback? Whether you think PyList or Vec is better for the project. I can submit another PR for the BooleanQuery.

cjrh commented 2 months ago

My preference is to use the Rust types like Vec in general, and take advantage of PyO3's automatic conversion, like in the BooleanQuery PR. Thanks @alex-au-922 for having a look at this PR too.

aecio commented 2 months ago

Hi all, thanks for the review! I just pushed a new commit with the following Python signatures and using Rust's Vec<Query>:

    @staticmethod
    def boolean_query(subqueries: Sequence[tuple[Occur, Query]]) -> Query:
        pass

    @staticmethod
    def disjunction_max_query(subqueries: Sequence[Query], tie_breaker: Optional[float] = None) -> Query:
        pass
    #[staticmethod]
    pub(crate) fn disjunction_max_query(
        subqueries: Vec<Query>,
        tie_breaker: Option<&PyFloat>,
    ) -> PyResult<Query>

Let me know if this addresses the comments.

cjrh commented 2 months ago

I'm fixing the CI issue here: https://github.com/quickwit-oss/tantivy-py/pull/251

When that's merged these tests can rerun.

cjrh commented 2 months ago

I approved the PR, just waiting to get a clean test run before merging.

aecio commented 2 months ago

Thank you all for the reviews and merging.

cjrh commented 2 months ago

@aecio No, thank you for your contribution 👏🏼 🙇🏼