vespa-engine / vespa


Query latency increasing with number of hits even if no ranking profile or order clause is used #21081

Closed 107dipan closed 2 years ago

107dipan commented 2 years ago

Describe the bug I am trying to perform benchmark load testing in Vespa using different types of queries, with Apache JMeter as the load driver. I noticed that query latency increases when I add more clauses or when a lot of documents match (fields.totalCount is greater). But since I am only getting a small number of docs in the response payload (10-12) and I am not using any rank profile, is there any way I can reduce the query latency? I have tried increasing request threads as suggested, but is there anything else I can try?

To Reproduce Steps to reproduce the behavior:

  1. Run a YQL query that matches a large number of documents

Expected behavior Query latency should stay roughly constant when only 10-12 hits are returned, regardless of how many documents match the query.


Environment (please complete the following information):

Infrastructure: Kubernetes
Vespa version: the vespa:latest tag in the Docker image


jobergum commented 2 years ago

I noticed that query latency increases when I add more clauses or when a lot of documents match (fields.totalCount is greater). But since I am only getting a small number of docs in the response payload (10-12) and I am not using any rank profile, is there any way I can reduce the query latency?

This is expected scaling behavior: a query that retrieves and matches more documents has higher latency and cost than a query that matches fewer. See the Performance sizing guide. Using more threads per search can keep latency in check, but at additional cost, since instead of one thread processing the query matching you have N.
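For illustration, a minimal sketch of raising threads per search in a rank profile, assuming a hypothetical schema named music; the effective value is capped by the threads-per-search configured on the content nodes:

```
schema music {
    document music {
        field title type string {
            indexing: summary | index
        }
    }
    rank-profile fast inherits default {
        # Matching/ranking for a query using this profile is split
        # across up to 4 threads (bounded by the node-level setting).
        num-threads-per-search: 4
    }
}
```

This trades CPU for latency: the total work per query stays roughly the same, but the wall-clock time drops.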

There are plenty of options to reduce latency and cost, but it's difficult to recommend anything specific as you haven't described the use case you are benchmarking.

There is WAND to accelerate OR-like queries, or ANN for dense vector retrieval. There is also match-phase degradation, which can be used to reduce latency and cost if you have a document-level signal; see match-phase/graceful degradation. A small WAND sketch follows below.
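As a sketch, here is an OR query rewritten to weakAnd in YQL (the field and term names are made up); targetHits bounds how many candidates are fully scored:

```
select * from sources * where default contains "foo" or default contains "bar"

select * from sources * where {targetHits: 100}weakAnd(default contains "foo", default contains "bar")
```

The first form fully ranks every document matching either term; the second skips documents that cannot make it into the top candidates.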

107dipan commented 2 years ago

Hey Jo,

We are a search platform currently using a Lucene-based search engine. We are looking to replace it and are evaluating Vespa for the same. We are performing benchmark load tests to check query latencies for different types of queries.

jobergum commented 2 years ago

Apache Lucene has the same scaling properties with regard to the total number of hits matching the query, unless you use WAND, which has potentially sub-linear characteristics. The worst case is still N: WAND's effectiveness, compared with brute-force ranking of all documents that match at least one query term, depends highly on the number of query terms, the number of document terms, and the score distribution of the terms.

107dipan commented 2 years ago

Just wanted to confirm one thing: fields.totalCount gives us the total hits, i.e. the number of documents matched, right?

jobergum commented 2 years ago

Yes, totalCount is the total number of matches; you can compare it with coverage.documents to get the matched fraction. With WAND, ANN, or match-phase degradation, totalCount is not accurate.
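For illustration, the relevant parts of a query response (the numbers are made up to match this thread): the matched fraction is totalCount divided by coverage.documents, here 70M / 79M ≈ 88%.

```
{
    "root": {
        "fields": {
            "totalCount": 70000000
        },
        "coverage": {
            "documents": 79000000,
            "full": true
        }
    }
}
```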

107dipan commented 2 years ago

Will look into the WAND/ANN concepts. Thanks!

107dipan commented 2 years ago

Yes, we are currently checking this. We are increasing the request threads for the queries with higher latencies.

jobergum commented 2 years ago

I'm resolving this @107dipan , feel free to re-open if you have further questions on this! Thanks!

107dipan commented 2 years ago

Hey Jo, we found from the reported trace that matching and first-phase ranking took the most time in our query, and wanted to gain a better understanding of what this step is doing. We found this documentation: https://docs.vespa.ai/en/query-api.html. It would be very helpful if you could refer us to more documentation that helps us understand better. Thanks!

jobergum commented 2 years ago

It's performing matching, finding which documents match the query tree, and then scoring the matched documents (hits) using the rank profile.

Without knowing your specifics, general performance depends mainly on how many documents match the query and on how expensive the rank profile is to evaluate per matched document.

See https://docs.vespa.ai/en/performance/sizing-search.html; https://docs.vespa.ai/en/using-wand-with-vespa.html also gives an overview of what the number of matches does to overall performance. A minimal first-phase sketch follows below.
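As a sketch of where the first-phase cost comes from (the field names are hypothetical): the first-phase expression is evaluated once per matched document, so its cost scales with totalCount.

```
rank-profile example inherits default {
    first-phase {
        # Runs once for every matched document: with 70M matches,
        # this expression is evaluated 70M times per query.
        expression: nativeRank(title, body)
    }
}
```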

If you could be more specific it would be easier to give a better answer.

107dipan commented 2 years ago

We are seeing that our query load tests cause CPU spikes, and we found that during those times the bottleneck is matching and phase-1 ranking. We have 79 million docs, with the field set to the value bar in 70 million of them. The query we are making is foo contains bar. The field has indexing type index and attribute. We are using default ranking. We have 18 content nodes; redundancy and searchable-copies are currently set to 3, with flat distribution.

jobergum commented 2 years ago

Yes, so you are fully ranking 70 million documents, since 70 million documents match your query; that is 88% of the collection.
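If you have a document-level quality signal, match-phase degradation caps that work. A minimal sketch, assuming a hypothetical single-value numeric popularity attribute with fast-search:

```
field popularity type int {
    indexing: attribute
    attribute: fast-search
}

rank-profile degraded inherits default {
    match-phase {
        # When more than max-hits documents would match, only the
        # candidates estimated best by popularity are fully matched and ranked.
        attribute: popularity
        max-hits: 10000
    }
}
```

Note that totalCount then no longer reflects the true match count, as discussed above.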

107dipan commented 2 years ago

We will consider query degradation; however, we would still like to understand how matching and phase-1 ranking occur. Is it a DBMS-type ranking by default?

jobergum commented 2 years ago

If you don't override any rank profile, you get default ranking, which is purely based on nativeRank: https://docs.vespa.ai/en/nativerank.html
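If result ordering does not matter for a query, Vespa's built-in unranked rank profile skips scoring entirely; a profile is selected per query with the standard ranking parameter. A sketch (URLs shown unencoded for readability, query values illustrative):

```
/search/?yql=select * from sources * where foo contains "bar"

/search/?yql=select * from sources * where foo contains "bar"&ranking=unranked
```

The first request scores all matches with the default nativeRank profile; the second assigns a constant score and avoids the first-phase ranking cost, though the matching cost remains.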