Closed · 107dipan closed this 2 years ago
I noticed that query latency increases when I add more clauses or when a lot of documents match (fields.totalCount is greater). But since I am only getting a small number of docs in the response payload (10-12) and I am not using any rank profile, is there any way I can reduce the query latency?
This is expected scaling behavior. A query that retrieves and matches more documents has higher latency and cost than a query that retrieves fewer. See the Performance sizing guide. Using more threads per search can keep latency in check, but at additional cost: instead of one thread processing the query matching, you have N.
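For reference, the per-search thread count is tuned in services.xml. A minimal sketch of the relevant tuning element (the value 4 is purely illustrative, not a recommendation):

```xml
<content id="search" version="1.0">
  <engine>
    <proton>
      <tuning>
        <searchnode>
          <requestthreads>
            <!-- threads used to evaluate a single query;
                 trades extra CPU cost for lower latency -->
            <persearch>4</persearch>
          </requestthreads>
        </searchnode>
      </tuning>
    </proton>
  </engine>
</content>
```

A rank profile can additionally cap this per profile with `num-threads-per-search`, so cheap queries don't pay for threads they don't need.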
There are plenty of options to reduce latency and cost, but it's difficult to recommend anything specific as you haven't described the use case you are benchmarking.
There is WAND to accelerate OR-like queries, and ANN for dense vector retrieval. There is also match-phase degradation, which can reduce latency and cost if you have a document-level signal; see match-phase/graceful degradation.
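To illustrate the WAND option, here is a small Python sketch that rewrites a multi-term OR query into Vespa's weakAnd YQL form. The field name `default` and the `targetHits` value are placeholder assumptions, not from this thread:

```python
# Sketch: building YQL for a plain OR query vs. a weakAnd query.
# weakAnd lets the engine skip documents that cannot make it into the
# top targetHits, which is what bounds matching cost for OR-like queries.

def or_yql(field, terms):
    clause = " or ".join(f'{field} contains "{t}"' for t in terms)
    return f"select * from sources * where {clause}"

def weakand_yql(field, terms, target_hits=100):
    args = ", ".join(f'{field} contains "{t}"' for t in terms)
    return (f"select * from sources * where "
            f"{{targetHits: {target_hits}}}weakAnd({args})")

print(or_yql("default", ["foo", "bar"]))
print(weakand_yql("default", ["foo", "bar"]))
```

The OR form fully matches and ranks every document containing any term; the weakAnd form only guarantees the top candidates, which is why totalCount becomes approximate.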
Hey Jo,
We are a search platform currently using a Lucene-based search engine. We are looking to replace it and are evaluating Vespa for that purpose. We are performing benchmark load tests to check query latencies for different types of queries.
Apache Lucene has the same scaling properties with regard to the total number of hits matching the query, unless you use WAND, which has potentially sub-linear characteristics. The worst case is still N: how effective WAND is, compared with brute-force ranking of all documents matching at least one query term, depends highly on the number of query terms, the number of document terms, and the score distribution of the terms.
Just wanted to confirm one thing: fields.totalCount gives us the total hits, i.e. the number of documents matched, right?
Yes, totalCount is the total number of matches; you can compare it with coverage.documents to get the matched fraction. With WAND, ANN, or match-phase degradation, totalCount is not accurate.
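As a concrete sketch of that comparison, the fields below mirror the default JSON result format; the response values are made up for illustration (they happen to match the corpus sizes discussed later in this thread):

```python
# Sketch: computing the fraction of the searched corpus a query matched,
# from a Vespa query response in the default JSON result format.

def matched_fraction(response):
    root = response["root"]
    total = root["fields"]["totalCount"]       # documents matched
    documents = root["coverage"]["documents"]  # documents searched
    return total / documents

# Illustrative response, not real Vespa output:
resp = {"root": {"fields": {"totalCount": 70_000_000},
                 "coverage": {"documents": 79_000_000}}}
print(f"matched fraction: {matched_fraction(resp):.1%}")  # matched fraction: 88.6%
```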
Will look into the WAND/ ANN concepts. Thanks!
Yes, we are currently checking this. We are increasing the request threads for the queries with higher latencies.
I'm resolving this @107dipan , feel free to re-open if you have further questions on this! Thanks!
Hey Jo, we found from the reported trace that matching and first-phase ranking took the most time in our query. We want to gain a better understanding of what this step is doing. We found this documentation: https://docs.vespa.ai/en/query-api.html. It would be very helpful if you could point us to more documentation to help us understand this better. Thanks!
It's performing matching, finding what documents match the query tree, and then scoring the matched documents (hits) using the ranking profile.
Without knowing your specifics, general performance depends mainly on how many documents match the query and how much ranking work is done per matched document. See https://docs.vespa.ai/en/performance/sizing-search.html; https://docs.vespa.ai/en/using-wand-with-vespa.html also has an overview of what the number of matches does to overall performance.
If you could be more specific it would be easier to give a better answer.
We are seeing that our query load tests cause CPU spikes, and during those times the bottleneck is matching and phase-1 ranking. We have 79 million docs, and the field is set to the value bar in 70 million of them. The query we are making is foo contains bar. The field has indexing type index and attribute. We are using default ranking. We have 18 content nodes; redundancy and searchable-copies are currently set to 3, with flat distribution.
Yes, so you are fully ranking 70 million documents since 70 million documents match your query, that is 88% of the collection.
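One way to avoid fully ranking 70 million hits is the match-phase degradation mentioned earlier. A hedged sketch of what that looks like in a schema rank profile; the `popularity` attribute here is a made-up example of a document-level quality signal (it needs to be a single-value numeric attribute with fast-search):

```
rank-profile degraded-sketch {
    match-phase {
        # only the ~10k documents with the highest 'popularity' values
        # are fully matched and ranked; the rest are skipped
        attribute: popularity
        max-hits: 10000
    }
    first-phase {
        expression: nativeRank
    }
}
```

The trade-off is recall: documents outside the top match-phase candidates never reach ranking, so totalCount and result quality both become approximate.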
We will consider query degradation; however, we would still like to understand how matching and phase-1 ranking occur. Is it a DBMS-type ranking by default?
If you don't override any ranking profile you get the default ranking, which is based purely on nativeRank: https://docs.vespa.ai/en/nativerank.html
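If ranking quality doesn't matter for these benchmark runs, the nativeRank computation can be bypassed entirely. A sketch of a schema rank profile with a trivial first phase (the profile name is arbitrary):

```
rank-profile cheap-sketch {
    first-phase {
        # constant score: every matched document costs almost
        # nothing to rank, isolating pure matching cost
        expression: 0
    }
}
```

Vespa also ships a built-in `unranked` profile that can be selected per query with `ranking=unranked`, which is useful for separating matching cost from ranking cost when benchmarking.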
Describe the bug I am trying to perform benchmark load testing in Vespa with different types of queries, using Apache JMeter. I noticed that query latency increases when I add more clauses or when a lot of documents match (fields.totalCount is greater). But since I am only getting a small number of docs in the response payload (10-12) and I am not using any rank profile, is there any way I can reduce the query latency? I have tried increasing request threads as suggested, but is there anything else I can try?
Expected behavior Is there any way to reduce the time taken to add a field to an existing schema?
Environment:
Infrastructure: Kubernetes
Vespa version: the vespa:latest tag in the Docker image