vespa-engine / vespa

AI + Data, online. https://vespa.ai
Apache License 2.0
5.69k stars, 593 forks

Search response returns incorrect number of documents #32424

Closed akolhun closed 5 days ago

akolhun commented 2 weeks ago

Describe the bug The search response reports root.fields.totalCount = X, but fewer than X documents are actually returned

To Reproduce Steps to reproduce the behavior:

  1. Launch a Vespa cluster with 12 pods (Helm chart attached): helm create vespa -n vespa-privatemedia --create-namespace .
  2. Load the Vespa application package (attached)
  3. Load the attached data.json via the vespa-feeder CLI tool: vespa-feeder data.json
  4. Execute the following query:
    curl 'http://localhost:8080/search/' \
    --header 'Content-Type: application/json' \
    --data '{
    "yql": "select * from mp_private_media where site_id contains '\''c5402062-bedf-4e3e-80ad-d668993ed9b2'\'' and state contains '\''trash'\''",
    "hits": 100,
    "offset": 0
    }'

    The response contains root.fields.totalCount=54, but only 38 documents are returned

Expected behavior The response should contain 54 documents, matching root.fields.totalCount
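The mismatch described above can be checked mechanically. The following is a minimal Python sketch, assuming the response has already been parsed into a dict and that returned hits sit under root.children, as in Vespa's default JSON result format. The helper name hit_count_gap and the sample data are hypothetical, and the comparison is only meaningful when "hits" is at least as large as totalCount, as it is here (100 vs. 54).

```python
def hit_count_gap(response: dict) -> dict:
    """Compare root.fields.totalCount with the number of hits actually returned."""
    root = response.get("root", {})
    total = root.get("fields", {}).get("totalCount", 0)
    returned = len(root.get("children", []))
    return {"totalCount": total, "returned": returned, "missing": total - returned}


# Hypothetical response shaped like the one in this report:
# 54 documents matched, but only 38 hits came back.
sample = {
    "root": {
        "fields": {"totalCount": 54},
        "children": [{"id": f"hit-{i}"} for i in range(38)],
    }
}

print(hit_count_gap(sample))
# {'totalCount': 54, 'returned': 38, 'missing': 16}
```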

Environment (please complete the following information):

Vespa version 8.408.12

Additional context Note: the problem varies with the number of nodes defined in the content cluster. It looks like a distribution-key-related issue

vap_privatemedia.zip vespa_privatemedia_helm.zip data.json.zip

hmusum commented 1 week ago

This could be due to a timeout; see the documentation on timeout and the further documentation it points to, e.g. soft timeout
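One way to test the timeout hypothesis is to rerun the same query with a longer timeout and with soft timeout disabled, then see whether the hit count changes. The sketch below only builds the request body. It assumes the Query API parameters timeout and ranking.softtimeout.enable, which should be verified against the Vespa documentation for the version in use; the 5s value and the helper name with_timeout_settings are illustrative.

```python
import json

# The query from the report, as a Python dict.
BASE_QUERY = {
    "yql": "select * from mp_private_media where "
           "site_id contains 'c5402062-bedf-4e3e-80ad-d668993ed9b2' "
           "and state contains 'trash'",
    "hits": 100,
    "offset": 0,
}


def with_timeout_settings(query: dict, timeout: str = "5s",
                          soft_timeout: bool = False) -> dict:
    """Return a copy of the query with timeout-related parameters set.

    Assumption: 'timeout' and 'ranking.softtimeout.enable' are the
    relevant Query API parameter names; the values are illustrative.
    """
    body = dict(query)
    body["timeout"] = timeout
    body["ranking.softtimeout.enable"] = soft_timeout
    return body


print(json.dumps(with_timeout_settings(BASE_QUERY), indent=2))
```

If the number of returned hits rises toward totalCount with these settings, a timeout is the likely cause.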

See also documentation about summaries, especially the section on performance

An actual query response could also be helpful. In that case please include "trace.level": 4 in the query
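A traced response can be captured with a short script. This is a sketch in Python using only the standard library; the endpoint URL is the one from the report, "trace.level": 4 is the setting requested above, and the helper names are hypothetical. The root.coverage field names (coverage, full, degraded) are assumed from Vespa's default JSON result format and may differ by version.

```python
import json
import urllib.request


def traced(query: dict, level: int = 4) -> dict:
    """Return a copy of the query with tracing enabled."""
    body = dict(query)
    body["trace.level"] = level
    return body


def run_query(body: dict, url: str = "http://localhost:8080/search/") -> dict:
    """POST the query as JSON and return the parsed response."""
    request = urllib.request.Request(
        url,
        data=json.dumps(body).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )
    with urllib.request.urlopen(request) as response:
        return json.load(response)


def coverage_summary(response: dict) -> dict:
    """Pull out the coverage fields that hint at a degraded result.

    Assumption: coverage information is reported under root.coverage.
    """
    coverage = response.get("root", {}).get("coverage", {})
    return {
        "coverage": coverage.get("coverage"),
        "full": coverage.get("full"),
        "degraded": coverage.get("degraded", {}),
    }
```

Calling coverage_summary(run_query(traced(query))) against a running cluster, where query is the request body from the report, shows whether the result was marked as degraded; the full response, including the trace, is what should be attached to the issue.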

hmusum commented 5 days ago

No feedback, closing