lior-k / fast-elasticsearch-vector-scoring

Score documents using embedding-vectors dot-product or cosine-similarity with ES Lucene engine
Apache License 2.0
395 stars, 112 forks

query size is ignored #31

Closed adidier17 closed 5 years ago

adidier17 commented 5 years ago

Thank you for the great plugin! I'm seeing some unexpected behavior: no matter what I set the query size to, the entire index is returned in the hits. Using the Python client, my query looks like this:

def img_q(vector):
    query = {
        "query": {
            "function_score": {
                "boost_mode": "replace",
                "script_score": {
                    "script": {
                        "source": "elasticsearch-binary-vector-scoring",
                        "lang": "knn",
                        "params": {
                            "cosine": True,
                            "field": "feature",
                            "vector": vector
                        }
                    }
                }
            }
        },
        "from": 0,
        "size": 100
    }
    return query

res = es.search(index='maars', body=img_q(myvector))

And then res['hits']['total'] is always the number of vectors in the index. Any idea why that could be happening? Thank you!

lior-k commented 5 years ago

Glad to hear it :-)

I think this is related to another issue: https://github.com/lior-k/fast-elasticsearch-vector-scoring/issues/25. Try following the details there, and let me know if it works.
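
For anyone hitting the same symptom, one thing worth checking alongside the linked issue: in a standard Elasticsearch search response, `hits.total` reports how many documents matched the query, while `size` only limits how many are returned in `hits.hits`. A query that scores every document will therefore report the full index count in `hits.total` even when `size` is respected. The sketch below is only an illustration of that distinction, not the fix from the linked issue; `summarize_hits` and the sample response are hypothetical and stand in for the real `res` returned by `es.search`.

```python
def summarize_hits(res):
    """Return (total_matches, returned_count) for a search response dict."""
    total = res["hits"]["total"]
    # Elasticsearch 7+ reports total as {"value": N, "relation": ...};
    # earlier versions report a plain integer.
    if isinstance(total, dict):
        total = total["value"]
    # "size" caps the length of this list, not the total above.
    returned = len(res["hits"]["hits"])
    return total, returned


# Hypothetical response, shaped like a search with "size": 2
# against an index holding 5 documents that all match.
fake_res = {
    "hits": {
        "total": 5,
        "hits": [
            {"_id": "a", "_score": 0.9},
            {"_id": "b", "_score": 0.8},
        ],
    }
}

total, returned = summarize_hits(fake_res)
print(f"matched={total} returned={returned}")
```

If `returned` matches the requested `size` while `total` equals the index size, the query size is being applied and only the match count is unfiltered.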

adidier17 commented 5 years ago

Thanks!