Closed ltung-cit closed 6 years ago
This error happens when the field ("image_embedding" in your case) does not exist in all the documents you are searching.
Same error. I used the field "embedding_vector", and it exists in my document I'm searching on.
Hi @lior-k The field (image_embedding) also exists in my document.
I have an index with 10 shards, and I noticed that even when the search does return hits, the response JSON contains a _shards property:
{
"successful": 3,
"failed": 7,
"skipped": 0,
"total": 10,
"failures": [
{
"node": "ghr7DWYOSWa4tlvZ4kpsFQ",
"index": "deckito",
"reason": {
"reason": "binaryEmbeddingReader can't be null",
"type": "illegal_state_exception"
},
"shard": 0
}
]
}
When setting shards to a low number (below 3), the error occurs more often.
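Since a search can return hits while some shards have silently failed, it may help to inspect _shards on every response. A minimal sketch, assuming the response is the plain dict returned by the Python client; the helper name summarize_shard_failures is made up for illustration:

```python
def summarize_shard_failures(response):
    """Return (all_ok, reasons) for a search response dict.

    all_ok is True when no shard reported a failure; reasons is the
    sorted list of distinct failure messages found under _shards.failures.
    """
    shards = response.get("_shards", {})
    failed = shards.get("failed", 0)
    reasons = sorted({
        failure.get("reason", {}).get("reason", "unknown")
        for failure in shards.get("failures", [])
    })
    return failed == 0, reasons


# Example using the _shards block posted above.
response = {
    "_shards": {
        "successful": 3,
        "failed": 7,
        "skipped": 0,
        "total": 10,
        "failures": [
            {
                "node": "ghr7DWYOSWa4tlvZ4kpsFQ",
                "index": "deckito",
                "reason": {
                    "reason": "binaryEmbeddingReader can't be null",
                    "type": "illegal_state_exception",
                },
                "shard": 0,
            }
        ],
    }
}
all_ok, reasons = summarize_shard_failures(response)
print(all_ok, reasons)
```

Treating any non-zero "failed" count as an error makes the partial results visible instead of letting them pass as a normal hit list.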
I also have the same problem: the document has the field, but the error still occurs.
Please share:
Hi @lior-k
This is my mapping:
{
"settings": {
"number_of_shards": 10
},
"mappings": {
"slide": {
"properties": {
"deck_id": {
"type": "keyword",
"index": true
},
"number": {
"type": "integer",
"index": true
},
"image_embedding": {
"type": "binary",
"doc_values": true
},
"text": {
"type": "text",
"index": true
}
}
},
"searchResult": {
"properties": {
"deck_id": {
"type": "keyword",
"index": true
},
"search_timestamp": {
"type": "date",
"index": true
}
}
}
}
}
My query:
{
"query": {
"bool": {
"should": [
{
"function_score": {
"boost": 1,
"score_mode": "avg",
"boost_mode": "multiply",
"min_score": 0,
"script_score": {
"script": {
"source": "binary_vector_score",
"lang": "knn",
"params": {
"cosine": true,
"field": "image_embedding",
"vector": "MY_VECTOR"
}
}
}
}
}
]
}
}
}
MY_VECTOR is something like [0.20438875, 0.087035105, 0.41949105, ...]
I'm using the Python client to search only documents of type slide, all of which have the field "image_embedding":
result = self.client.search(index='deckito', doc_type='slide', from_=0, size=3, body=query, version=True, _source_include=['deck_id', 'number', 'image_embedding'])
Please run the following query to check that all the documents have a value in this field. It should return 0 documents:
GET <es-url>/<index>/_search
{
"query": {
"bool" : {
"must" : {
"script" : {
"script" : {
"inline": "doc.image_embedding == null || doc.image_embedding.value == null || doc.image_embedding.value == ''",
"lang": "painless"
}
}
}
}
}
}
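For anyone running this check from the Python client, or against a field whose name is not a valid identifier (such as "embedding-vector", which cannot be written with dot syntax in Painless), the same query body can be built for any field name. This is only a sketch: the function name build_missing_field_query is hypothetical, passing the field through script params is my own variation, and the "inline" key is simply copied from the query above.

```python
def build_missing_field_query(field):
    """Build the body of a script query matching documents whose
    `field` is missing or empty, mirroring the check posted above."""
    # The field name is passed as a script parameter, so bracket access
    # works for names like "embedding-vector" as well.
    script = (
        "doc[params.field] == null"
        " || doc[params.field].value == null"
        " || doc[params.field].value == ''"
    )
    return {
        "query": {
            "bool": {
                "must": {
                    "script": {
                        "script": {
                            "inline": script,
                            "lang": "painless",
                            "params": {"field": field},
                        }
                    }
                }
            }
        }
    }


body = build_missing_field_query("image_embedding")
print(body)
```

The resulting dict would be passed as the body argument of client.search, and, as with the original query, the expectation is a hit count of 0.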
Hi @lior-k
I am also getting the same error: "{ "took" : 33, "timed_out" : false, "_shards" : { "total" : 5, "successful" : 4, "skipped" : 0, "failed" : 1, "failures" : [ { "shard" : 3, "index" : "indexvectors", "node" : "Q5VeFkIvQh6KLS6PQsUg2w", "reason" : { "type" : "illegal_state_exception", "reason" : "binaryEmbeddingReader can't be null" } } ] }, "hits" : { "total" : 0, "max_score" : null, "hits" : [ ] } } "
my data looks like: { "indexvectors" : { "aliases" : { }, "mappings" : { "vectordocs" : { "properties" : { "embedding-vector" : { "type" : "binary", "doc_values" : true }, "id" : { "type" : "text" }, "vector" : { "type" : "text" } } } }, "settings" : { "index" : { "creation_date" : "1524853637835", "number_of_shards" : "5", "number_of_replicas" : "1", "uuid" : "76m277CESNiYnovi6n6Q8A", "version" : { "created" : "5060099" }, "provided_name" : "indexvectors" } } } }
I have just added one record and used that same record's vector field in the query to get kNN with k=1. Ideally the query should have returned the record present in the index, but instead I got the error above. Could you help me out here?
Hi @lior-k
I ran the query you posted in 3 different ways and it returned the following results (note that I have 2 document types, slide and searchResult, and the property image_embedding is only declared for type slide):
<es-url>/<index>/_search -> 0 documents, which is weird because none of the documents of type searchResult have the field image_embedding.
<es-url>/<index>/slide/_search -> 0 documents, which makes sense because all documents of type slide have the field image_embedding populated.
<es-url>/<index>/searchResult/_search -> 0 documents, which is weird because none of the documents of type searchResult have the field image_embedding.
I was able to get the issue resolved by following lior-k's suggestion and making sure that 0 docs are returned for the query mentioned. I am able to get the KNN docs now using the plugin. Thanks @lior-k :-)
I fixed my templates and reindexed, and it finally works. Before the fix, the field name in my templates differed from the one in my documents; they have to be the same. I had also defined the "embedding_vector" field as "text", but it should be "binary".
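On the "binary" type: the value indexed into such a field is a base64 string, not a JSON list of floats. The thread does not say which byte layout the plugin expects, so the sketch below assumes big-endian 32-bit floats, which is what I recall from the plugin's README; please verify that before relying on it. The function names are made up for illustration.

```python
import base64
import struct


def encode_vector(vector):
    """Pack floats as big-endian float32 (assumed layout) and base64-encode."""
    packed = struct.pack(">%df" % len(vector), *vector)
    return base64.b64encode(packed).decode("ascii")


def decode_vector(encoded):
    """Inverse of encode_vector: base64 string back to a list of floats."""
    raw = base64.b64decode(encoded)
    return list(struct.unpack(">%df" % (len(raw) // 4), raw))


# Values chosen to be exactly representable in float32.
encoded = encode_vector([0.5, -1.25, 2.0])
print(encoded, decode_vector(encoded))
```

The encoded string is what would go into the binary field of each document, while the query itself still takes the plain list of floats in "vector", as in the query posted earlier.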
good to hear, closing the issue
Also struggling with this problem. The plugin works in production, but when I use elasticdump to copy the data to a local server I start getting "binaryEmbeddingReader can't be null".
elasticdump --input=./account_mapping.json --output=http://localhost:9200/account --type=mapping
elasticdump --input=./account.json --output=http://localhost:9200/account --type=data
In this state my vector searches fail entirely. If I inspect the mapping my field is mapped correctly. If I use the painless query above I find 0 records. If I reindex my document then things start working on most of the shards.
POST http://localhost:9200/_reindex
{
"source": {
"index": "account"
},
"dest": {
"index": "tmp"
}
}
Then I do a second _reindex to rename tmp back to account. My queries start working at that point; however, I still see exceptions in the ES server, and the _shards section of the query response shows 3 successful and 2 failed shards:
"_shards": {
"total": 5,
"successful": 3,
"skipped": 0,
"failed": 2,
"failures": [
{
"shard": 0,
"index": "account",
"node": "HlfEVuX_TbO8u6GXu47REQ",
"reason": {
"type": "illegal_state_exception",
"reason": "binaryEmbeddingReader can't be null"
}
}
]
},
Update: After about 15 minutes and a few reboots, the two buggy shards started working and I am getting 5/5 successful now. So if anyone else has the same problem - import, reindex and then wait a while while shards rebuild.
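If you end up in the same "wait until the shards recover" situation, a small retry loop can tell you when all shards answer again. This is a sketch under my own assumptions, not something the plugin provides: run_search is any zero-argument callable you supply that returns the search response dict, and the attempt count and delay are arbitrary.

```python
import time


def search_until_all_shards_ok(run_search, attempts=5, delay_seconds=1.0,
                               sleep=time.sleep):
    """Call run_search() until the response reports no failed shards.

    Returns the first fully successful response, or raises RuntimeError
    if shards are still failing after `attempts` tries.
    """
    failed = None
    for attempt in range(attempts):
        response = run_search()
        failed = response.get("_shards", {}).get("failed", 0)
        if failed == 0:
            return response
        if attempt < attempts - 1:
            sleep(delay_seconds)  # give the shards time to recover
    raise RuntimeError(
        "%d shard(s) still failing after %d attempts" % (failed, attempts)
    )


# Simulated responses: two partial failures, then a clean result.
fake_responses = iter([
    {"_shards": {"total": 5, "successful": 3, "failed": 2}},
    {"_shards": {"total": 5, "successful": 4, "failed": 1}},
    {"_shards": {"total": 5, "successful": 5, "failed": 0}},
])
result = search_until_all_shards_ok(lambda: next(fake_responses),
                                    sleep=lambda seconds: None)
print(result)
```

With the real client, run_search could be something like lambda: client.search(index='account', body=query), with a much longer delay given that recovery took about 15 minutes in my case.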
I'm running Elasticsearch in a Docker container with the binary-vector-scoring plugin installed, but I'm getting an intermittent error when searching with the following query:
The search runs fine for a while (the first dozen or so requests) and then starts returning the following error:
Reindexing all documents is the only way to make the search work again. Has anybody faced the same problem?