Closed danyilq closed 3 months ago
The iteration on which it fails is consistent regardless of which vector is used, but the dimensionality of the vector changes which iteration that is. With 384 dimensions: 145th iteration. 512 dimensions: 109th iteration. 121 dimensions: 69th iteration. 728 dimensions: 109th iteration. 1024 dimensions: 82nd iteration.
Does forcemerging after each deletion help?
Unfortunately, forcemerging didn't help.
Moved this to the k-nn repo.
Index mapping that was used:
{
    "settings": {
        "index": {
            "knn": True,
            "knn.algo_param.ef_search": 100,
            "refresh_interval": "1s",
            "store.hybrid.mmap.extensions": [
                "nvd", "dvd", "tim", "tip", "dim", "kdd", "kdi", "cfs", "doc", "vec", "vex"
            ]
        },
        "number_of_shards": 1,
        "number_of_replicas": 0
    },
    "mappings": {
        "_meta": {
            "media_type": "text",
            "index_settings": {
                "index_defaults": {
                    "treat_urls_and_pointers_as_images": False,
                    "model": "hf/all_datasets_v4_MiniLM-L6",
                    "normalize_embeddings": True,
                    "text_preprocessing": {
                        "split_length": 2,
                        "split_overlap": 0,
                        "split_method": "sentence"
                    },
                    "image_preprocessing": {
                        "patch_method": None
                    },
                    "ann_parameters": {
                        "name": "hnsw",
                        "space_type": "cosinesimil",
                        "engine": "lucene",
                        "parameters": {
                            "ef_construction": 128,
                            "m": 16
                        }
                    }
                },
                "number_of_shards": 1,
                "number_of_replicas": 0
            },
            "model": "hf/all_datasets_v4_MiniLM-L6"
        },
        "dynamic_templates": [
            {
                "strings": {
                    "match_mapping_type": "string",
                    "mapping": {
                        "type": "text"
                    }
                }
            }
        ],
        "properties": {
            "__chunks": {
                "type": "nested",
                "properties": {
                    "__field_name": {
                        "type": "keyword"
                    },
                    "__field_content": {
                        "type": "text"
                    },
                    "__vector_marqo_knn_field": {
                        "type": "knn_vector",
                        "dimension": 384,
                        "method": {
                            "name": "hnsw",
                            "space_type": "cosinesimil",
                            "engine": "lucene",
                            "parameters": {
                                "ef_construction": 128,
                                "m": 16
                            }
                        }
                    }
                }
            }
        }
    }
}
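For readers following along, here is a minimal sketch of what a k-NN search body against this mapping might look like. The vector field sits inside the nested `__chunks` object, so the `knn` clause is wrapped in a `nested` query. The helper name `build_nested_knn_query` and the default `k` are hypothetical and do not come from this thread. The dimension check only mirrors the `"dimension": 384` value in the mapping above.

```python
from typing import Any, Dict, List

# Field names and dimension are taken from the mapping above.
NESTED_PATH = "__chunks"
VECTOR_FIELD = "__chunks.__vector_marqo_knn_field"
DIMENSION = 384


def build_nested_knn_query(vector: List[float], k: int = 10) -> Dict[str, Any]:
    """Build a search body with a knn clause wrapped in a nested query.

    Hypothetical helper for illustration; not the reporter's code.
    """
    if len(vector) != DIMENSION:
        raise ValueError(
            f"expected a vector of length {DIMENSION}, got {len(vector)}"
        )
    return {
        "size": k,
        "query": {
            "nested": {
                "path": NESTED_PATH,
                "query": {
                    "knn": {
                        VECTOR_FIELD: {"vector": vector, "k": k}
                    }
                },
            }
        },
    }
```

The returned dict could be passed as the `body` of a search request. Whether the reporter's queries had exactly this shape is not stated in the thread.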
@danyilq, could you add details on the number of nodes and the RAM of each node, to help us better understand the issue?
Describe the bug
When performing k-NN search queries on an index multiple times, with documents being deleted and inserted between searches, the search occasionally does not return any hits.
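As a rough illustration of the insert, search, delete cycle described here, the following sketch reports the first iteration on which a search comes back empty. It is not the reporter's script. The function name `find_first_empty_search`, the document IDs, and the exact order of operations are assumptions. The `client` argument is assumed to expose `index`, `search`, and `delete` methods with keyword arguments similar to those of the opensearch-py client.

```python
from typing import Any, Callable, Dict, Optional


def find_first_empty_search(
    client: Any,
    index: str,
    make_doc: Callable[[int], Dict[str, Any]],
    make_query: Callable[[int], Dict[str, Any]],
    iterations: int = 200,
) -> Optional[int]:
    """Return the 0-based iteration whose search had no hits, else None.

    Hypothetical sketch: each iteration inserts one document, searches,
    then deletes that document again.
    """
    for i in range(iterations):
        doc_id = f"doc-{i}"  # hypothetical ID scheme
        # refresh=True is assumed to make the write visible to the next search.
        client.index(index=index, id=doc_id, body=make_doc(i), refresh=True)

        response = client.search(index=index, body=make_query(i))
        if not response["hits"]["hits"]:
            return i  # a document exists, yet the search returned nothing

        client.delete(index=index, id=doc_id, refresh=True)
    return None
```

A return value of `None` means every search returned at least one hit. An integer is the iteration to compare against the failure points reported in this thread, keeping in mind that those counts may follow a different numbering.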
To Reproduce
Expected behavior
The search query should consistently return hits as long as there are documents in the index.
Plugins
Please list all plugins currently enabled.
Screenshots
If applicable, add screenshots to help explain your problem.
Host/Environment (please complete the following information):
Additional context
Python script that reproduces issue: