Closed OverlordRon closed 1 year ago
Is there an existing issue for this?
- [x] I have searched the existing issues
Environment
- Milvus version: 2.3.1
- Deployment mode(standalone or cluster): standalone
- MQ type(rocksmq, pulsar or kafka):
- SDK version(e.g. pymilvus v2.0.0rc2):
- OS(Ubuntu or CentOS): Ubuntu 20.04
- CPU/Memory: Intel(R) Xeon(R) Platinum 8370C CPU @ 2.80GHz / 128 GB Memory
- GPU:
- Others:
Current Behavior
When I upsert a collection of vectors into the db, and I try to search for one of the vectors I just upserted, the search results do not find the exact vector I am looking for.
Prepare index params
index_params = {
    "metric_type": "L2",  # cos(ip)Euclidean distance
    "index_type": "FLAT",  # for Floating point vectors
    "params": {"nlist": 1024},  # parameters specific to index
    # "nlist" IVF_FLAT divides vector data into nlist cluster units
}
prepare search params
search_params = {
    "metric_type": "L2",  # Euclidean distance, ARvind uses Inner Product (may require normalization of vectors)
    "offset": 10,  # Retrieve 20 closest vectors (+/- 5)
    "ignore_growing": False,
    "params": {"nprobe": 10},  # number of cluster units to search, must be < nlist
}
set search vector as a single vector example from the upserted data
vec = data[:][9][500]
search
results = collection.search(
    data=[vec],
    anns_field="vector",  # name of the field to search on
    # the sum of `offset` in `param` and `limit` should be less than 16384.
    param=search_params,
    limit=2,
    expr=None,
    # set the names of the fields you want to retrieve from the search result.
    output_fields=['company_name', 'plaintext', 'vector'],
    # consistency_level="Strong"
)
print(results[0].ids)
Output: [479, 571]
The output should contain [500] because that is the exact vector being searched.
Expected Behavior
The expected results[0].ids should contain [500], but it does not. Vector 500 was the exact vector being searched for in the DB.
Steps To Reproduce
No response
Milvus Log
No response
Anything else?
No response
You can try increasing nprobe to 64 to see if it works.
search_params = {
    "metric_type": "L2",  # Euclidean distance, ARvind uses Inner Product (may require normalization of vectors)
    "offset": 10,  # Retrieve 20 closest vectors (+/- 5)
    "ignore_growing": False,
    "params": {"nprobe": 10},  # number of cluster units to search, must be < nlist
}
Why do you set offset to 10?
Setting offset to 10 skips the 10 most similar vectors, so the exact match (the top hit) is never returned.
Yes! That is the reason. Thank you, @xiaofan-luan. That was a big help.
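For anyone landing here later, the effect of `offset` can be reproduced without a Milvus instance. The following is a minimal pure-Python sketch, not Milvus code: `flat_l2_search` is a hypothetical helper that brute-force ranks vectors by L2 distance and then applies `offset` and `limit` the way the answer above describes, and the random data is only a stand-in for the upserted vectors.

```python
import math
import random


def flat_l2_search(vectors, query, limit, offset=0):
    """Brute-force L2 search with offset/limit pagination (illustrative only).

    Ranks every stored vector by its distance to `query`, skips the first
    `offset` hits, and returns the ids (list positions) of the next `limit` hits.
    """
    ranked = sorted(range(len(vectors)), key=lambda i: math.dist(vectors[i], query))
    return ranked[offset:offset + limit]


random.seed(0)  # fixed seed so repeated runs use the same data
vectors = [[random.random() for _ in range(8)] for _ in range(1000)]
query = vectors[500]  # search for a vector that is already stored

with_offset = flat_l2_search(vectors, query, limit=2, offset=10)
without_offset = flat_l2_search(vectors, query, limit=2, offset=0)

print(with_offset)     # hits ranked 11th and 12th; id 500 has been skipped
print(without_offset)  # id 500 comes first, at distance 0
```

In the snippet from the issue, the corresponding change would be setting `"offset"` to 0 in `search_params`, so that the closest hits, including the exact match, are returned.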