torvalds-dev / solr

Apache Solr open-source search software
https://solr.apache.org/
Apache License 2.0

solr.DenseVectorField does not work on a collection with more than one shard #29

Open 0xDTE opened 1 year ago

0xDTE commented 1 year ago

The field type solr.DenseVectorField, introduced in Solr 9.0, works well in a single-core user-managed setup and in a single-shard SolrCloud collection. In any multi-shard collection, however, it stores the data as a list of strings instead of a list of floats, and queries on the field also fail to return correct results.
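The report does not show the failing query. As a point of reference, a vector search on a DenseVectorField in Solr 9 normally goes through the knn query parser, in the form {!knn f=<field> topK=<n>}[v1, v2, ...]. The sketch below only builds that query string. The helper name build_knn_query and the 4-dimensional sample vector are hypothetical, chosen for illustration; the vector length must match the vectorDimension configured for the field.

```python
def build_knn_query(field: str, vector: list[float], top_k: int = 10) -> str:
    """Format a Solr 9 knn query-parser string for a DenseVectorField.

    Hypothetical helper for illustration only; it does not contact Solr.
    """
    # Serialize every component as a float literal so Solr receives numbers.
    values = ", ".join(repr(float(v)) for v in vector)
    # Doubled braces in the f-string produce literal { and } characters.
    return f"{{!knn f={field} topK={top_k}}}[{values}]"


if __name__ == "__main__":
    # Hypothetical 4-dimensional query vector against a field named "vector".
    print(build_knn_query("vector", [1.0, 2.0, 3.0, 4.0], top_k=5))
```

The resulting string would be passed as the q parameter of a /select request. If the multi-shard collection returns different hits than the single-shard one for the same string, that points at the indexing or field-handling problem described here rather than at the query itself.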

Attaching a screenshot. The field name is vector, which is defined as:

A field defined as solr.DenseVectorField is expected to store an array of floats, right? That is what happens in a single-shard collection: the field is stored and returned as an array of floats. In a multi-shard collection, however, the same field comes back as an array of strings. The screenshot I attached earlier shows the vector field holding an array of strings, even though the indexed data was a list of floats. If I create a collection with only one shard and index the same data, it is correctly stored as an array of floats. Because of this, queries on the field in the multi-shard collection also return incorrect results.
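To make the symptom easy to check without screenshots, a small script can classify the vector field of each returned document. This is a minimal sketch that assumes the documents come from Solr's JSON response writer (wt=json) and have already been parsed into dicts. The helper name vector_value_types and the two sample documents are hypothetical: they mirror the behavior described above and are not actual output from the reporter's collections.

```python
import json
from typing import Any


def vector_value_types(doc: dict[str, Any], field: str = "vector") -> str:
    """Classify the values of a dense-vector field in one parsed Solr result doc.

    Returns "floats", "strings", "mixed", or "missing". Hypothetical helper.
    """
    values = doc.get(field)
    if not isinstance(values, list) or not values:
        return "missing"
    # bool is a subclass of int in Python, so exclude it explicitly.
    if all(isinstance(v, (int, float)) and not isinstance(v, bool) for v in values):
        return "floats"
    if all(isinstance(v, str) for v in values):
        return "strings"
    return "mixed"


if __name__ == "__main__":
    # Hypothetical documents illustrating the reported difference.
    single_shard_doc = json.loads('{"id": "1", "vector": [0.1, 0.2, 0.3]}')
    multi_shard_doc = json.loads('{"id": "1", "vector": ["0.1", "0.2", "0.3"]}')
    print(vector_value_types(single_shard_doc))  # expected: floats
    print(vector_value_types(multi_shard_doc))   # expected: strings
```

Running the same check over the docs array of a /select response from each collection gives a quick, repeatable way to confirm whether only the multi-shard collection returns strings.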

torvalds-dev commented 1 year ago

torvalds.dev is analyzing the ticket