Open liliu-z opened 1 year ago
/assign @liliu-z
/unassign @xiaofan-luan
Hi, has there been any progress? Our company has the same need as described in https://github.com/milvus-io/milvus/discussions/23045
@BennySemyonovAB
Can you explain the use case for a customized distance a little bit? Do we know this distance before we build the index?
Hello there @xiaofan-luan ,
I currently face the same problem. The use case of such a thing would be models like SigLip where the similarity function has been trained as part of the model.
Two vectors X and Y are close when sigmoid(dot(X,Y) * t + b) is close to 1. I tried using the Inner Product metric but the results are quite bad.
For my part, the distance function is known before I build the index. I would imagine something like this:
def custom_metric(x, y):
    return sigmoid(np.dot(x, y) * t + b)

index_params.add_index(
    field_name="vector",
    index_type="HNSW",
    metric_type=custom_metric
)
I don't know to what extent this is compatible with the Milvus code, but it would be a lifesaver.
Thanks in advance
This doesn't seem hard in itself. The hard part is how to let users define the function.
Hi @RaphaelCanin
Thanks for this info. This has always been on our roadmap, and it would be great to hear more from the community to help us understand this need better.
A quick question. In your example, the relationship between IP and this customized metric is monotonic, which means the distance comparison result of IP and this customized metric will always be the same. Can I ask why a customized metric would help in this use case?
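To make the monotonicity point concrete, here is a minimal brute-force sketch in NumPy. It does not use Milvus at all, and the values of `t` and `b` are hypothetical placeholders (any `t > 0` behaves the same way). It only illustrates that ranking by inner product and ranking by `sigmoid(dot * t + b)` give the same top results when `t > 0`.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

# Hypothetical parameters for illustration only; any t > 0 keeps the ordering.
t, b = 10.0, -1.0

rng = np.random.default_rng(0)
vectors = rng.normal(size=(1000, 16))
vectors /= np.linalg.norm(vectors, axis=1, keepdims=True)  # unit-normalize rows
query = vectors[0]

ip = vectors @ query            # plain inner product
score = sigmoid(ip * t + b)     # custom similarity from the thread

# Top-10 by descending inner product vs. top-10 by descending custom score.
top_by_ip = np.argsort(-ip)[:10]
top_by_score = np.argsort(-score)[:10]
same_ranking = np.array_equal(top_by_ip, top_by_score)
print(same_ranking)
```

Because the sigmoid is strictly increasing and `t > 0`, the custom score never reorders candidates relative to IP. Only the reported values differ.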
Hi @xiaofan-luan, hi @liliu-z ,
Thanks for taking the time to help me. @liliu-z you are right, and I have implemented a workaround since the distance comparison is monotonic. But that only holds because t > 0, which is specific to this particular model. With t < 0, the only workaround is to use Furthest Neighbor Search instead, which is currently not available as far as I know.
For my current use case, I can use Milvus as is for the moment. However, when I need to search for a particular range of my custom score (e.g. between 50% and 60%), I need to compute the inverse of the above-mentioned function (same example: between 0.1070889177 and 0.11051817).
It is not hard, but it makes the code unnecessarily long.
So I am able to reach my goals with Milvus' current features, but it would be simpler with a UDF distance metric.
Thanks a lot
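The conversion from a score range back to a dot-product range described above could look roughly like the sketch below. The helper names and the values of `t` and `b` are made up for illustration (they are not the model's actual parameters), so the resulting bounds differ from the numbers quoted in the comment.

```python
import math

def score_to_ip(p, t, b):
    """Invert p = sigmoid(ip * t + b) for ip. Requires 0 < p < 1 and t != 0."""
    logit = math.log(p / (1.0 - p))
    return (logit - b) / t

def ip_range_for_score_range(p_low, p_high, t, b):
    """Map a score interval to the matching inner-product interval."""
    lo = score_to_ip(p_low, t, b)
    hi = score_to_ip(p_high, t, b)
    # With t < 0 the mapping is decreasing, so the bounds swap.
    return (min(lo, hi), max(lo, hi))

# Hypothetical parameters for illustration only.
t, b = 10.0, -1.0
ip_low, ip_high = ip_range_for_score_range(0.5, 0.6, t, b)
print(ip_low, ip_high)
```

The resulting bounds would then be used as the limits of a range search on an IP index, which is the extra step a UDF metric would make unnecessary.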
Yes, this can help make the range related operation easier.
For t < 0, I'm not sure whether we can implement it by using -IP as the metric type, though.
Anyway, thanks for sharing this use case! We keep looking for more cases to help us define this UDF feature better!
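One possible reading of the -IP idea for t < 0, sketched with brute-force NumPy rather than Milvus: negating the query vector turns "smallest inner product" into "largest inner product", so an ordinary IP search on the negated query should return the same candidates as ranking by the custom score. The parameter values are hypothetical, and whether this carries over to an approximate index such as HNSW is not something this sketch shows.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

# Hypothetical parameters for illustration only; the point is t < 0.
t, b = -10.0, 1.0

rng = np.random.default_rng(1)
vectors = rng.normal(size=(1000, 16))
vectors /= np.linalg.norm(vectors, axis=1, keepdims=True)  # unit-normalize rows
query = rng.normal(size=16)
query /= np.linalg.norm(query)

# Ranking by the custom score (higher is closer).
score = sigmoid((vectors @ query) * t + b)
top_by_score = np.argsort(-score)[:10]

# Ranking by plain IP against the negated query.
neg_ip = vectors @ (-query)
top_by_neg_ip = np.argsort(-neg_ip)[:10]

matches = np.array_equal(top_by_score, top_by_neg_ip)
print(matches)
```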
Is there an existing issue for this?
Is your feature request related to a problem? Please describe.
See https://github.com/milvus-io/milvus/discussions/23045
Describe the solution you'd like.
No response
Describe an alternate solution.
No response
Anything else? (Additional Context)
No response