Open bart2001 opened 3 years ago
There is no need to use refresh=true in production. This plugin does not require that. Refresh=true just means the inserted doc is quariable immediately after insertion, which is a must for the test. https://www.elastic.co/guide/en/elasticsearch/reference/current/docs-refresh.html
Hello, thank you for your reply.
The issue can be specified like this.
- Case 1. insert single data without `refresh=true` => vector scoring fail (cosine similarity)
- Case 2. insert singe data with `refresh=true` => vector scoring success (cosine similarity)
- Case 3. bulk insert with `refresh=true` => vector scoring fail (cosine similarity)
vector scoring fail
means there are same similarity scores among result documents, which does not guarantee correct ordering of result vectorsIn summary, I think the problem is refresh=true
is required to get "proper vector scoring result"...
This issue may be connected to compatibility with elasitcsearch core since the above cases all worked as vector scoring success
on elasticsearch 7.5.2 version.
right, thanks for clarifying. I was just able to recreate it in a test
I have no clue at the moment as of why this unexpected change in behaviour in this ES version.
for the meanwhile we need to use refresh=true
in doc insertions which is not ideal
Hello, I would like to raise an issue about the test case.
According to
fast-elasticsearch-vector-scoring/src/test/java/com/liorkn/elasticsearch/PluginTest.java
file, the test case usesparams.put("refresh", "true")
on data insertion request, which makes vector scoring result works properly.However, in production case, we often do not use
refresh=true
option on data insertion. If we do not use this option, the vector scoring result does not work properly. (such as there are same cosine similarity score among result documents...)I think one of two options should be considered
refresh=true
) -> this may cause code level modification of vector scoring pluginrefresh=true
option must be provided on data insertion