Description
As of version 2.16 of the k-NN plugin, a vector search query does more than run a vector search against a native engine index.
Example 1
In efficient filtering, we first run the filters, convert the filter iterators to bitsets, and then run either exact search or ANN search depending on certain conditions.
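To make the sub-operations concrete, here is a minimal Python sketch of that flow. It is not the plugin's code: the function names, the `exact_threshold` parameter, and the "matched count at or below a threshold" condition are assumptions for illustration, and the ANN step is a brute-force stand-in for the native engine call.

```python
import math

def l2(a, b):
    """Euclidean distance between two equal-length vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

def filter_to_bitset(docs, predicate):
    """Step 1 and 2: run the filter and pack matching doc ids into an int bitset."""
    bits = 0
    for doc_id, doc in enumerate(docs):
        if predicate(doc):
            bits |= 1 << doc_id
    return bits

def exact_search(vectors, query, k, bits):
    """Score every doc allowed by the bitset and keep the k nearest."""
    allowed = [i for i in range(len(vectors)) if (bits >> i) & 1]
    return sorted(allowed, key=lambda i: l2(vectors[i], query))[:k]

def ann_search_stub(vectors, query, k, bits):
    """Stand-in for the native engine ANN call (brute force here, by assumption)."""
    return exact_search(vectors, query, k, bits)

def filtered_knn(docs, vectors, query, k, predicate, exact_threshold=2):
    """Step 3: pick exact or ANN search. The threshold rule is hypothetical."""
    bits = filter_to_bitset(docs, predicate)
    matched = bin(bits).count("1")
    if matched <= max(k, exact_threshold):
        return "exact", exact_search(vectors, query, k, bits)
    return "ann", ann_search_stub(vectors, query, k, bits)
```

Each of the three steps (filter execution, bitset conversion, and the chosen search path) is a separate cost that a single query latency number hides.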
Example 2
With the new disk-based vector search feature, we run a two-phase search: first a search on an oversampled k, then a rescoring of those oversampled results with full-precision vectors.
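A rough Python sketch of the two phases follows. The 1-bit sign quantization, the Hamming distance first pass, and the `oversample_factor` parameter are assumptions chosen to keep the example small; they are not taken from the plugin.

```python
import math

def l2(a, b):
    """Full-precision Euclidean distance."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

def quantize(v):
    """Hypothetical 1-bit quantization: keep only the sign of each dimension."""
    return [1 if x > 0 else 0 for x in v]

def hamming(a, b):
    """Number of positions where two bit codes differ."""
    return sum(x != y for x, y in zip(a, b))

def two_phase_search(vectors, query, k, oversample_factor=3):
    # In a real index the codes would be built at ingest time, not per query.
    codes = [quantize(v) for v in vectors]
    query_code = quantize(query)

    # Phase 1: cheap search over quantized codes for an oversampled k.
    first_k = min(len(vectors), k * oversample_factor)
    candidates = sorted(
        range(len(vectors)), key=lambda i: hamming(codes[i], query_code)
    )[:first_k]

    # Phase 2: rescore only those candidates with full-precision vectors.
    rescored = sorted(candidates, key=lambda i: l2(vectors[i], query))
    return rescored[:k]
```

With a small `oversample_factor`, the first phase can drop the true nearest neighbors before rescoring ever sees them, which is one reason the per-phase cost and behavior are worth observing separately.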
In both examples we are doing much more than a vector search, and today we have no way to know the latency of these internal operations. There is a profile API that gives a breakdown, but it doesn't track the granular operations mentioned above. In addition, the profile API reports point-in-time latency for a single query, while most of the time users are interested in latency stats over time for the query and its sub-operations.
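To illustrate the kind of data that is missing, here is a small Python sketch that accumulates latency per sub-operation across many queries. The `StageStats` class and the stage names are hypothetical and are not an existing plugin or OpenSearch API.

```python
import time
from collections import defaultdict
from contextlib import contextmanager

class StageStats:
    """Accumulates count, total, and max latency per named stage."""

    def __init__(self):
        self.count = defaultdict(int)
        self.total_ns = defaultdict(int)
        self.max_ns = defaultdict(int)

    @contextmanager
    def timed(self, stage):
        """Time the enclosed block and fold the result into the stage totals."""
        start = time.perf_counter_ns()
        try:
            yield
        finally:
            elapsed = time.perf_counter_ns() - start
            self.count[stage] += 1
            self.total_ns[stage] += elapsed
            self.max_ns[stage] = max(self.max_ns[stage], elapsed)

    def snapshot(self):
        """Return aggregated stats, suitable for exposing over time."""
        return {
            stage: {
                "count": self.count[stage],
                "avg_ns": self.total_ns[stage] // self.count[stage],
                "max_ns": self.max_ns[stage],
            }
            for stage in self.count
        }
```

A query path would wrap each step, for example `with stats.timed("rescore"): ...`, and a stats endpoint would periodically read `snapshot()`.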
Solution
I can think of the following solutions:
Improve the profile API results for vector search queries to include these sub-operations. We can take some inspiration from how the Bool and DisMax queries do it.
Look into the Query Insights plugin and see how we can add these sub-operation stats through that plugin rather than emitting them via cluster stats.
I don't think these two solutions will be enough, but I see them as a start; we may need to add integrations in a few more places to have a really good mechanism for query stats.
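For the profile API option, one possible shape is to report sub-operations as children of the k-NN query node, the same way compound queries nest their clauses. In the Python sketch below, the `type`, `time_in_nanos`, and `children` fields follow the existing profile response format, but the `KNNQuery` children and their names and timings are made up for illustration.

```python
# Hypothetical profile entry: the child stage names and timings are invented.
hypothetical_profile = {
    "type": "KNNQuery",
    "time_in_nanos": 1000,
    "children": [
        {"type": "filter_to_bitset", "time_in_nanos": 200},
        {"type": "ann_search", "time_in_nanos": 500},
        {"type": "rescore", "time_in_nanos": 250},
    ],
}

def flatten(node, path=""):
    """Walk the profile tree and return (path, time_in_nanos) rows."""
    name = f"{path}/{node['type']}" if path else node["type"]
    rows = [(name, node["time_in_nanos"])]
    for child in node.get("children", []):
        rows.extend(flatten(child, name))
    return rows

def unaccounted_nanos(node):
    """Time spent in the node itself that no child sub-operation explains."""
    children_total = sum(c["time_in_nanos"] for c in node.get("children", []))
    return node["time_in_nanos"] - children_total
```

Here `flatten(hypothetical_profile)` yields one row per sub-operation, and `unaccounted_nanos` shows how much of the query time is still not attributed to any tracked step.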