akevdmeer closed 1 year ago
Hi @akevdmeer, thank you for the discovery. It is very important for us in improving performance. Can you upload the perf sampling data?
I think that might be because we create a termnode for each value in an in condition, which means that for in [1,2,3,4] you will filter 4 times.
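To make the hypothesis above concrete, here is a minimal sketch in plain Python. It is not Milvus code: `filter_per_term` and `filter_single_pass` are hypothetical names used only for illustration. The first function scans the whole column once per value in the list, while the second scans it once and checks each row against a set.

```python
from typing import List, Sequence


def filter_per_term(column: Sequence[int], values: Sequence[int]) -> List[bool]:
    """Hypothetical per-term evaluation: one full column scan per value."""
    mask = [False] * len(column)
    for v in values:                      # len(values) passes over the data
        for i, x in enumerate(column):    # each pass touches every row
            if x == v:
                mask[i] = True            # OR this term's result into the mask
    return mask


def filter_single_pass(column: Sequence[int], values: Sequence[int]) -> List[bool]:
    """Single scan: each row is tested once against a hash set of the values."""
    lookup = set(values)
    return [x in lookup for x in column]


column = [5, 1, 7, 3, 1, 9]
values = [1, 2, 3, 4]
# Both strategies produce the same mask; only the amount of work differs.
assert filter_per_term(column, values) == filter_single_pass(column, values)
```

With n rows and k values, the first variant does about n * k comparisons and the second about n set lookups, so both still grow with the size of the collection.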
/assign @xiaofan-luan I think this is what we are trying to improve in query.
/unassign
With velox support that should be fixed in 2.4
This issue has been automatically marked as stale because it has not had recent activity. It will be closed if no further activity occurs. Thank you for your contributions.
Rotten issues close after 30d of inactivity. Reopen the issue with /reopen.
/assign @zhagnlu pls take a look into it
keep it and assign to @zhagnlu
This is quite a nasty issue that makes the query statement unusable against big collections.
Could you please reopen it?
@izapolsk: You can't reopen an issue/PR unless you authored it or you are a collaborator.
/reopen
@xiaofan-luan: Reopened this issue.
any updates? @zhagnlu
We will use dynamic SIMD to improve performance for expressions like field_name in [{field_values}]. The performance improvement is shown below:
@zhagnlu could you please list which PR has this improvement?
It isn't fully complete yet; this test is in a UT. I will open a pull request tomorrow or next week.
@zhagnlu any updates
Is there an existing issue for this?
Environment
Current Behavior
We're doing many simple scalar queries with
expr: field_name in [{field_values}]
and find our querynodes hitting their CPU limit in milvus::query::ExecExprVisitor::ExecRangeVisitorImpl() (established using perf).
The collection has > 100M rows and is growing. Query performance gets worse as the collection grows. We've added resources to the querynodes repeatedly but are approaching the end of what we can do.
It looks like there is an O(n) cost factor involved, where n is the number of rows in the collection, not the number of rows that satisfy the condition!
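The difference between the two cost models can be sketched in plain Python. This is an illustration only, not how Milvus is implemented: `scan_in`, `build_inverted_index` and `indexed_in` are hypothetical helpers. The scan touches every row in the collection, while the lookup against a prebuilt value-to-rows map only touches the rows that match.

```python
from collections import defaultdict
from typing import Dict, Hashable, List, Sequence


def scan_in(column: Sequence[Hashable], values: Sequence[Hashable]) -> List[int]:
    """Full scan: work is proportional to the number of rows in the column."""
    lookup = set(values)
    return [row for row, x in enumerate(column) if x in lookup]


def build_inverted_index(column: Sequence[Hashable]) -> Dict[Hashable, List[int]]:
    """Built once up front: maps each distinct value to the rows holding it."""
    index: Dict[Hashable, List[int]] = defaultdict(list)
    for row, x in enumerate(column):
        index[x].append(row)
    return index


def indexed_in(index: Dict[Hashable, List[int]], values: Sequence[Hashable]) -> List[int]:
    """Lookup: work is proportional to len(values) plus the number of matches."""
    rows: List[int] = []
    for v in set(values):
        rows.extend(index.get(v, []))  # .get avoids creating empty entries
    return sorted(rows)


column = ["a", "b", "c", "a", "d"]
index = build_inverted_index(column)
# Same result either way; only the lookup avoids visiting non-matching rows.
assert scan_in(column, ["a", "d", "z"]) == indexed_in(index, ["a", "d", "z"])
```

The trade-off in this sketch is that the index costs memory and build time and has to be maintained as rows are inserted.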
Expected Behavior
The cost of a scalar query should not depend on the total number of rows in the collection.
Steps To Reproduce
Milvus Log
No response
Anything else?
No response