milvus-io / milvus

A cloud-native vector database, storage for next generation AI applications
https://milvus.io
Apache License 2.0
31.01k stars 2.95k forks source link

[Bug]: Scalar queries burn CPU in milvus::query::ExecExprVisitor::ExecRangeVisitorImpl() #22719

Closed akevdmeer closed 1 year ago

akevdmeer commented 1 year ago

Is there an existing issue for this?

Environment

- Milvus version: 2.2.3
- Deployment mode(standalone or cluster): cluster
- MQ type(rocksmq, pulsar or kafka): Pulsar
- SDK version(e.g. pymilvus v2.0.0rc2): 2.2.0
- OS(Ubuntu or CentOS): FCOS
- CPU/Memory: 48 Xeon cores, 768G RAM on host
- GPU: n/a
- Others:

Current Behavior

We're doing many simple scalar queries with expr: field_name in [{field_values}] and find our querynodes hitting their CPU limit in milvus::query::ExecExprVisitor::ExecRangeVisitorImpl() (established using perf)

The collection has > 100M rows and growing. Query performance gets worse as the collection grows. We've added resources to the querynodes repeatedly but are approaching the end of what we can do.

It looks like there is an O(n) cost factor involved, where n is the number of rows in the collection, not the number of rows that satisfy the condition!

Expected Behavior

The cost of a scalar query should not depend on the overall rows in the collection.

Steps To Reproduce

1. Create a collection with many rows
2. Run query with `in` condition and run profiler such as perf.
3. Insert many more rows and repeat, observe degrading performance.

Milvus Log

No response

Anything else?

No response

jiaoew1991 commented 1 year ago

Hi @akevdmeer, thank you for the discovery, This discovery is very important for us to improve performance. Can you upload the perf sampling data?

xiaofan-luan commented 1 year ago

I thought that might be due to we make a termnode for each in condition. Which means if in[1,2,3,4] you will filter 4 times

yanliang567 commented 1 year ago

/assign @xiaofan-luan i think this is what we are trying to improve in query.

/unassign

xiaofan-luan commented 1 year ago

/assign @xiaofan-luan i think this is what we are trying to improve in query.

/unassign

With velox support that should be fixed in 2.4

stale[bot] commented 1 year ago

This issue has been automatically marked as stale because it has not had recent activity. It will be closed if no further activity occurs. Thank you for your contributions. Rotten issues close after 30d of inactivity. Reopen the issue with /reopen.

xiaofan-luan commented 1 year ago

/assign @zhagnlu pls take a look into it

stale[bot] commented 1 year ago

This issue has been automatically marked as stale because it has not had recent activity. It will be closed if no further activity occurs. Thank you for your contributions. Rotten issues close after 30d of inactivity. Reopen the issue with /reopen.

xiaofan-luan commented 1 year ago

keep it and assign to @zhagnlu

stale[bot] commented 1 year ago

This issue has been automatically marked as stale because it has not had recent activity. It will be closed if no further activity occurs. Thank you for your contributions. Rotten issues close after 30d of inactivity. Reopen the issue with /reopen.

izapolsk commented 1 year ago

This is quite nasty issue that makes query statement unusable against big collection. Could you please reopen it ?

sre-ci-robot commented 1 year ago

@izapolsk: You can't reopen an issue/PR unless you authored it or you are a collaborator.

In response to [this](https://github.com/milvus-io/milvus/issues/22719#issuecomment-1609105216): >/reopen Instructions for interacting with me using PR comments are available [here](https://git.k8s.io/community/contributors/guide/pull-requests.md). If you have questions or suggestions related to my behavior, please file an issue against the [kubernetes/test-infra](https://github.com/kubernetes/test-infra/issues/new?title=Prow%20issue:) repository.
xiaofan-luan commented 1 year ago

/reopen

sre-ci-robot commented 1 year ago

@xiaofan-luan: Reopened this issue.

In response to [this](https://github.com/milvus-io/milvus/issues/22719#issuecomment-1609136284): >/reopen Instructions for interacting with me using PR comments are available [here](https://git.k8s.io/community/contributors/guide/pull-requests.md). If you have questions or suggestions related to my behavior, please file an issue against the [kubernetes/test-infra](https://github.com/kubernetes/test-infra/issues/new?title=Prow%20issue:) repository.
xiaofan-luan commented 1 year ago

any updates? @zhagnlu

zhagnlu commented 1 year ago

will using dynamic simd to improve performance for this situation like field_name in [{field_values}], performance improve show as below: image

yanliang567 commented 1 year ago

will using dynamic simd to improve performance for this situation like field_name in [{field_values}], performance improve show as below: image

@zhagnlu could you please list which pr has this improvement?

zhagnlu commented 1 year ago

Haven't complete totally, this test is in UT. I will pull request tomorrow or next week.

stale[bot] commented 1 year ago

This issue has been automatically marked as stale because it has not had recent activity. It will be closed if no further activity occurs. Thank you for your contributions. Rotten issues close after 30d of inactivity. Reopen the issue with /reopen.

yanliang567 commented 1 year ago

@zhagnlu any updates

stale[bot] commented 1 year ago

This issue has been automatically marked as stale because it has not had recent activity. It will be closed if no further activity occurs. Thank you for your contributions. Rotten issues close after 30d of inactivity. Reopen the issue with /reopen.