Open filipecosta90 opened 1 year ago
Are you testing Zilliz's cloud offering?
Hello, are you running the test on zilliz cloud? Can you provide the instance specifications you used?
Hello, are you running the test on zilliz cloud?
@wangting0128 yes.
Can you provide the instance specifications you used?
Sure. I've used the Dedicated Performance Optimized CU size 1 (issue happens on large CUs as well). I've confirmed yesterday it still happens:
MILVUS_USER="db_admin" MILVUS_PASS="<...>" MILVUS_PORT=<...> python3 run.py --engines milvus-m-* --datasets gist-960-euclidean --host <...>
(...)
(...)
Running experiment: milvus-m-16-ef-64 - gist-960-euclidean
established connection
/home/ubuntu/vector-db-benchmark/datasets/gist-960-euclidean/gist-960-euclidean.hdf5 already exists
Experiment stage: Configure
Experiment stage: Upload
644800it [09:51, 1120.07it/s][batch_insert] retry:8, cost: 3.00s, reason: <_MultiThreadedRendezvous: StatusCode.UNAVAILABLE, Broken pipe>
649664it [09:55, 1204.76it/s][batch_insert] retry:9, cost: 3.00s, reason: <_MultiThreadedRendezvous: StatusCode.UNAVAILABLE, Broken pipe>
1000000it [15:16, 1090.80it/s]
Upload time: 919.8683542869985
Total import time: 1126.062087302911
Experiment stage: Search
(...)
Notice that after around 10 minutes of ingestion zilliz cloud "breaks" and we need 8 and 9 retries to complete those batch inserts. I've preserved the full log of all variations in case we need it in the future.
@wangting0128 note that I've added a backoff strategy to the tool to ensure we can properly handle these issues and benchmark under the correct conditions. I'll open a PR just for the zilliz cloud benchmarking later today.
Hi, sorry for replying to your message now.
Based on your problem description, I have some information to share with you~:
milvus-cloud configuration: I don't know whether you are running the same configuration on all datasets :>
If you have any further questions, please feel free to contact us. Thank you very much~
It's common to see the following type of error on non-local setups:
Full traceback:
Given the milvus configs don't specify the batch_size, we're using 64 vectors per batch, which seems to be consistently triggering the error shown above. I suggest either respecting API rate limits with a backoff, or reducing the batch size.
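For illustration, a retry wrapper along these lines could handle the UNAVAILABLE / broken-pipe errors seen in the log. This is a minimal sketch, not the actual code from the tool or the PR: the function and parameter names (insert_with_backoff, insert_fn, base_delay, etc.) are hypothetical, and it treats any ConnectionError as retryable, where real code would inspect the gRPC status.

```python
import random
import time


def insert_with_backoff(insert_fn, batch, max_retries=10,
                        base_delay=1.0, max_delay=30.0):
    """Retry a batch insert with exponential backoff and jitter.

    All names and limits here are illustrative, not the tool's API.
    """
    for attempt in range(max_retries):
        try:
            return insert_fn(batch)
        except ConnectionError:  # stand-in for a transient gRPC UNAVAILABLE
            # Double the delay on each attempt, capped at max_delay,
            # and add jitter so concurrent clients don't retry in lockstep.
            delay = min(max_delay, base_delay * (2 ** attempt))
            time.sleep(delay + random.uniform(0, delay / 2))
    raise RuntimeError(f"batch insert failed after {max_retries} retries")
```

The cap and jitter matter on a shared cloud endpoint: without them, many workers retrying on the same schedule can keep the server saturated and prolong the outage they are reacting to.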