Closed: cydrain closed this 1 month ago
Compared the test case runtime between arm and x86.
Some findings:
IndexType | arm 1st build time (s) | arm 2nd build time (s) | x86 1st build time (s) | x86 2nd build time (s) | x86 1st / 2nd build (approx.) |
---|---|---|---|---|---|
IVF_FLAT | 0.083 | 0.002 | 4.083 | 0.012 | 400x |
IVF_FLAT_CC | 0.001 | 0.002 | 4.000 | 0.011 | 400x |
IVF_SQ8 | 0.001 | 0.002 | 4.305 | 0.014 | 400x |
IVF_PQ | 0.006 | 0.098 | 188.112 | 4.009 | 45x |
SCANN | 0.012 | 0.194 | 309.498 | 7.955 | 38x |
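The last column is not defined in the comment. Reading it as the x86 first-build time divided by the x86 second-build time (an assumption) gives figures close to the ones listed. A quick check with the values copied from the table:

```python
# Build times in seconds, copied from the table: (x86 1st build, x86 2nd build).
x86_build_times = {
    "IVF_FLAT": (4.083, 0.012),
    "IVF_FLAT_CC": (4.000, 0.011),
    "IVF_SQ8": (4.305, 0.014),
    "IVF_PQ": (188.112, 4.009),
    "SCANN": (309.498, 7.955),
}


def slowdown(first, second):
    """Ratio of the first-build time to the second-build time."""
    return first / second


ratios = {name: slowdown(a, b) for name, (a, b) in x86_build_times.items()}

for name, ratio in ratios.items():
    print(f"{name}: {ratio:.0f}x")
```

Under that reading the exact ratios fall between roughly 300x and 365x for the IVF_FLAT family, and come to about 47x for IVF_PQ and 39x for SCANN, so the table values look like loose round-offs.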
arm (arm.log):
[2024-09-11 08:19:39 - INFO - test_basic.py:106]: Start building index, index_type = FLAT
[2024-09-11 08:19:39 - INFO - test_basic.py:98]: ### check_build_index runtime: 0.000s
[2024-09-11 08:19:39 - INFO - test_basic.py:106]: Start building index, index_type = IVF_FLAT
[2024-09-11 08:19:39 - INFO - test_basic.py:98]: ### check_build_index runtime: 0.083s
[2024-09-11 08:19:39 - INFO - test_basic.py:106]: Start building index, index_type = IVF_FLAT_CC
[2024-09-11 08:19:39 - INFO - test_basic.py:98]: ### check_build_index runtime: 0.001s
[2024-09-11 08:19:39 - INFO - test_basic.py:106]: Start building index, index_type = IVF_SQ8
[2024-09-11 08:19:39 - INFO - test_basic.py:98]: ### check_build_index runtime: 0.001s
[2024-09-11 08:19:39 - INFO - test_basic.py:106]: Start building index, index_type = IVF_PQ
[2024-09-11 08:19:39 - INFO - test_basic.py:98]: ### check_build_index runtime: 0.006s
[2024-09-11 08:19:39 - INFO - test_basic.py:106]: Start building index, index_type = SCANN
[2024-09-11 08:19:39 - INFO - test_basic.py:98]: ### check_build_index runtime: 0.012s
[2024-09-11 08:19:39 - INFO - test_basic.py:106]: Start building index, index_type = HNSW
[2024-09-11 08:19:39 - INFO - test_basic.py:98]: ### check_build_index runtime: 0.001s
[2024-09-11 08:19:39 - INFO - test_basic.py:106]: Start building index, index_type = FLAT
[2024-09-11 08:19:39 - INFO - test_basic.py:98]: ### check_build_index runtime: 0.000s
[2024-09-11 08:19:39 - INFO - test_basic.py:106]: Start building index, index_type = IVF_FLAT
[2024-09-11 08:19:39 - INFO - test_basic.py:98]: ### check_build_index runtime: 0.002s
[2024-09-11 08:19:39 - INFO - test_basic.py:106]: Start building index, index_type = IVF_FLAT_CC
[2024-09-11 08:19:39 - INFO - test_basic.py:98]: ### check_build_index runtime: 0.002s
[2024-09-11 08:19:39 - INFO - test_basic.py:106]: Start building index, index_type = IVF_SQ8
[2024-09-11 08:19:39 - INFO - test_basic.py:98]: ### check_build_index runtime: 0.002s
[2024-09-11 08:19:39 - INFO - test_basic.py:106]: Start building index, index_type = IVF_PQ
[2024-09-11 08:19:40 - INFO - test_basic.py:98]: ### check_build_index runtime: 0.098s
[2024-09-11 08:19:40 - INFO - test_basic.py:106]: Start building index, index_type = SCANN
[2024-09-11 08:19:40 - INFO - test_basic.py:98]: ### check_build_index runtime: 0.194s
[2024-09-11 08:19:40 - INFO - test_basic.py:106]: Start building index, index_type = HNSW
[2024-09-11 08:19:40 - INFO - test_basic.py:98]: ### check_build_index runtime: 0.001s
x86 (x86.log):
[2024-09-11 08:17:04 - INFO - test_basic.py:106]: Start building index, index_type = FLAT
[2024-09-11 08:17:04 - INFO - test_basic.py:98]: ### check_build_index runtime: 0.000s
[2024-09-11 08:17:04 - INFO - test_basic.py:106]: Start building index, index_type = IVF_FLAT
[2024-09-11 08:17:08 - INFO - test_basic.py:98]: ### check_build_index runtime: 4.083s
[2024-09-11 08:17:08 - INFO - test_basic.py:106]: Start building index, index_type = IVF_FLAT_CC
[2024-09-11 08:17:12 - INFO - test_basic.py:98]: ### check_build_index runtime: 4.000s
[2024-09-11 08:17:13 - INFO - test_basic.py:106]: Start building index, index_type = IVF_SQ8
[2024-09-11 08:17:17 - INFO - test_basic.py:98]: ### check_build_index runtime: 4.305s
[2024-09-11 08:17:17 - INFO - test_basic.py:106]: Start building index, index_type = IVF_PQ
[2024-09-11 08:20:25 - INFO - test_basic.py:98]: ### check_build_index runtime: 188.112s
[2024-09-11 08:20:25 - INFO - test_basic.py:106]: Start building index, index_type = SCANN
[2024-09-11 08:25:35 - INFO - test_basic.py:98]: ### check_build_index runtime: 309.498s
[2024-09-11 08:25:35 - INFO - test_basic.py:106]: Start building index, index_type = HNSW
[2024-09-11 08:25:35 - INFO - test_basic.py:98]: ### check_build_index runtime: 0.008s
[2024-09-11 08:25:35 - INFO - test_basic.py:106]: Start building index, index_type = FLAT
[2024-09-11 08:25:35 - INFO - test_basic.py:98]: ### check_build_index runtime: 0.000s
[2024-09-11 08:25:35 - INFO - test_basic.py:106]: Start building index, index_type = IVF_FLAT
[2024-09-11 08:25:35 - INFO - test_basic.py:98]: ### check_build_index runtime: 0.012s
[2024-09-11 08:25:35 - INFO - test_basic.py:106]: Start building index, index_type = IVF_FLAT_CC
[2024-09-11 08:25:35 - INFO - test_basic.py:98]: ### check_build_index runtime: 0.011s
[2024-09-11 08:25:35 - INFO - test_basic.py:106]: Start building index, index_type = IVF_SQ8
[2024-09-11 08:25:35 - INFO - test_basic.py:98]: ### check_build_index runtime: 0.014s
[2024-09-11 08:25:35 - INFO - test_basic.py:106]: Start building index, index_type = IVF_PQ
[2024-09-11 08:25:39 - INFO - test_basic.py:98]: ### check_build_index runtime: 4.009s
[2024-09-11 08:25:39 - INFO - test_basic.py:106]: Start building index, index_type = SCANN
[2024-09-11 08:25:47 - INFO - test_basic.py:98]: ### check_build_index runtime: 7.955s
[2024-09-11 08:25:47 - INFO - test_basic.py:106]: Start building index, index_type = HNSW
[2024-09-11 08:25:47 - INFO - test_basic.py:98]: ### check_build_index runtime: 0.004s
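The per-index numbers in the table can be pulled out of logs like these mechanically. Below is a minimal sketch that pairs each "Start building index" line with the next "check_build_index runtime" line; the function name `parse_build_times` and the regular expressions are illustrative and not part of the knowhere test suite.

```python
import re
from collections import defaultdict

# Patterns matching the two log line shapes shown in the logs.
START_RE = re.compile(r"Start building index, index_type = (\w+)")
RUNTIME_RE = re.compile(r"check_build_index runtime: ([\d.]+)s")


def parse_build_times(log_text):
    """Return {index_type: [1st build runtime, 2nd build runtime, ...]} in seconds."""
    times = defaultdict(list)
    current = None
    for line in log_text.splitlines():
        match = START_RE.search(line)
        if match:
            current = match.group(1)
            continue
        match = RUNTIME_RE.search(line)
        if match and current is not None:
            times[current].append(float(match.group(1)))
            current = None
    return dict(times)


# Four lines taken from the x86 log.
sample = """\
[2024-09-11 08:17:17 - INFO - test_basic.py:106]: Start building index, index_type = IVF_PQ
[2024-09-11 08:20:25 - INFO - test_basic.py:98]: ### check_build_index runtime: 188.112s
[2024-09-11 08:25:35 - INFO - test_basic.py:106]: Start building index, index_type = IVF_PQ
[2024-09-11 08:25:39 - INFO - test_basic.py:98]: ### check_build_index runtime: 4.009s
"""

result = parse_build_times(sample)
print(result)
```

Running the same function over the full arm.log and x86.log contents would give the first and second build time per index type for each platform.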
Ran cProfile on arm and x86 for IVF_PQ (test_basic.py::TestBasic::test_float_index):
arm
Wed Sep 11 08:34:53 2024 test_float_index.profile
15 function calls in 0.098 seconds
Ordered by: cumulative time
List reduced from 15 to 10 due to restriction <10>
ncalls tottime percall cumtime percall filename:lineno(function)
1 0.000 0.000 0.098 0.098 /home/jenkins/agent/workspace/knowhere_Knowhere_e2e_arm_PR-831/tests/test_basic.py:263(test_float_index)
1 0.000 0.000 0.097 0.097 /home/jenkins/agent/workspace/knowhere_Knowhere_e2e_arm_PR-831/tests/test_basic.py:107(wrapper)
1 0.097 0.097 0.097 0.097 {method 'enable' of '_lsprof.Profiler' objects}
1 0.000 0.000 0.000 0.000 /usr/local/lib/python3.11/dist-packages/knowhere/__init__.py:13(CreateIndex)
1 0.000 0.000 0.000 0.000 /usr/local/lib/python3.11/dist-packages/knowhere/swigknowhere.py:630(__init__)
1 0.000 0.000 0.000 0.000 {built-in method knowhere._swigknowhere.new_IndexWrapFloat}
1 0.000 0.000 0.000 0.000 /home/jenkins/agent/workspace/knowhere_Knowhere_e2e_arm_PR-831/tests/utils.py:64(gen_random_float_vec)
1 0.000 0.000 0.000 0.000 {method 'randn' of 'numpy.random.mtrand.RandomState' objects}
1 0.000 0.000 0.000 0.000 /home/jenkins/agent/workspace/knowhere_Knowhere_e2e_arm_PR-831/tests/config.py:7(get_default_build_config)
1 0.000 0.000 0.000 0.000 {method 'astype' of 'numpy.ndarray' objects}
x86
Wed Sep 11 08:43:44 2024 test_float_index.profile
15 function calls in 180.015 seconds
Ordered by: cumulative time
List reduced from 15 to 10 due to restriction <10>
ncalls tottime percall cumtime percall filename:lineno(function)
1 0.000 0.000 180.015 180.015 /home/jenkins/agent/workspace/knowhere_kn2_PR-831/tests/test_basic.py:263(test_float_index)
1 0.000 0.000 180.014 180.014 /home/jenkins/agent/workspace/knowhere_kn2_PR-831/tests/test_basic.py:107(wrapper)
1 180.014 180.014 180.014 180.014 {method 'enable' of '_lsprof.Profiler' objects}
1 0.000 0.000 0.000 0.000 /usr/local/lib/python3.11/dist-packages/knowhere/__init__.py:13(CreateIndex)
1 0.000 0.000 0.000 0.000 /usr/local/lib/python3.11/dist-packages/knowhere/swigknowhere.py:630(__init__)
1 0.000 0.000 0.000 0.000 {built-in method knowhere._swigknowhere.new_IndexWrapFloat}
1 0.000 0.000 0.000 0.000 /home/jenkins/agent/workspace/knowhere_kn2_PR-831/tests/utils.py:64(gen_random_float_vec)
1 0.000 0.000 0.000 0.000 {method 'randn' of 'numpy.random.mtrand.RandomState' objects}
1 0.000 0.000 0.000 0.000 {method 'astype' of 'numpy.ndarray' objects}
1 0.000 0.000 0.000 0.000 /usr/local/lib/python3.11/dist-packages/knowhere/__init__.py:44(GetCurrentVersion)
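Reports in this layout ("Ordered by: cumulative time", "List reduced ... due to restriction") are what the standard library `pstats` module prints. A minimal sketch of producing one follows; `workload` is a hypothetical stand-in for the test body, and how the knowhere tests actually wire up the profiler is not shown in this thread.

```python
import cProfile
import io
import pstats


def workload():
    # Hypothetical stand-in for the code under test.
    return sum(i * i for i in range(100_000))


profiler = cProfile.Profile()
profiler.enable()
workload()
profiler.disable()

# Sort by cumulative time and keep at most the top 10 entries.
stream = io.StringIO()
stats = pstats.Stats(profiler, stream=stream)
stats.sort_stats("cumulative").print_stats(10)
report = stream.getvalue()
print(report)
```

To keep the raw data for later inspection, `profiler.dump_stats("some_file.profile")` writes a file that `pstats.Stats("some_file.profile")` can load again; the file name here is only an example.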
cProfile on arm and x86 for IVF_PQ (test_func.py::TestFunc::test_float_index):
arm
Wed Sep 11 09:15:36 2024 test_float_index.profile
3216017 function calls (3216014 primitive calls) in 16.229 seconds
Ordered by: cumulative time
List reduced from 223 to 20 due to restriction <20>
ncalls tottime percall cumtime percall filename:lineno(function)
1 0.000 0.000 16.229 16.229 /home/jenkins/agent/workspace/knowhere_Knowhere_e2e_arm_PR-831/tests/test_func.py:665(test_float_index)
13/10 0.001 0.000 16.228 1.623 /home/jenkins/agent/workspace/knowhere_Knowhere_e2e_arm_PR-831/tests/test_func.py:37(wrapper)
2 0.001 0.001 10.770 5.385 /home/jenkins/agent/workspace/knowhere_Knowhere_e2e_arm_PR-831/tests/test_func.py:219(check_search_and_recall)
51 0.000 0.000 9.590 0.188 /home/jenkins/agent/workspace/knowhere_Knowhere_e2e_arm_PR-831/tests/test_func.py:181(check_search_result)
1 0.000 0.000 6.435 6.435 /home/jenkins/agent/workspace/knowhere_Knowhere_e2e_arm_PR-831/tests/test_func.py:548(check_search_multi_thread)
51 2.135 0.042 6.426 0.126 /home/jenkins/agent/workspace/knowhere_Knowhere_e2e_arm_PR-831/tests/test_func.py:139(check_search_distance_in_order)
1773900 4.290 0.000 4.290 0.000 /home/jenkins/agent/workspace/knowhere_Knowhere_e2e_arm_PR-831/tests/test_func.py:140(check_order)
56 3.167 0.057 3.253 0.058 /home/jenkins/agent/workspace/knowhere_Knowhere_e2e_arm_PR-831/tests/test_func.py:170(check_search_result_unique)
3 3.000 1.000 3.000 1.000 {built-in method time.sleep}
2 0.000 0.000 2.429 1.215 /home/jenkins/agent/workspace/knowhere_Knowhere_e2e_arm_PR-831/tests/test_func.py:621(check_range_search_multi_thread)
1 0.000 0.000 1.717 1.717 /home/jenkins/agent/workspace/knowhere_Knowhere_e2e_arm_PR-831/tests/test_func.py:61(check_build_index)
1 0.000 0.000 1.716 1.716 /usr/local/lib/python3.11/dist-packages/knowhere/swigknowhere.py:633(Build)
1 1.716 1.716 1.716 1.716 {built-in method knowhere._swigknowhere.IndexWrapFloat_Build}
48 0.000 0.000 1.022 0.021 /home/jenkins/agent/workspace/knowhere_Knowhere_e2e_arm_PR-831/tests/utils.py:29(calc_recall)
48 0.198 0.004 1.022 0.021 /home/jenkins/agent/workspace/knowhere_Knowhere_e2e_arm_PR-831/tests/utils.py:22(calc_hits)
5100 0.855 0.000 0.855 0.000 {method 'intersection' of 'set' objects}
150 0.308 0.002 0.308 0.002 {method 'acquire' of '_thread.lock' objects}
5 0.000 0.000 0.262 0.052 /home/jenkins/agent/workspace/knowhere_Knowhere_e2e_arm_PR-831/tests/test_func.py:187(check_range_search_result)
3 0.000 0.000 0.245 0.082 /home/jenkins/agent/workspace/knowhere_Knowhere_e2e_arm_PR-831/tests/test_func.py:318(check_range_search_and_recall)
51 0.000 0.000 0.187 0.004 /usr/local/lib/python3.11/dist-packages/knowhere/swigknowhere.py:642(Search)
x86
Wed Sep 11 09:17:16 2024 test_float_index.profile
3216285 function calls (3216282 primitive calls) in 314.597 seconds
Ordered by: cumulative time
List reduced from 223 to 20 due to restriction <20>
ncalls tottime percall cumtime percall filename:lineno(function)
1 0.000 0.000 314.597 314.597 /home/jenkins/agent/workspace/knowhere_kn2_PR-831/tests/test_func.py:665(test_float_index)
13/10 0.003 0.000 314.589 31.459 /home/jenkins/agent/workspace/knowhere_kn2_PR-831/tests/test_func.py:37(wrapper)
1 0.000 0.000 300.091 300.091 /home/jenkins/agent/workspace/knowhere_kn2_PR-831/tests/test_func.py:61(check_build_index)
1 0.000 0.000 300.091 300.091 /usr/local/lib/python3.11/dist-packages/knowhere/swigknowhere.py:633(Build)
1 300.090 300.090 300.090 300.090 {built-in method knowhere._swigknowhere.IndexWrapFloat_Build}
2 0.002 0.001 10.290 5.145 /home/jenkins/agent/workspace/knowhere_kn2_PR-831/tests/test_func.py:219(check_search_and_recall)
51 0.001 0.000 9.138 0.179 /home/jenkins/agent/workspace/knowhere_kn2_PR-831/tests/test_func.py:181(check_search_result)
51 2.075 0.041 6.195 0.121 /home/jenkins/agent/workspace/knowhere_kn2_PR-831/tests/test_func.py:139(check_search_distance_in_order)
1 0.000 0.000 6.173 6.173 /home/jenkins/agent/workspace/knowhere_kn2_PR-831/tests/test_func.py:548(check_search_multi_thread)
1773900 4.119 0.000 4.119 0.000 /home/jenkins/agent/workspace/knowhere_kn2_PR-831/tests/test_func.py:140(check_order)
56 2.957 0.053 3.027 0.054 /home/jenkins/agent/workspace/knowhere_kn2_PR-831/tests/test_func.py:170(check_search_result_unique)
3 3.000 1.000 3.000 1.000 {built-in method time.sleep}
2 0.000 0.000 2.830 1.415 /home/jenkins/agent/workspace/knowhere_kn2_PR-831/tests/test_func.py:621(check_range_search_multi_thread)
48 0.000 0.000 1.013 0.021 /home/jenkins/agent/workspace/knowhere_kn2_PR-831/tests/utils.py:29(calc_recall)
48 0.233 0.005 1.013 0.021 /home/jenkins/agent/workspace/knowhere_kn2_PR-831/tests/utils.py:22(calc_hits)
5100 0.812 0.000 0.812 0.000 {method 'intersection' of 'set' objects}
150 0.668 0.004 0.668 0.004 {method 'acquire' of '_thread.lock' objects}
30 0.000 0.000 0.447 0.015 /usr/lib/python3.11/threading.py:1080(join)
30 0.000 0.000 0.446 0.015 /usr/lib/python3.11/threading.py:1118(_wait_for_tstate_lock)
5 0.000 0.000 0.290 0.058 /home/jenkins/agent/workspace/knowhere_kn2_PR-831/tests/test_func.py:187(check_range_search_result)
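A short calculation on the two test_func.py profiles makes the difference easier to see. The numbers below are the total runtime and the cumulative time of `IndexWrapFloat_Build` copied from the reports; the helper name `breakdown` is illustrative.

```python
# Seconds, copied from the two cProfile reports for test_func.py.
profiles = {
    "arm": {"total": 16.229, "build": 1.716},
    "x86": {"total": 314.597, "build": 300.090},
}


def breakdown(profile):
    """Return (time outside Build, fraction of total spent in Build)."""
    rest = profile["total"] - profile["build"]
    share = profile["build"] / profile["total"]
    return rest, share


results = {arch: breakdown(p) for arch, p in profiles.items()}

for arch, (rest, share) in results.items():
    print(f"{arch}: build share = {share:.1%}, everything else = {rest:.3f}s")
```

Build accounts for about 11% of the arm run and about 95% of the x86 run, while everything outside Build takes about 14.5 s on both. This suggests the gap sits in the native build call rather than in the Python-side search and recall checks.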
Backtrace taken during the IVF_PQ build (in product quantizer training):
#0 faiss::(anonymous namespace)::exhaustive_L2sqr_seq_impl<faiss::HeapBlockResultHandler<faiss::CMax<float, long> >, faiss::(anonymous namespace)::IDSelectorAll>(const float * __restrict__, const float * __restrict__, size_t, size_t, size_t, faiss::HeapBlockResultHandler<faiss::CMax<float, long> > &, faiss::(anonymous namespace)::IDSelectorAll) (x=0x7fff7000b550, y=0x7fff70034140, d=32, nx=1000, ny=256, res=..., selector=...) at /home/caiyd/work/vec/knowhere/thirdparty/faiss/faiss/utils/distances.cpp:317
#1 0x00007ffff5d29603 in faiss::(anonymous namespace)::exhaustive_L2sqr_seq<faiss::HeapBlockResultHandler<faiss::CMax<float, long> > >(const float * __restrict__, const float * __restrict__, size_t, size_t, size_t, faiss::HeapBlockResultHandler<faiss::CMax<float, long> > &, const faiss::IDSelector * __restrict__) (x=0x7fff7000b550, y=0x7fff70034140, d=32, nx=1000, ny=256, res=..., sel=0x0) at /home/caiyd/work/vec/knowhere/thirdparty/faiss/faiss/utils/distances.cpp:388
#2 0x00007ffff5d26232 in faiss::(anonymous namespace)::knn_L2sqr_select<faiss::HeapBlockResultHandler<faiss::CMax<float, long> > > (x=0x7fff7000b550, y=0x7fff70034140, d=32, nx=1000, ny=256, res=..., y_norm2=0x0, sel=0x0) at /home/caiyd/work/vec/knowhere/thirdparty/faiss/faiss/utils/distances.cpp:833
#3 0x00007ffff5d23e5a in faiss::knn_L2sqr (x=0x7fff7000b550, y=0x7fff70034140, d=32, nx=1000, ny=256, k=1, vals=0x7fff70003c60, ids=0x7fff70001d10, y_norm2=0x0, sel=0x0) at /home/caiyd/work/vec/knowhere/thirdparty/faiss/faiss/utils/distances.cpp:997
#4 0x00007ffff5d2416e in faiss::knn_L2sqr (x=0x7fff7000b550, y=0x7fff70034140, d=32, nx=1000, ny=256, res=0x7fff897c01f0, y_norm2=0x0, sel=0x0) at /home/caiyd/work/vec/knowhere/thirdparty/faiss/faiss/utils/distances.cpp:1021
#5 0x00007ffff5b49308 in faiss::IndexFlat::search (this=0x7fff897c05b0, n=1000, x=0x7fff7000b550, k=1, distances=0x7fff70003c60, labels=0x7fff70001d10, params=0x0) at /home/caiyd/work/vec/knowhere/thirdparty/faiss/faiss/IndexFlat.cpp:66
#6 0x00007ffff5b05de5 in faiss::Clustering::train_encoded (this=0x7fff897c0530, nx=1000, x_in=0x7fff7000b550 "\226\370\067\302^\376\035B`\375\247?\024\214\033\301\244\266\v\302\030\f\004\302\212\064\365\301\240,;A\320rlADѣA\334-*\302\060\333s\301+\334\310\301\270\343\254A\340\025D\300\230\002\256\301@\225\330@O#< \026\372AXş\300p~", codec=0x0, index=..., weights=0x0) at /home/caiyd/work/vec/knowhere/thirdparty/faiss/faiss/Clustering.cpp:591
#7 0x00007ffff5b03909 in faiss::Clustering::train (this=0x7fff897c0530, nx=1000, x_in=0x7fff7000b550, index=..., weights=0x0) at /home/caiyd/work/vec/knowhere/thirdparty/faiss/faiss/Clustering.cpp:69
#8 0x00007ffff5c3ca9e in faiss::ProductQuantizer::train (this=0x7fff70001600, n=1000, x=0x7fff90239010) at /home/caiyd/work/vec/knowhere/thirdparty/faiss/faiss/impl/ProductQuantizer.cpp:183
#9 0x00007ffff5b803a2 in faiss::IndexIVFPQ::train_encoder (this=0x7fff70001490, n=1000, x=0x7fff90239010, assign=0x7fff70005ba0) at /home/caiyd/work/vec/knowhere/thirdparty/faiss/faiss/IndexIVFPQ.cpp:69
#10 0x00007ffff5b5f34f in faiss::IndexIVF::train (this=0x7fff70001490, n=1000, x=0x7fff91cdb010) at /home/caiyd/work/vec/knowhere/thirdparty/faiss/faiss/IndexIVF.cpp:1213
#11 0x00007ffff56e51c5 in knowhere::IvfIndexNode<float, faiss::IndexIVFPQ>::TrainInternal (this=0x555555850e80, dataset=std::shared_ptr<knowhere::DataSet> (use count 5, weak count 1) = {...}, cfg=...) at /home/caiyd/work/vec/knowhere/src/index/ivf/ivf.cc:483
#12 0x00007ffff56cdc2e in knowhere::IvfIndexNode<float, faiss::IndexIVFPQ>::Train(std::shared_ptr<knowhere::DataSet>, knowhere::Config const&)::{lambda()#1}::operator()() const (this=0x555555850e80) at /home/caiyd/work/vec/knowhere/src/index/ivf/ivf.cc:378
#13 0x00007ffff56cdcbd in knowhere::ThreadPool::push<knowhere::IvfIndexNode<float, faiss::IndexIVFPQ>::Train(std::shared_ptr<knowhere::DataSet>, knowhere::Config const&)::{lambda()#1}>(knowhere::IvfIndexNode<float, faiss::IndexIVFPQ>::Train(std::shared_ptr<knowhere::DataSet>, knowhere::Config const&)::{lambda()#1}&&)::{lambda(auto:1&&)#1}::operator()<folly::Try<folly::Unit> >(knowhere::IvfIndexNode<float, faiss::IndexIVFPQ>::Train(std::shared_ptr<knowhere::DataSet>, knowhere::Config const&)::{lambda()#1}&&) (this=0x555555856470) at /home/caiyd/work/vec/knowhere/include/knowhere/comp/thread_pool.h:112
#14 0x00007ffff56ec22f in folly::Future<folly::Unit>::thenTry<knowhere::ThreadPool::push<knowhere::IvfIndexNode<float, faiss::IndexIVFPQ>::Train(std::shared_ptr<knowhere::DataSet>, knowhere::Config const&)::{lambda()#1}>(knowhere::IvfIndexNode<float, faiss::IndexIVFPQ>::Train(std::shared_ptr<knowhere::DataSet>, knowhere::Config const&)::{lambda()#1}&&)::{lambda(auto:1&&)#1}>(knowhere::IvfIndexNode<float, faiss::IndexIVFPQ>::Train(std::shared_ptr<knowhere::DataSet>, knowhere::Config const&)::{lambda()#1}&&) &&::{lambda(folly::Executor::KeepAlive<folly::Executor>&&, folly::Try<folly::Unit>&&)#1}::operator()(folly::Executor::KeepAlive<folly::Executor>&&, folly::Try<folly::Unit>&&) (this=0x555555856470, t=...) at /home/caiyd/.conan/data/folly/2023.10.30.08/milvus/dev/package/610a69487369414524768919b7ed62fe004e6557/include/folly/futures/Future-inl.h:941
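Frames #0 to #6 show `faiss::Clustering::train_encoded` calling `faiss::IndexFlat::search` with n=1000 and k=1 against ny=256 vectors of dimension d=32, which matches the assignment step of k-means: each training point is matched to its nearest centroid by exhaustive squared-L2 distance. The pure-Python sketch below illustrates that step only; it is not the faiss implementation, and the function names are made up for illustration.

```python
def l2_sqr(a, b):
    """Squared Euclidean distance between two equal-length vectors."""
    return sum((p - q) ** 2 for p, q in zip(a, b))


def assign_nearest(points, centroids):
    """For each point, return the index of the closest centroid (k=1 search)."""
    labels = []
    for point in points:
        best = min(range(len(centroids)), key=lambda j: l2_sqr(point, centroids[j]))
        labels.append(best)
    return labels


# Tiny illustrative data set: 3 points, 2 centroids, dimension 2.
points = [[0.0, 0.1], [5.0, 5.2], [0.2, -0.1]]
centroids = [[0.0, 0.0], [5.0, 5.0]]

labels = assign_nearest(points, centroids)
print(labels)
```

The cost of this step is proportional to nx * ny * d per iteration, so it is repeated for every k-means iteration and, in a product quantizer, for every sub-quantizer that is trained.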
knowhere CPU e2e runs too slowly
For example, knowhere e2e (arm) takes 27 min: https://jenkins-3.zilliz.cc/blue/organizations/jenkins/knowhere%2FKnowhere%20e2e(arm)/detail/PR-772/3/pipeline
knowhere e2e takes 54 min: https://jenkins.milvus.io:18080/blue/organizations/jenkins/knowhere%2Fkn2/detail/PR-772/4/pipeline