rapidsai / raft

RAFT contains fundamental widely-used algorithms and primitives for machine learning and information retrieval. The algorithms are CUDA-accelerated and form building blocks for more easily writing high performance applications.
https://docs.rapids.ai/api/raft/stable/
Apache License 2.0
683 stars 180 forks source link

[FEA] support of prefiltered brute force #2294

Closed rhdong closed 1 month ago

rhdong commented 2 months ago

***WARNING*** CPU scaling is enabled, the benchmark real time measurements may be noisy and will incur extra overhead.
-----------------------------------------------------------------------------------------------------
Benchmark                                                           Time             CPU   Iterations
-----------------------------------------------------------------------------------------------------
KNN/float/int64_t/brute_force_filter_knn/0/0/0/manual_time       33.1 ms         69.9 ms           21 1000000#128#1000#255#0#InnerProduct#NO_COPY#SEARCH
KNN/float/int64_t/brute_force_filter_knn/1/0/0/manual_time       38.0 ms         74.8 ms           18 1000000#128#1000#255#0#L2Expanded#NO_COPY#SEARCH
KNN/float/int64_t/brute_force_filter_knn/2/0/0/manual_time       41.7 ms         78.5 ms           17 1000000#128#1000#255#0.8#InnerProduct#NO_COPY#SEARCH
KNN/float/int64_t/brute_force_filter_knn/3/0/0/manual_time       57.5 ms         94.3 ms           12 1000000#128#1000#255#0.8#L2Expanded#NO_COPY#SEARCH
KNN/float/int64_t/brute_force_filter_knn/4/0/0/manual_time       19.7 ms         56.4 ms           35 1000000#128#1000#255#0.9#InnerProduct#NO_COPY#SEARCH
KNN/float/int64_t/brute_force_filter_knn/5/0/0/manual_time       26.1 ms         62.8 ms           27 1000000#128#1000#255#0.9#L2Expanded#NO_COPY#SEARCH```
rhdong commented 2 months ago

Hey @cjnolet @benfred, I implemented an initial version of faster_dot (without deep optimization), which could help improve performance in many classic scenarios. I will continue to communicate with cuSparse Team. Here is the benchmark:

***WARNING*** CPU scaling is enabled, the benchmark real time measurements may be noisy and will incur extra overhead.
------------------------------------------------------------------------------------------------------
Benchmark                                                    cuSparseSDDMM     faster_dot   Iterations Configuration(No filtered means normal `search`)
------------------------------------------------------------------------------------------------------n_sample#dim#n_query#top_k#remove rate# metric
KNN/float/int64_t/brute_force_filter_knn/0/0/0/manual_time        11.4 ms        11.4 ms           61 1000000#4096#1#255#[No filtered]#InnerProduct#NO_COPY#SEARCH
KNN/float/int64_t/brute_force_filter_knn/1/0/0/manual_time        11.5 ms        11.4 ms           61 1000000#4096#1#255#[No filtered]#L2Expanded#NO_COPY#SEARCH
KNN/float/int64_t/brute_force_filter_knn/2/0/0/manual_time         137 ms        7.02 ms            4 1000000#4096#1#255#0.8#InnerProduct#NO_COPY#SEARCH
KNN/float/int64_t/brute_force_filter_knn/3/0/0/manual_time         137 ms        7.03 ms            5 1000000#4096#1#255#0.8#L2Expanded#NO_COPY#SEARCH
KNN/float/int64_t/brute_force_filter_knn/4/0/0/manual_time        70.3 ms        4.72 ms           10 1000000#4096#1#255#0.9#InnerProduct#NO_COPY#SEARCH
KNN/float/int64_t/brute_force_filter_knn/5/0/0/manual_time        70.2 ms        4.72 ms           10 1000000#4096#1#255#0.9#L2Expanded#NO_COPY#SEARCH
KNN/float/int64_t/brute_force_filter_knn/6/0/0/manual_time        8.94 ms        2.05 ms           77 1000000#4096#1#255#0.99#InnerProduct#NO_COPY#SEARCH
KNN/float/int64_t/brute_force_filter_knn/7/0/0/manual_time        8.97 ms        2.06 ms           77 1000000#4096#1#255#0.99#L2Expanded#NO_COPY#SEARCH

KNN/float/int64_t/brute_force_filter_knn/8/0/0/manual_time        17.1 ms        17.1 ms           41 1000000#4096#10#255#[No filtered]#InnerProduct#NO_COPY#SEARCH
KNN/float/int64_t/brute_force_filter_knn/9/0/0/manual_time        17.2 ms        17.2 ms           41 1000000#4096#10#255#[No filtered]#L2Expanded#NO_COPY#SEARCH
KNN/float/int64_t/brute_force_filter_knn/10/0/0/manual_time        149 ms        36.0 ms            5 1000000#4096#10#255#0.8#InnerProduct#NO_COPY#SEARCH
KNN/float/int64_t/brute_force_filter_knn/11/0/0/manual_time        149 ms        36.1 ms            5 1000000#4096#10#255#0.8#L2Expanded#NO_COPY#SEARCH
KNN/float/int64_t/brute_force_filter_knn/12/0/0/manual_time       76.0 ms        19.2 ms            9 1000000#4096#10#255#0.9#InnerProduct#NO_COPY#SEARCH
KNN/float/int64_t/brute_force_filter_knn/13/0/0/manual_time       76.2 ms        19.2 ms            9 1000000#4096#10#255#0.9#L2Expanded#NO_COPY#SEARCH
KNN/float/int64_t/brute_force_filter_knn/14/0/0/manual_time       9.43 ms        3.39 ms           72 1000000#4096#10#255#0.99#InnerProduct#NO_COPY#SEARCH
KNN/float/int64_t/brute_force_filter_knn/15/0/0/manual_time       9.45 ms        3.40 ms           72 1000000#4096#10#255#0.99#L2Expanded#NO_COPY#SEARCH

KNN/float/int64_t/brute_force_filter_knn/16/0/0/manual_time        488 ms         489 ms            2 1000000#4096#1000#255#[No filtered]#InnerProduct#NO_COPY#SEARCH
KNN/float/int64_t/brute_force_filter_knn/17/0/0/manual_time        495 ms         494 ms            2 1000000#4096#1000#255#[No filtered]#L2Expanded#NO_COPY#SEARCH
KNN/float/int64_t/brute_force_filter_knn/18/0/0/manual_time       1916 ms        3277 ms            1 1000000#4096#1000#255#0.8#InnerProduct#NO_COPY#SEARCH
KNN/float/int64_t/brute_force_filter_knn/19/0/0/manual_time       1915 ms        3280 ms            1 1000000#4096#1000#255#0.8#L2Expanded#NO_COPY#SEARCH
KNN/float/int64_t/brute_force_filter_knn/20/0/0/manual_time        968 ms        1641 ms            1 1000000#4096#1000#255#0.9#InnerProduct#NO_COPY#SEARCH
KNN/float/int64_t/brute_force_filter_knn/21/0/0/manual_time        969 ms        1643 ms            1 1000000#4096#1000#255#0.9#L2Expanded#NO_COPY#SEARCH
KNN/float/int64_t/brute_force_filter_knn/22/0/0/manual_time        108 ms         167 ms            6 1000000#4096#1000#255#0.99#InnerProduct#NO_COPY#SEARCH
KNN/float/int64_t/brute_force_filter_knn/23/0/0/manual_time        108 ms         167 ms            6 1000000#4096#1000#255#0.99#L2Expanded#NO_COPY#SEARCH

KNN/float/int64_t/brute_force_filter_knn/24/0/0/manual_time       2.64 ms        2.64 ms          265 1000000#512#1#255#[No filtered]#InnerProduct#NO_COPY#SEARCH
KNN/float/int64_t/brute_force_filter_knn/25/0/0/manual_time       2.65 ms        2.65 ms          264 1000000#512#1#255#[No filtered]#L2Expanded#NO_COPY#SEARCH
KNN/float/int64_t/brute_force_filter_knn/26/0/0/manual_time       21.4 ms        4.84 ms           32 1000000#512#1#255#0.8#InnerProduct#NO_COPY#SEARCH
KNN/float/int64_t/brute_force_filter_knn/27/0/0/manual_time       21.4 ms        4.84 ms           33 1000000#512#1#255#0.8#L2Expanded#NO_COPY#SEARCH
KNN/float/int64_t/brute_force_filter_knn/28/0/0/manual_time       12.0 ms        3.60 ms           57 1000000#512#1#255#0.9#InnerProduct#NO_COPY#SEARCH
KNN/float/int64_t/brute_force_filter_knn/29/0/0/manual_time       12.0 ms        3.62 ms           57 1000000#512#1#255#0.9#L2Expanded#NO_COPY#SEARCH
KNN/float/int64_t/brute_force_filter_knn/30/0/0/manual_time       2.83 ms        1.92 ms          246 1000000#512#1#255#0.99#InnerProduct#NO_COPY#SEARCH
KNN/float/int64_t/brute_force_filter_knn/31/0/0/manual_time       2.84 ms        1.94 ms          245 1000000#512#1#255#0.99#L2Expanded#NO_COPY#SEARCH

KNN/float/int64_t/brute_force_filter_knn/32/0/0/manual_time       3.57 ms        3.57 ms          196 1000000#512#10#255#[No filtered]#InnerProduct#NO_COPY#SEARCH
KNN/float/int64_t/brute_force_filter_knn/33/0/0/manual_time       3.63 ms        3.63 ms          193 1000000#512#10#255#[No filtered]#L2Expanded#NO_COPY#SEARCH
KNN/float/int64_t/brute_force_filter_knn/34/0/0/manual_time       22.3 ms        14.4 ms           31 1000000#512#10#255#0.8#InnerProduct#NO_COPY#SEARCH
KNN/float/int64_t/brute_force_filter_knn/35/0/0/manual_time       22.3 ms        14.4 ms           31 1000000#512#10#255#0.8#L2Expanded#NO_COPY#SEARCH
KNN/float/int64_t/brute_force_filter_knn/36/0/0/manual_time       12.6 ms        8.33 ms           55 1000000#512#10#255#0.9#InnerProduct#NO_COPY#SEARCH
KNN/float/int64_t/brute_force_filter_knn/37/0/0/manual_time       12.6 ms        8.35 ms           55 1000000#512#10#255#0.9#L2Expanded#NO_COPY#SEARCH
KNN/float/int64_t/brute_force_filter_knn/38/0/0/manual_time       2.80 ms        2.28 ms          249 1000000#512#10#255#0.99#InnerProduct#NO_COPY#SEARCH
KNN/float/int64_t/brute_force_filter_knn/39/0/0/manual_time       2.81 ms        2.29 ms          247 1000000#512#10#255#0.99#L2Expanded#NO_COPY#SEARCH

KNN/float/int64_t/brute_force_filter_knn/40/0/0/manual_time       75.0 ms        74.5 ms            9 1000000#512#1000#255#[No filtered]#InnerProduct#NO_COPY#SEARCH
KNN/float/int64_t/brute_force_filter_knn/41/0/0/manual_time       79.9 ms        79.7 ms            9 1000000#512#1000#255#[No filtered]#L2Expanded#NO_COPY#SEARCH
KNN/float/int64_t/brute_force_filter_knn/42/0/0/manual_time        210 ms        1085 ms            3 1000000#512#1000#255#0.8#InnerProduct#NO_COPY#SEARCH
KNN/float/int64_t/brute_force_filter_knn/43/0/0/manual_time        215 ms        1088 ms            3 1000000#512#1000#255#0.8#L2Expanded#NO_COPY#SEARCH
KNN/float/int64_t/brute_force_filter_knn/44/0/0/manual_time        105 ms         543 ms            7 1000000#512#1000#255#0.9#InnerProduct#NO_COPY#SEARCH
KNN/float/int64_t/brute_force_filter_knn/45/0/0/manual_time        106 ms         544 ms            7 1000000#512#1000#255#0.9#L2Expanded#NO_COPY#SEARCH
KNN/float/int64_t/brute_force_filter_knn/46/0/0/manual_time       12.6 ms        56.6 ms           55 1000000#512#1000#255#0.99#InnerProduct#NO_COPY#SEARCH
KNN/float/int64_t/brute_force_filter_knn/47/0/0/manual_time       12.7 ms        56.8 ms           54 1000000#512#1000#255#0.99#L2Expanded#NO_COPY#SEARCH

KNN/float/int64_t/brute_force_filter_knn/48/0/0/manual_time       1.72 ms        1.71 ms          407 1000000#128#1#255#[No filtered]#InnerProduct#NO_COPY#SEARCH
KNN/float/int64_t/brute_force_filter_knn/49/0/0/manual_time       1.73 ms        1.72 ms          406 1000000#128#1#255#[No filtered]#L2Expanded#NO_COPY#SEARCH
KNN/float/int64_t/brute_force_filter_knn/50/0/0/manual_time       8.86 ms        3.99 ms           79 1000000#128#1#255#0.8#InnerProduct#NO_COPY#SEARCH
KNN/float/int64_t/brute_force_filter_knn/51/0/0/manual_time       8.87 ms        4.00 ms           79 1000000#128#1#255#0.8#L2Expanded#NO_COPY#SEARCH
KNN/float/int64_t/brute_force_filter_knn/52/0/0/manual_time       5.64 ms        3.19 ms          123 1000000#128#1#255#0.9#InnerProduct#NO_COPY#SEARCH
KNN/float/int64_t/brute_force_filter_knn/53/0/0/manual_time       5.62 ms        3.19 ms          120 1000000#128#1#255#0.9#L2Expanded#NO_COPY#SEARCH
KNN/float/int64_t/brute_force_filter_knn/54/0/0/manual_time       2.15 ms        1.87 ms          326 1000000#128#1#255#0.99#InnerProduct#NO_COPY#SEARCH
KNN/float/int64_t/brute_force_filter_knn/55/0/0/manual_time       2.16 ms        1.89 ms          323 1000000#128#1#255#0.99#L2Expanded#NO_COPY#SEARCH

KNN/float/int64_t/brute_force_filter_knn/56/0/0/manual_time       2.13 ms        2.12 ms          328 1000000#128#10#255#[No filtered]#InnerProduct#NO_COPY#SEARCH
KNN/float/int64_t/brute_force_filter_knn/57/0/0/manual_time       2.19 ms        2.18 ms          320 1000000#128#10#255#[No filtered]#L2Expanded#NO_COPY#SEARCH
KNN/float/int64_t/brute_force_filter_knn/58/0/0/manual_time       9.41 ms        6.07 ms           75 1000000#128#10#255#0.8#InnerProduct#NO_COPY#SEARCH
KNN/float/int64_t/brute_force_filter_knn/59/0/0/manual_time       9.44 ms        6.09 ms           74 1000000#128#10#255#0.8#L2Expanded#NO_COPY#SEARCH
KNN/float/int64_t/brute_force_filter_knn/60/0/0/manual_time       5.94 ms        4.16 ms          114 1000000#128#10#255#0.9#InnerProduct#NO_COPY#SEARCH
KNN/float/int64_t/brute_force_filter_knn/61/0/0/manual_time       5.97 ms        4.18 ms          113 1000000#128#10#255#0.9#L2Expanded#NO_COPY#SEARCH
KNN/float/int64_t/brute_force_filter_knn/62/0/0/manual_time       2.10 ms        1.86 ms          332 1000000#128#10#255#0.99#InnerProduct#NO_COPY#SEARCH
KNN/float/int64_t/brute_force_filter_knn/63/0/0/manual_time       2.11 ms        1.87 ms          331 1000000#128#10#255#0.99#L2Expanded#NO_COPY#SEARCH

KNN/float/int64_t/brute_force_filter_knn/64/0/0/manual_time       33.1 ms        33.1 ms           21 1000000#128#1000#255#[No filtered]#InnerProduct#NO_COPY#SEARCH
KNN/float/int64_t/brute_force_filter_knn/65/0/0/manual_time       38.0 ms        38.0 ms           18 1000000#128#1000#255#[No filtered]#L2Expanded#NO_COPY#SEARCH
KNN/float/int64_t/brute_force_filter_knn/66/0/0/manual_time       54.2 ms         244 ms           13 1000000#128#1000#255#0.8#InnerProduct#NO_COPY#SEARCH
KNN/float/int64_t/brute_force_filter_knn/67/0/0/manual_time       57.4 ms         247 ms           12 1000000#128#1000#255#0.8#L2Expanded#NO_COPY#SEARCH
KNN/float/int64_t/brute_force_filter_knn/68/0/0/manual_time       24.6 ms         121 ms           29 1000000#128#1000#255#0.9#InnerProduct#NO_COPY#SEARCH
KNN/float/int64_t/brute_force_filter_knn/69/0/0/manual_time       26.1 ms         123 ms           27 1000000#128#1000#255#0.9#L2Expanded#NO_COPY#SEARCH
KNN/float/int64_t/brute_force_filter_knn/70/0/0/manual_time       4.69 ms        14.4 ms          149 1000000#128#1000#255#0.99#InnerProduct#NO_COPY#SEARCH
KNN/float/int64_t/brute_force_filter_knn/71/0/0/manual_time       4.83 ms        14.5 ms          138 1000000#128#1000#255#0.99#L2Expanded#NO_COPY#SEARCH
cjnolet commented 2 months ago

@rhdong these benchmarks look promising so far. Can you run these for the following data shapes and also provide the standard (non-filtered) brute-force knn numbers for reference, please?

Num vectors: [100k, 1M, 10M] Dimensions: [128, 256, 768, 1024, 2048, 4096]

Can you also see how far we can take the sparsity? It would be helpful to know what the results look like for 60%, 70%, and 80% sparsity.

From what I can see in your benchmarks above, it appears as though the performance degrades significantly for your new kernel as k grows, but shouldn't the k-selection be a separate operation altogether? Do you have any ideas or insights into this behavior?

rhdong commented 2 months ago

From what I can see in your benchmarks above, it appears as though the performance degrades significantly for your new kernel as k grows, but shouldn't the k-selection be a separate operation altogether? Do you have any ideas or insights into this behavior?

Hi @cjnolet , this is the benchmark for end-2-end, not only for the new kernel. So, maybe the chart is not so clear, the conclusion should be the performance decreasing with n_query increasing. This new kernel is not deeply optimized. I'm not sure if the new kernel can cover all of the scenarios; we'd better determine the typical scenarios and use appropriate methods.

rhdong commented 1 month ago

@rhdong these benchmarks look promising so far. Can you run these for the following data shapes and also provide the standard (non-filtered) brute-force knn numbers for reference, please?

Num vectors: [100k, 1M, 10M] Dimensions: [128, 256, 768, 1024, 2048, 4096]

Can you also see how far we can take the sparsity? It would be helpful to know what the results look like for 60%, 70%, and 80% sparsity.

From what I can see in your benchmarks above, it appears as though the performance degrades significantly for your new kernel as k grows, but shouldn't the k-selection be a separate operation altogether? Do you have any ideas or insights into this behavior?

Hello @cjnolet , here is the latest benchmark, I improved the new kernel a little bit, it has more stable performance(without that much strange performance gap) and no need to consume extra buffer(that would might cause the OOM on some lower sparsity like cuSparseSDDMM). But its performance in some cases(roughly 30-40%) is not better than cuSparseSDDMM. I intend to merge them by a branch. Before that, I'd like to hear your advice, thank you! By the way, I think it would be a tight timeline to introduce the masked 1nn with a because the API inputs have a little difference that needs to be adapted. So, may I introduce it in the next release?

-------------------------------------------------------------------------------------------------------
Benchmark                                                    cuSparseSDDMM     faster_dot         Iterations
-------------------------------------------------------------------------------------------------------
KNN/float/int64_t/brute_force_filter_knn/0/0/0/manual_time        0.195 ms      0.193 ms         3584 10240#128#10#255#[No filtered]#InnerProduct#NO_COPY#SEARCH
KNN/float/int64_t/brute_force_filter_knn/1/0/0/manual_time        0.227 ms      0.274 ms         3083 10240#128#10#255#0.8#InnerProduct#NO_COPY#SEARCH
KNN/float/int64_t/brute_force_filter_knn/2/0/0/manual_time        0.208 ms      0.239 ms         3364 10240#128#10#255#0.9#InnerProduct#NO_COPY#SEARCH
KNN/float/int64_t/brute_force_filter_knn/3/0/0/manual_time        0.184 ms      0.203 ms         3810 10240#128#10#255#0.99#InnerProduct#NO_COPY#SEARCH

KNN/float/int64_t/brute_force_filter_knn/4/0/0/manual_time        0.727 ms      0.724 ms          963 10240#128#1000#255#[No filtered]#InnerProduct#NO_COPY#SEARCH
KNN/float/int64_t/brute_force_filter_knn/5/0/0/manual_time         1.68 ms      0.635 ms          416 10240#128#1000#255#0.8#InnerProduct#NO_COPY#SEARCH
KNN/float/int64_t/brute_force_filter_knn/6/0/0/manual_time        0.866 ms      0.396 ms          808 10240#128#1000#255#0.9#InnerProduct#NO_COPY#SEARCH
KNN/float/int64_t/brute_force_filter_knn/7/0/0/manual_time        0.208 ms      0.152 ms         3365 10240#128#1000#255#0.99#InnerProduct#NO_COPY#SEARCH

KNN/float/int64_t/brute_force_filter_knn/8/0/0/manual_time        0.265 ms      0.260 ms         2640 10240#256#10#255#[No filtered]#InnerProduct#NO_COPY#SEARCH
KNN/float/int64_t/brute_force_filter_knn/9/0/0/manual_time        0.265 ms      0.352 ms         2637 10240#256#10#255#0.8#InnerProduct#NO_COPY#SEARCH
KNN/float/int64_t/brute_force_filter_knn/10/0/0/manual_time       0.242 ms      0.292 ms         2896 10240#256#10#255#0.9#InnerProduct#NO_COPY#SEARCH
KNN/float/int64_t/brute_force_filter_knn/11/0/0/manual_time       0.210 ms      0.244 ms         3325 10240#256#10#255#0.99#InnerProduct#NO_COPY#SEARCH

KNN/float/int64_t/brute_force_filter_knn/12/0/0/manual_time       0.915 ms      0.913 ms          764 10240#256#1000#255#[No filtered]#InnerProduct#NO_COPY#SEARCH
KNN/float/int64_t/brute_force_filter_knn/13/0/0/manual_time        2.96 ms      0.918 ms          237 10240#256#1000#255#0.8#InnerProduct#NO_COPY#SEARCH
KNN/float/int64_t/brute_force_filter_knn/14/0/0/manual_time        1.49 ms      0.520 ms          469 10240#256#1000#255#0.9#InnerProduct#NO_COPY#SEARCH
KNN/float/int64_t/brute_force_filter_knn/15/0/0/manual_time       0.289 ms      0.188 ms         2426 10240#256#1000#255#0.99#InnerProduct#NO_COPY#SEARCH

KNN/float/int64_t/brute_force_filter_knn/16/0/0/manual_time       0.315 ms      0.313 ms         2218 10240#768#10#255#[No filtered]#InnerProduct#NO_COPY#SEARCH
KNN/float/int64_t/brute_force_filter_knn/17/0/0/manual_time       0.323 ms      0.573 ms         2180 10240#768#10#255#0.8#InnerProduct#NO_COPY#SEARCH
KNN/float/int64_t/brute_force_filter_knn/18/0/0/manual_time       0.282 ms      0.422 ms         2475 10240#768#10#255#0.9#InnerProduct#NO_COPY#SEARCH
KNN/float/int64_t/brute_force_filter_knn/19/0/0/manual_time       0.227 ms      0.309 ms         3083 10240#768#10#255#0.99#InnerProduct#NO_COPY#SEARCH

KNN/float/int64_t/brute_force_filter_knn/20/0/0/manual_time        1.53 ms       1.53 ms          456 10240#768#1000#255#[No filtered]#InnerProduct#NO_COPY#SEARCH
KNN/float/int64_t/brute_force_filter_knn/21/0/0/manual_time        6.91 ms       1.53 ms          101 10240#768#1000#255#0.8#InnerProduct#NO_COPY#SEARCH
KNN/float/int64_t/brute_force_filter_knn/22/0/0/manual_time        3.58 ms      0.944 ms          195 10240#768#1000#255#0.9#InnerProduct#NO_COPY#SEARCH
KNN/float/int64_t/brute_force_filter_knn/23/0/0/manual_time       0.500 ms      0.262 ms         1398 10240#768#1000#255#0.99#InnerProduct#NO_COPY#SEARCH

KNN/float/int64_t/brute_force_filter_knn/24/0/0/manual_time       0.329 ms      0.327 ms         2127 10240#1024#10#255#[No filtered]#InnerProduct#NO_COPY#SEARCH
KNN/float/int64_t/brute_force_filter_knn/25/0/0/manual_time       0.331 ms      0.664 ms         2114 10240#1024#10#255#0.8#InnerProduct#NO_COPY#SEARCH
KNN/float/int64_t/brute_force_filter_knn/26/0/0/manual_time       0.286 ms      0.474 ms         2433 10240#1024#10#255#0.9#InnerProduct#NO_COPY#SEARCH
KNN/float/int64_t/brute_force_filter_knn/27/0/0/manual_time       0.230 ms      0.332 ms         3037 10240#1024#10#255#0.99#InnerProduct#NO_COPY#SEARCH

KNN/float/int64_t/brute_force_filter_knn/28/0/0/manual_time        1.82 ms       1.82 ms          385 10240#1024#1000#255#[No filtered]#InnerProduct#NO_COPY#SEARCH
KNN/float/int64_t/brute_force_filter_knn/29/0/0/manual_time        7.98 ms       1.85 ms           88 10240#1024#1000#255#0.8#InnerProduct#NO_COPY#SEARCH
KNN/float/int64_t/brute_force_filter_knn/30/0/0/manual_time        4.15 ms       1.13 ms          169 10240#1024#1000#255#0.9#InnerProduct#NO_COPY#SEARCH
KNN/float/int64_t/brute_force_filter_knn/31/0/0/manual_time       0.563 ms      0.293 ms         1245 10240#1024#1000#255#0.99#InnerProduct#NO_COPY#SEARCH

KNN/float/int64_t/brute_force_filter_knn/32/0/0/manual_time       0.381 ms      0.382 ms         1840 10240#2048#10#255#[No filtered]#InnerProduct#NO_COPY#SEARCH
KNN/float/int64_t/brute_force_filter_knn/33/0/0/manual_time       0.375 ms       1.03 ms         1865 10240#2048#10#255#0.8#InnerProduct#NO_COPY#SEARCH
KNN/float/int64_t/brute_force_filter_knn/34/0/0/manual_time       0.315 ms      0.683 ms         2224 10240#2048#10#255#0.9#InnerProduct#NO_COPY#SEARCH
KNN/float/int64_t/brute_force_filter_knn/35/0/0/manual_time       0.235 ms      0.426 ms         2980 10240#2048#10#255#0.99#InnerProduct#NO_COPY#SEARCH

KNN/float/int64_t/brute_force_filter_knn/36/0/0/manual_time        2.96 ms       2.97 ms          236 10240#2048#1000#255#[No filtered]#InnerProduct#NO_COPY#SEARCH
KNN/float/int64_t/brute_force_filter_knn/37/0/0/manual_time        12.0 ms       3.07 ms           58 10240#2048#1000#255#0.8#InnerProduct#NO_COPY#SEARCH
KNN/float/int64_t/brute_force_filter_knn/38/0/0/manual_time        6.44 ms       1.95 ms          109 10240#2048#1000#255#0.9#InnerProduct#NO_COPY#SEARCH
KNN/float/int64_t/brute_force_filter_knn/39/0/0/manual_time       0.902 ms      0.505 ms          777 10240#2048#1000#255#0.99#InnerProduct#NO_COPY#SEARCH

KNN/float/int64_t/brute_force_filter_knn/40/0/0/manual_time       0.480 ms      0.482 ms         1455 10240#4096#10#255#[No filtered]#InnerProduct#NO_COPY#SEARCH
KNN/float/int64_t/brute_force_filter_knn/41/0/0/manual_time       0.471 ms       1.77 ms         1487 10240#4096#10#255#0.8#InnerProduct#NO_COPY#SEARCH
KNN/float/int64_t/brute_force_filter_knn/42/0/0/manual_time       0.375 ms       1.08 ms         1868 10240#4096#10#255#0.9#InnerProduct#NO_COPY#SEARCH
KNN/float/int64_t/brute_force_filter_knn/43/0/0/manual_time       0.247 ms      0.614 ms         2841 10240#4096#10#255#0.99#InnerProduct#NO_COPY#SEARCH

KNN/float/int64_t/brute_force_filter_knn/44/0/0/manual_time        5.25 ms       5.27 ms          133 10240#4096#1000#255#[No filtered]#InnerProduct#NO_COPY#SEARCH
KNN/float/int64_t/brute_force_filter_knn/45/0/0/manual_time        22.6 ms       5.51 ms           31 10240#4096#1000#255#0.8#InnerProduct#NO_COPY#SEARCH
KNN/float/int64_t/brute_force_filter_knn/46/0/0/manual_time        11.5 ms       3.93 ms           61 10240#4096#1000#255#0.9#InnerProduct#NO_COPY#SEARCH
KNN/float/int64_t/brute_force_filter_knn/47/0/0/manual_time        1.53 ms      0.925 ms          458 10240#4096#1000#255#0.99#InnerProduct#NO_COPY#SEARCH

KNN/float/int64_t/brute_force_filter_knn/48/0/0/manual_time        2.16 ms       2.16 ms          324 1048576#128#10#255#[No filtered]#InnerProduct#NO_COPY#SEARCH
KNN/float/int64_t/brute_force_filter_knn/49/0/0/manual_time        5.68 ms       9.81 ms          123 1048576#128#10#255#0.8#InnerProduct#NO_COPY#SEARCH
KNN/float/int64_t/brute_force_filter_knn/50/0/0/manual_time        4.01 ms       6.22 ms          175 1048576#128#10#255#0.9#InnerProduct#NO_COPY#SEARCH
KNN/float/int64_t/brute_force_filter_knn/51/0/0/manual_time        1.92 ms       2.19 ms          365 1048576#128#10#255#0.99#InnerProduct#NO_COPY#SEARCH

KNN/float/int64_t/brute_force_filter_knn/52/0/0/manual_time        34.5 ms       34.5 ms           20 1048576#128#1000#255#[No filtered]#InnerProduct#NO_COPY#SEARCH
KNN/float/int64_t/brute_force_filter_knn/53/0/0/manual_time         241 ms       58.0 ms            3 1048576#128#1000#255#0.8#InnerProduct#NO_COPY#SEARCH
KNN/float/int64_t/brute_force_filter_knn/54/0/0/manual_time         119 ms       26.3 ms            6 1048576#128#1000#255#0.9#InnerProduct#NO_COPY#SEARCH
KNN/float/int64_t/brute_force_filter_knn/55/0/0/manual_time        14.2 ms       4.90 ms           49 1048576#128#1000#255#0.99#InnerProduct#NO_COPY#SEARCH

KNN/float/int64_t/brute_force_filter_knn/56/0/0/manual_time        2.69 ms       2.70 ms          260 1048576#256#10#255#[No filtered]#InnerProduct#NO_COPY#SEARCH
KNN/float/int64_t/brute_force_filter_knn/57/0/0/manual_time        7.30 ms       14.3 ms           96 1048576#256#10#255#0.8#InnerProduct#NO_COPY#SEARCH
KNN/float/int64_t/brute_force_filter_knn/58/0/0/manual_time        4.84 ms       8.52 ms          145 1048576#256#10#255#0.9#InnerProduct#NO_COPY#SEARCH
KNN/float/int64_t/brute_force_filter_knn/59/0/0/manual_time        2.00 ms       2.43 ms          349 1048576#256#10#255#0.99#InnerProduct#NO_COPY#SEARCH

KNN/float/int64_t/brute_force_filter_knn/60/0/0/manual_time        48.8 ms       48.9 ms           14 1048576#256#1000#255#[No filtered]#InnerProduct#NO_COPY#SEARCH
KNN/float/int64_t/brute_force_filter_knn/61/0/0/manual_time         421 ms        106 ms            2 1048576#256#1000#255#0.8#InnerProduct#NO_COPY#SEARCH
KNN/float/int64_t/brute_force_filter_knn/62/0/0/manual_time         210 ms       51.4 ms            3 1048576#256#1000#255#0.9#InnerProduct#NO_COPY#SEARCH
KNN/float/int64_t/brute_force_filter_knn/63/0/0/manual_time        23.4 ms       7.56 ms           30 1048576#256#1000#255#0.99#InnerProduct#NO_COPY#SEARCH

KNN/float/int64_t/brute_force_filter_knn/64/0/0/manual_time        4.64 ms       4.64 ms          151 1048576#768#10#255#[No filtered]#InnerProduct#NO_COPY#SEARCH
KNN/float/int64_t/brute_force_filter_knn/65/0/0/manual_time        11.8 ms       32.4 ms           59 1048576#768#10#255#0.8#InnerProduct#NO_COPY#SEARCH
KNN/float/int64_t/brute_force_filter_knn/66/0/0/manual_time        7.13 ms       17.9 ms           98 1048576#768#10#255#0.9#InnerProduct#NO_COPY#SEARCH
KNN/float/int64_t/brute_force_filter_knn/67/0/0/manual_time        2.24 ms       3.42 ms          312 1048576#768#10#255#0.99#InnerProduct#NO_COPY#SEARCH

KNN/float/int64_t/brute_force_filter_knn/68/0/0/manual_time         106 ms        106 ms            7 1048576#768#1000#255#[No filtered]#InnerProduct#NO_COPY#SEARCH
KNN/float/int64_t/brute_force_filter_knn/69/0/0/manual_time         738 ms        353 ms            1 1048576#768#1000#255#0.8#InnerProduct#NO_COPY#SEARCH
KNN/float/int64_t/brute_force_filter_knn/70/0/0/manual_time         426 ms        178 ms            2 1048576#768#1000#255#0.9#InnerProduct#NO_COPY#SEARCH
KNN/float/int64_t/brute_force_filter_knn/71/0/0/manual_time        46.5 ms       20.6 ms           15 1048576#768#1000#255#0.99#InnerProduct#NO_COPY#SEARCH

KNN/float/int64_t/brute_force_filter_knn/72/0/0/manual_time        5.68 ms       5.67 ms          123 1048576#1024#10#255#[No filtered]#InnerProduct#NO_COPY#SEARCH
KNN/float/int64_t/brute_force_filter_knn/73/0/0/manual_time        13.4 ms       41.7 ms           52 1048576#1024#10#255#0.8#InnerProduct#NO_COPY#SEARCH
KNN/float/int64_t/brute_force_filter_knn/74/0/0/manual_time        7.89 ms       22.6 ms           89 1048576#1024#10#255#0.9#InnerProduct#NO_COPY#SEARCH
KNN/float/int64_t/brute_force_filter_knn/75/0/0/manual_time        2.31 ms       3.92 ms          304 1048576#1024#10#255#0.99#InnerProduct#NO_COPY#SEARCH

KNN/float/int64_t/brute_force_filter_knn/76/0/0/manual_time         136 ms        135 ms            5 1048576#1024#1000#255#[No filtered]#InnerProduct#NO_COPY#SEARCH
KNN/float/int64_t/brute_force_filter_knn/77/0/0/manual_time         996 ms        483 ms            1 1048576#1024#1000#255#0.8#InnerProduct#NO_COPY#SEARCH
KNN/float/int64_t/brute_force_filter_knn/78/0/0/manual_time         514 ms        239 ms            1 1048576#1024#1000#255#0.9#InnerProduct#NO_COPY#SEARCH
KNN/float/int64_t/brute_force_filter_knn/79/0/0/manual_time        54.0 ms       27.5 ms           13 1048576#1024#1000#255#0.99#InnerProduct#NO_COPY#SEARCH

KNN/float/int64_t/brute_force_filter_knn/80/0/0/manual_time        9.63 ms       9.62 ms           73 1048576#2048#10#255#[No filtered]#InnerProduct#NO_COPY#SEARCH
KNN/float/int64_t/brute_force_filter_knn/81/0/0/manual_time        18.4 ms       79.2 ms           38 1048576#2048#10#255#0.8#InnerProduct#NO_COPY#SEARCH
KNN/float/int64_t/brute_force_filter_knn/82/0/0/manual_time        10.5 ms       41.5 ms           67 1048576#2048#10#255#0.9#InnerProduct#NO_COPY#SEARCH
KNN/float/int64_t/brute_force_filter_knn/83/0/0/manual_time        2.57 ms       5.88 ms          272 1048576#2048#10#255#0.99#InnerProduct#NO_COPY#SEARCH

KNN/float/int64_t/brute_force_filter_knn/84/0/0/manual_time         262 ms        262 ms            3 1048576#2048#1000#255#[No filtered]#InnerProduct#NO_COPY#SEARCH
KNN/float/int64_t/brute_force_filter_knn/85/0/0/manual_time        1591 ms        986 ms            1 1048576#2048#1000#255#0.8#InnerProduct#NO_COPY#SEARCH
KNN/float/int64_t/brute_force_filter_knn/86/0/0/manual_time         793 ms        487 ms            1 1048576#2048#1000#255#0.9#InnerProduct#NO_COPY#SEARCH
KNN/float/int64_t/brute_force_filter_knn/87/0/0/manual_time        81.3 ms       55.6 ms            9 1048576#2048#1000#255#0.99#InnerProduct#NO_COPY#SEARCH

KNN/float/int64_t/brute_force_filter_knn/88/0/0/manual_time        17.7 ms       17.7 ms           40 1048576#4096#10#255#[No filtered]#InnerProduct#NO_COPY#SEARCH
KNN/float/int64_t/brute_force_filter_knn/89/0/0/manual_time        28.8 ms        156 ms           24 1048576#4096#10#255#0.8#InnerProduct#NO_COPY#SEARCH
KNN/float/int64_t/brute_force_filter_knn/90/0/0/manual_time        15.7 ms       79.9 ms           44 1048576#4096#10#255#0.9#InnerProduct#NO_COPY#SEARCH
KNN/float/int64_t/brute_force_filter_knn/91/0/0/manual_time        3.11 ms       9.80 ms          225 1048576#4096#10#255#0.99#InnerProduct#NO_COPY#SEARCH

KNN/float/int64_t/brute_force_filter_knn/92/0/0/manual_time         509 ms        509 ms            1 1048576#4096#1000#255#[No filtered]#InnerProduct#NO_COPY#SEARCH
KNN/float/int64_t/brute_force_filter_knn/93/0/0/manual_time        2687 ms       2018 ms            1 1048576#4096#1000#255#0.8#InnerProduct#NO_COPY#SEARCH
KNN/float/int64_t/brute_force_filter_knn/94/0/0/manual_time        1343 ms       1022 ms            1 1048576#4096#1000#255#0.9#InnerProduct#NO_COPY#SEARCH
KNN/float/int64_t/brute_force_filter_knn/95/0/0/manual_time         136 ms        114 ms            5 1048576#4096#1000#255#0.99#InnerProduct#NO_COPY#SEARCH

KNN/float/int64_t/brute_force_filter_knn/96/0/0/manual_time        19.7 ms       19.7 ms           36 10485760#128#10#255#[No filtered]#InnerProduct#NO_COPY#SEARCH
KNN/float/int64_t/brute_force_filter_knn/97/0/0/manual_time        55.8 ms       97.2 ms           13 10485760#128#10#255#0.8#InnerProduct#NO_COPY#SEARCH
KNN/float/int64_t/brute_force_filter_knn/98/0/0/manual_time        38.9 ms       60.3 ms           18 10485760#128#10#255#0.9#InnerProduct#NO_COPY#SEARCH
KNN/float/int64_t/brute_force_filter_knn/99/0/0/manual_time        16.7 ms       19.0 ms           42 10485760#128#10#255#0.99#InnerProduct#NO_COPY#SEARCH

KNN/float/int64_t/brute_force_filter_knn/100/0/0/manual_time        342 ms        342 ms            2 10485760#128#1000#255#[No filtered]#InnerProduct#NO_COPY#SEARCH
KNN/float/int64_t/brute_force_filter_knn/101/0/0/manual_time       2473 ms        914 ms            1 10485760#128#1000#255#0.8#InnerProduct#NO_COPY#SEARCH
KNN/float/int64_t/brute_force_filter_knn/102/0/0/manual_time       1237 ms        456 ms            1 10485760#128#1000#255#0.9#InnerProduct#NO_COPY#SEARCH
KNN/float/int64_t/brute_force_filter_knn/103/0/0/manual_time        141 ms       61.8 ms            5 10485760#128#1000#255#0.99#InnerProduct#NO_COPY#SEARCH

KNN/float/int64_t/brute_force_filter_knn/104/0/0/manual_time       24.9 ms       24.9 ms           28 10485760#256#10#255#[No filtered]#InnerProduct#NO_COPY#SEARCH
KNN/float/int64_t/brute_force_filter_knn/105/0/0/manual_time       73.1 ms        143 ms           10 10485760#256#10#255#0.8#InnerProduct#NO_COPY#SEARCH
KNN/float/int64_t/brute_force_filter_knn/106/0/0/manual_time       47.5 ms       83.5 ms           15 10485760#256#10#255#0.9#InnerProduct#NO_COPY#SEARCH
KNN/float/int64_t/brute_force_filter_knn/107/0/0/manual_time       17.5 ms       25.1 ms           40 10485760#256#10#255#0.99#InnerProduct#NO_COPY#SEARCH

The following cases are still running.

rhdong commented 1 month ago

Hi @cjnolet @benfred , here is the latest benchmark for the latest commit by our discussion in Slack. Would you please help review it? Many thanks!

If density > 1.0%, use standard brute force (and do pre-filtering in the k-selection)
If n_queries <= 10, use a custom kernel
Otherwise, if n_queries > 10, use SDDMM
------------------------------------------------------------------------------------------------------
Benchmark                                                            Time             CPU   Iterations
------------------------------------------------------------------------------------------------------
KNN/float/int64_t/brute_force_filter_knn/0/0/0/manual_time        17.4 ms          739 ms           40 10000000#256#1#256#[No filtered]#InnerProduct#NO_COPY#SEARCH
KNN/float/int64_t/brute_force_filter_knn/1/0/0/manual_time        17.0 ms          726 ms           41 10000000#256#1#256#0.8#InnerProduct#NO_COPY#SEARCH
KNN/float/int64_t/brute_force_filter_knn/2/0/0/manual_time        14.9 ms          724 ms           47 10000000#256#1#256#0.99#InnerProduct#NO_COPY#SEARCH

KNN/float/int64_t/brute_force_filter_knn/3/0/0/manual_time        23.5 ms          733 ms           30 10000000#256#10#256#[No filtered]#InnerProduct#NO_COPY#SEARCH
KNN/float/int64_t/brute_force_filter_knn/4/0/0/manual_time        22.5 ms          746 ms           31 10000000#256#10#256#0.8#InnerProduct#NO_COPY#SEARCH
KNN/float/int64_t/brute_force_filter_knn/5/0/0/manual_time        16.7 ms          739 ms           42 10000000#256#10#256#0.99#InnerProduct#NO_COPY#SEARCH

KNN/float/int64_t/brute_force_filter_knn/6/0/0/manual_time        57.6 ms          783 ms           12 10000000#256#100#256#[No filtered]#InnerProduct#NO_COPY#SEARCH
KNN/float/int64_t/brute_force_filter_knn/7/0/0/manual_time        61.2 ms          787 ms           11 10000000#256#100#256#0.8#InnerProduct#NO_COPY#SEARCH
KNN/float/int64_t/brute_force_filter_knn/8/0/0/manual_time        29.0 ms          756 ms           24 10000000#256#100#256#0.99#InnerProduct#NO_COPY#SEARCH

KNN/float/int64_t/brute_force_filter_knn/9/0/0/manual_time         463 ms         1189 ms            2 10000000#256#1000#256#[No filtered]#InnerProduct#NO_COPY#SEARCH
KNN/float/int64_t/brute_force_filter_knn/10/0/0/manual_time        513 ms         1231 ms            1 10000000#256#1000#256#0.8#InnerProduct#NO_COPY#SEARCH
KNN/float/int64_t/brute_force_filter_knn/11/0/0/manual_time       91.5 ms          809 ms            8 10000000#256#1000#256#0.99#InnerProduct#NO_COPY#SEARCH

KNN/float/int64_t/brute_force_filter_knn/24/0/0/manual_time       2.01 ms         77.0 ms          348 1048576#256#1#256#[No filtered]#InnerProduct#NO_COPY#SEARCH
KNN/float/int64_t/brute_force_filter_knn/25/0/0/manual_time       1.98 ms         77.2 ms          352 1048576#256#1#256#0.8#InnerProduct#NO_COPY#SEARCH
KNN/float/int64_t/brute_force_filter_knn/26/0/0/manual_time       1.96 ms         77.3 ms          357 1048576#256#1#256#0.99#InnerProduct#NO_COPY#SEARCH

KNN/float/int64_t/brute_force_filter_knn/27/0/0/manual_time       2.63 ms         77.6 ms          266 1048576#256#10#256#[No filtered]#InnerProduct#NO_COPY#SEARCH
KNN/float/int64_t/brute_force_filter_knn/28/0/0/manual_time       2.54 ms         77.7 ms          275 1048576#256#10#256#0.8#InnerProduct#NO_COPY#SEARCH
KNN/float/int64_t/brute_force_filter_knn/29/0/0/manual_time       1.98 ms         76.9 ms          353 1048576#256#10#256#0.99#InnerProduct#NO_COPY#SEARCH

KNN/float/int64_t/brute_force_filter_knn/30/0/0/manual_time       6.21 ms         81.2 ms          113 1048576#256#100#256#[No filtered]#InnerProduct#NO_COPY#SEARCH
KNN/float/int64_t/brute_force_filter_knn/31/0/0/manual_time       6.61 ms         82.1 ms          106 1048576#256#100#256#0.8#InnerProduct#NO_COPY#SEARCH
KNN/float/int64_t/brute_force_filter_knn/32/0/0/manual_time       2.93 ms         79.3 ms          238 1048576#256#100#256#0.99#InnerProduct#NO_COPY#SEARCH

KNN/float/int64_t/brute_force_filter_knn/33/0/0/manual_time       48.8 ms          126 ms           14 1048576#256#1000#256#[No filtered]#InnerProduct#NO_COPY#SEARCH
KNN/float/int64_t/brute_force_filter_knn/34/0/0/manual_time       54.1 ms          131 ms           13 1048576#256#1000#256#0.8#InnerProduct#NO_COPY#SEARCH
KNN/float/int64_t/brute_force_filter_knn/35/0/0/manual_time       7.64 ms         84.2 ms           91 1048576#256#1000#256#0.99#InnerProduct#NO_COPY#SEARCH

KNN/float/int64_t/brute_force_filter_knn/36/0/0/manual_time       6.55 ms          614 ms          107 1048576#2048#1#256#[No filtered]#InnerProduct#NO_COPY#SEARCH
KNN/float/int64_t/brute_force_filter_knn/37/0/0/manual_time       6.54 ms          607 ms          107 1048576#2048#1#256#0.8#InnerProduct#NO_COPY#SEARCH
KNN/float/int64_t/brute_force_filter_knn/38/0/0/manual_time       2.04 ms          610 ms          343 1048576#2048#1#256#0.99#InnerProduct#NO_COPY#SEARCH

KNN/float/int64_t/brute_force_filter_knn/39/0/0/manual_time       9.60 ms          613 ms           73 1048576#2048#10#256#[No filtered]#InnerProduct#NO_COPY#SEARCH
KNN/float/int64_t/brute_force_filter_knn/40/0/0/manual_time       9.53 ms          618 ms           74 1048576#2048#10#256#0.8#InnerProduct#NO_COPY#SEARCH
KNN/float/int64_t/brute_force_filter_knn/41/0/0/manual_time       2.55 ms          603 ms          275 1048576#2048#10#256#0.99#InnerProduct#NO_COPY#SEARCH

KNN/float/int64_t/brute_force_filter_knn/42/0/0/manual_time       31.6 ms          632 ms           22 1048576#2048#100#256#[No filtered]#InnerProduct#NO_COPY#SEARCH
KNN/float/int64_t/brute_force_filter_knn/43/0/0/manual_time       32.0 ms          634 ms           22 1048576#2048#100#256#0.8#InnerProduct#NO_COPY#SEARCH
KNN/float/int64_t/brute_force_filter_knn/44/0/0/manual_time       7.91 ms          616 ms           88 1048576#2048#100#256#0.99#InnerProduct#NO_COPY#SEARCH

KNN/float/int64_t/brute_force_filter_knn/45/0/0/manual_time        264 ms          876 ms            3 1048576#2048#1000#256#[No filtered]#InnerProduct#NO_COPY#SEARCH
KNN/float/int64_t/brute_force_filter_knn/46/0/0/manual_time        269 ms          877 ms            3 1048576#2048#1000#256#0.8#InnerProduct#NO_COPY#SEARCH
KNN/float/int64_t/brute_force_filter_knn/47/0/0/manual_time       55.4 ms          664 ms           12 1048576#2048#1000#256#0.99#InnerProduct#NO_COPY#SEARCH
cjnolet commented 1 month ago

/merged

benfred commented 1 month ago

/merge