Closed rhdong closed 1 month ago
Hey @cjnolet @benfred, I implemented an initial version of faster_dot
(without deep optimization), which could help improve performance in many classic scenarios. I will continue to communicate with cuSparse Team. Here is the benchmark:
***WARNING*** CPU scaling is enabled, the benchmark real time measurements may be noisy and will incur extra overhead.
------------------------------------------------------------------------------------------------------
Benchmark cuSparseSDDMM faster_dot Iterations Configuration(No filtered means normal `search`)
------------------------------------------------------------------------------------------------------n_sample#dim#n_query#top_k#remove rate# metric
KNN/float/int64_t/brute_force_filter_knn/0/0/0/manual_time 11.4 ms 11.4 ms 61 1000000#4096#1#255#[No filtered]#InnerProduct#NO_COPY#SEARCH
KNN/float/int64_t/brute_force_filter_knn/1/0/0/manual_time 11.5 ms 11.4 ms 61 1000000#4096#1#255#[No filtered]#L2Expanded#NO_COPY#SEARCH
KNN/float/int64_t/brute_force_filter_knn/2/0/0/manual_time 137 ms 7.02 ms 4 1000000#4096#1#255#0.8#InnerProduct#NO_COPY#SEARCH
KNN/float/int64_t/brute_force_filter_knn/3/0/0/manual_time 137 ms 7.03 ms 5 1000000#4096#1#255#0.8#L2Expanded#NO_COPY#SEARCH
KNN/float/int64_t/brute_force_filter_knn/4/0/0/manual_time 70.3 ms 4.72 ms 10 1000000#4096#1#255#0.9#InnerProduct#NO_COPY#SEARCH
KNN/float/int64_t/brute_force_filter_knn/5/0/0/manual_time 70.2 ms 4.72 ms 10 1000000#4096#1#255#0.9#L2Expanded#NO_COPY#SEARCH
KNN/float/int64_t/brute_force_filter_knn/6/0/0/manual_time 8.94 ms 2.05 ms 77 1000000#4096#1#255#0.99#InnerProduct#NO_COPY#SEARCH
KNN/float/int64_t/brute_force_filter_knn/7/0/0/manual_time 8.97 ms 2.06 ms 77 1000000#4096#1#255#0.99#L2Expanded#NO_COPY#SEARCH
KNN/float/int64_t/brute_force_filter_knn/8/0/0/manual_time 17.1 ms 17.1 ms 41 1000000#4096#10#255#[No filtered]#InnerProduct#NO_COPY#SEARCH
KNN/float/int64_t/brute_force_filter_knn/9/0/0/manual_time 17.2 ms 17.2 ms 41 1000000#4096#10#255#[No filtered]#L2Expanded#NO_COPY#SEARCH
KNN/float/int64_t/brute_force_filter_knn/10/0/0/manual_time 149 ms 36.0 ms 5 1000000#4096#10#255#0.8#InnerProduct#NO_COPY#SEARCH
KNN/float/int64_t/brute_force_filter_knn/11/0/0/manual_time 149 ms 36.1 ms 5 1000000#4096#10#255#0.8#L2Expanded#NO_COPY#SEARCH
KNN/float/int64_t/brute_force_filter_knn/12/0/0/manual_time 76.0 ms 19.2 ms 9 1000000#4096#10#255#0.9#InnerProduct#NO_COPY#SEARCH
KNN/float/int64_t/brute_force_filter_knn/13/0/0/manual_time 76.2 ms 19.2 ms 9 1000000#4096#10#255#0.9#L2Expanded#NO_COPY#SEARCH
KNN/float/int64_t/brute_force_filter_knn/14/0/0/manual_time 9.43 ms 3.39 ms 72 1000000#4096#10#255#0.99#InnerProduct#NO_COPY#SEARCH
KNN/float/int64_t/brute_force_filter_knn/15/0/0/manual_time 9.45 ms 3.40 ms 72 1000000#4096#10#255#0.99#L2Expanded#NO_COPY#SEARCH
KNN/float/int64_t/brute_force_filter_knn/16/0/0/manual_time 488 ms 489 ms 2 1000000#4096#1000#255#[No filtered]#InnerProduct#NO_COPY#SEARCH
KNN/float/int64_t/brute_force_filter_knn/17/0/0/manual_time 495 ms 494 ms 2 1000000#4096#1000#255#[No filtered]#L2Expanded#NO_COPY#SEARCH
KNN/float/int64_t/brute_force_filter_knn/18/0/0/manual_time 1916 ms 3277 ms 1 1000000#4096#1000#255#0.8#InnerProduct#NO_COPY#SEARCH
KNN/float/int64_t/brute_force_filter_knn/19/0/0/manual_time 1915 ms 3280 ms 1 1000000#4096#1000#255#0.8#L2Expanded#NO_COPY#SEARCH
KNN/float/int64_t/brute_force_filter_knn/20/0/0/manual_time 968 ms 1641 ms 1 1000000#4096#1000#255#0.9#InnerProduct#NO_COPY#SEARCH
KNN/float/int64_t/brute_force_filter_knn/21/0/0/manual_time 969 ms 1643 ms 1 1000000#4096#1000#255#0.9#L2Expanded#NO_COPY#SEARCH
KNN/float/int64_t/brute_force_filter_knn/22/0/0/manual_time 108 ms 167 ms 6 1000000#4096#1000#255#0.99#InnerProduct#NO_COPY#SEARCH
KNN/float/int64_t/brute_force_filter_knn/23/0/0/manual_time 108 ms 167 ms 6 1000000#4096#1000#255#0.99#L2Expanded#NO_COPY#SEARCH
KNN/float/int64_t/brute_force_filter_knn/24/0/0/manual_time 2.64 ms 2.64 ms 265 1000000#512#1#255#[No filtered]#InnerProduct#NO_COPY#SEARCH
KNN/float/int64_t/brute_force_filter_knn/25/0/0/manual_time 2.65 ms 2.65 ms 264 1000000#512#1#255#[No filtered]#L2Expanded#NO_COPY#SEARCH
KNN/float/int64_t/brute_force_filter_knn/26/0/0/manual_time 21.4 ms 4.84 ms 32 1000000#512#1#255#0.8#InnerProduct#NO_COPY#SEARCH
KNN/float/int64_t/brute_force_filter_knn/27/0/0/manual_time 21.4 ms 4.84 ms 33 1000000#512#1#255#0.8#L2Expanded#NO_COPY#SEARCH
KNN/float/int64_t/brute_force_filter_knn/28/0/0/manual_time 12.0 ms 3.60 ms 57 1000000#512#1#255#0.9#InnerProduct#NO_COPY#SEARCH
KNN/float/int64_t/brute_force_filter_knn/29/0/0/manual_time 12.0 ms 3.62 ms 57 1000000#512#1#255#0.9#L2Expanded#NO_COPY#SEARCH
KNN/float/int64_t/brute_force_filter_knn/30/0/0/manual_time 2.83 ms 1.92 ms 246 1000000#512#1#255#0.99#InnerProduct#NO_COPY#SEARCH
KNN/float/int64_t/brute_force_filter_knn/31/0/0/manual_time 2.84 ms 1.94 ms 245 1000000#512#1#255#0.99#L2Expanded#NO_COPY#SEARCH
KNN/float/int64_t/brute_force_filter_knn/32/0/0/manual_time 3.57 ms 3.57 ms 196 1000000#512#10#255#[No filtered]#InnerProduct#NO_COPY#SEARCH
KNN/float/int64_t/brute_force_filter_knn/33/0/0/manual_time 3.63 ms 3.63 ms 193 1000000#512#10#255#[No filtered]#L2Expanded#NO_COPY#SEARCH
KNN/float/int64_t/brute_force_filter_knn/34/0/0/manual_time 22.3 ms 14.4 ms 31 1000000#512#10#255#0.8#InnerProduct#NO_COPY#SEARCH
KNN/float/int64_t/brute_force_filter_knn/35/0/0/manual_time 22.3 ms 14.4 ms 31 1000000#512#10#255#0.8#L2Expanded#NO_COPY#SEARCH
KNN/float/int64_t/brute_force_filter_knn/36/0/0/manual_time 12.6 ms 8.33 ms 55 1000000#512#10#255#0.9#InnerProduct#NO_COPY#SEARCH
KNN/float/int64_t/brute_force_filter_knn/37/0/0/manual_time 12.6 ms 8.35 ms 55 1000000#512#10#255#0.9#L2Expanded#NO_COPY#SEARCH
KNN/float/int64_t/brute_force_filter_knn/38/0/0/manual_time 2.80 ms 2.28 ms 249 1000000#512#10#255#0.99#InnerProduct#NO_COPY#SEARCH
KNN/float/int64_t/brute_force_filter_knn/39/0/0/manual_time 2.81 ms 2.29 ms 247 1000000#512#10#255#0.99#L2Expanded#NO_COPY#SEARCH
KNN/float/int64_t/brute_force_filter_knn/40/0/0/manual_time 75.0 ms 74.5 ms 9 1000000#512#1000#255#[No filtered]#InnerProduct#NO_COPY#SEARCH
KNN/float/int64_t/brute_force_filter_knn/41/0/0/manual_time 79.9 ms 79.7 ms 9 1000000#512#1000#255#[No filtered]#L2Expanded#NO_COPY#SEARCH
KNN/float/int64_t/brute_force_filter_knn/42/0/0/manual_time 210 ms 1085 ms 3 1000000#512#1000#255#0.8#InnerProduct#NO_COPY#SEARCH
KNN/float/int64_t/brute_force_filter_knn/43/0/0/manual_time 215 ms 1088 ms 3 1000000#512#1000#255#0.8#L2Expanded#NO_COPY#SEARCH
KNN/float/int64_t/brute_force_filter_knn/44/0/0/manual_time 105 ms 543 ms 7 1000000#512#1000#255#0.9#InnerProduct#NO_COPY#SEARCH
KNN/float/int64_t/brute_force_filter_knn/45/0/0/manual_time 106 ms 544 ms 7 1000000#512#1000#255#0.9#L2Expanded#NO_COPY#SEARCH
KNN/float/int64_t/brute_force_filter_knn/46/0/0/manual_time 12.6 ms 56.6 ms 55 1000000#512#1000#255#0.99#InnerProduct#NO_COPY#SEARCH
KNN/float/int64_t/brute_force_filter_knn/47/0/0/manual_time 12.7 ms 56.8 ms 54 1000000#512#1000#255#0.99#L2Expanded#NO_COPY#SEARCH
KNN/float/int64_t/brute_force_filter_knn/48/0/0/manual_time 1.72 ms 1.71 ms 407 1000000#128#1#255#[No filtered]#InnerProduct#NO_COPY#SEARCH
KNN/float/int64_t/brute_force_filter_knn/49/0/0/manual_time 1.73 ms 1.72 ms 406 1000000#128#1#255#[No filtered]#L2Expanded#NO_COPY#SEARCH
KNN/float/int64_t/brute_force_filter_knn/50/0/0/manual_time 8.86 ms 3.99 ms 79 1000000#128#1#255#0.8#InnerProduct#NO_COPY#SEARCH
KNN/float/int64_t/brute_force_filter_knn/51/0/0/manual_time 8.87 ms 4.00 ms 79 1000000#128#1#255#0.8#L2Expanded#NO_COPY#SEARCH
KNN/float/int64_t/brute_force_filter_knn/52/0/0/manual_time 5.64 ms 3.19 ms 123 1000000#128#1#255#0.9#InnerProduct#NO_COPY#SEARCH
KNN/float/int64_t/brute_force_filter_knn/53/0/0/manual_time 5.62 ms 3.19 ms 120 1000000#128#1#255#0.9#L2Expanded#NO_COPY#SEARCH
KNN/float/int64_t/brute_force_filter_knn/54/0/0/manual_time 2.15 ms 1.87 ms 326 1000000#128#1#255#0.99#InnerProduct#NO_COPY#SEARCH
KNN/float/int64_t/brute_force_filter_knn/55/0/0/manual_time 2.16 ms 1.89 ms 323 1000000#128#1#255#0.99#L2Expanded#NO_COPY#SEARCH
KNN/float/int64_t/brute_force_filter_knn/56/0/0/manual_time 2.13 ms 2.12 ms 328 1000000#128#10#255#[No filtered]#InnerProduct#NO_COPY#SEARCH
KNN/float/int64_t/brute_force_filter_knn/57/0/0/manual_time 2.19 ms 2.18 ms 320 1000000#128#10#255#[No filtered]#L2Expanded#NO_COPY#SEARCH
KNN/float/int64_t/brute_force_filter_knn/58/0/0/manual_time 9.41 ms 6.07 ms 75 1000000#128#10#255#0.8#InnerProduct#NO_COPY#SEARCH
KNN/float/int64_t/brute_force_filter_knn/59/0/0/manual_time 9.44 ms 6.09 ms 74 1000000#128#10#255#0.8#L2Expanded#NO_COPY#SEARCH
KNN/float/int64_t/brute_force_filter_knn/60/0/0/manual_time 5.94 ms 4.16 ms 114 1000000#128#10#255#0.9#InnerProduct#NO_COPY#SEARCH
KNN/float/int64_t/brute_force_filter_knn/61/0/0/manual_time 5.97 ms 4.18 ms 113 1000000#128#10#255#0.9#L2Expanded#NO_COPY#SEARCH
KNN/float/int64_t/brute_force_filter_knn/62/0/0/manual_time 2.10 ms 1.86 ms 332 1000000#128#10#255#0.99#InnerProduct#NO_COPY#SEARCH
KNN/float/int64_t/brute_force_filter_knn/63/0/0/manual_time 2.11 ms 1.87 ms 331 1000000#128#10#255#0.99#L2Expanded#NO_COPY#SEARCH
KNN/float/int64_t/brute_force_filter_knn/64/0/0/manual_time 33.1 ms 33.1 ms 21 1000000#128#1000#255#[No filtered]#InnerProduct#NO_COPY#SEARCH
KNN/float/int64_t/brute_force_filter_knn/65/0/0/manual_time 38.0 ms 38.0 ms 18 1000000#128#1000#255#[No filtered]#L2Expanded#NO_COPY#SEARCH
KNN/float/int64_t/brute_force_filter_knn/66/0/0/manual_time 54.2 ms 244 ms 13 1000000#128#1000#255#0.8#InnerProduct#NO_COPY#SEARCH
KNN/float/int64_t/brute_force_filter_knn/67/0/0/manual_time 57.4 ms 247 ms 12 1000000#128#1000#255#0.8#L2Expanded#NO_COPY#SEARCH
KNN/float/int64_t/brute_force_filter_knn/68/0/0/manual_time 24.6 ms 121 ms 29 1000000#128#1000#255#0.9#InnerProduct#NO_COPY#SEARCH
KNN/float/int64_t/brute_force_filter_knn/69/0/0/manual_time 26.1 ms 123 ms 27 1000000#128#1000#255#0.9#L2Expanded#NO_COPY#SEARCH
KNN/float/int64_t/brute_force_filter_knn/70/0/0/manual_time 4.69 ms 14.4 ms 149 1000000#128#1000#255#0.99#InnerProduct#NO_COPY#SEARCH
KNN/float/int64_t/brute_force_filter_knn/71/0/0/manual_time 4.83 ms 14.5 ms 138 1000000#128#1000#255#0.99#L2Expanded#NO_COPY#SEARCH
@rhdong these benchmarks look promising so far. Can you run these for the following data shapes and also provide the standard (non-filtered) brute-force knn numbers for reference, please?
Num vectors: [100k, 1M, 10M] Dimensions: [128, 256, 768, 1024, 2048, 4096]
Can you also see how far we can take the sparsity? It would be helpful to know what the results look like for 60%, 70%, and 80% sparsity.
From what I can see in your benchmarks above, it appears as though the performance degrades significantly for your new kernel as k grows, but shouldn't the k-selection be a separate operation altogether? Do you have any ideas or insights into this behavior?
From what I can see in your benchmarks above, it appears as though the performance degrades significantly for your new kernel as k grows, but shouldn't the k-selection be a separate operation altogether? Do you have any ideas or insights into this behavior?
Hi @cjnolet , this is the benchmark for end-2-end, not only for the new kernel. So, maybe the chart is not so clear, the conclusion should be the performance decreasing with n_query
increasing. This new kernel is not deeply optimized. I'm not sure if the new kernel can cover all of the scenarios; we'd better determine the typical scenarios and use appropriate methods.
@rhdong these benchmarks look promising so far. Can you run these for the following data shapes and also provide the standard (non-filtered) brute-force knn numbers for reference, please?
Num vectors: [100k, 1M, 10M] Dimensions: [128, 256, 768, 1024, 2048, 4096]
Can you also see how far we can take the sparsity? It would be helpful to know what the results look like for 60%, 70%, and 80% sparsity.
From what I can see in your benchmarks above, it appears as though the performance degrades significantly for your new kernel as k grows, but shouldn't the k-selection be a separate operation altogether? Do you have any ideas or insights into this behavior?
Hello @cjnolet , here is the latest benchmark, I improved the new kernel a little bit, it has more stable performance(without that much strange performance gap) and no need to consume extra buffer(that would might cause the OOM on some lower sparsity like cuSparseSDDMM). But its performance in some cases(roughly 30-40%) is not better than cuSparseSDDMM. I intend to merge them by a branch. Before that, I'd like to hear your advice, thank you! By the way, I think it would be a tight timeline to introduce the masked 1nn with a because the API inputs have a little difference that needs to be adapted. So, may I introduce it in the next release?
-------------------------------------------------------------------------------------------------------
Benchmark cuSparseSDDMM faster_dot Iterations
-------------------------------------------------------------------------------------------------------
KNN/float/int64_t/brute_force_filter_knn/0/0/0/manual_time 0.195 ms 0.193 ms 3584 10240#128#10#255#[No filtered]#InnerProduct#NO_COPY#SEARCH
KNN/float/int64_t/brute_force_filter_knn/1/0/0/manual_time 0.227 ms 0.274 ms 3083 10240#128#10#255#0.8#InnerProduct#NO_COPY#SEARCH
KNN/float/int64_t/brute_force_filter_knn/2/0/0/manual_time 0.208 ms 0.239 ms 3364 10240#128#10#255#0.9#InnerProduct#NO_COPY#SEARCH
KNN/float/int64_t/brute_force_filter_knn/3/0/0/manual_time 0.184 ms 0.203 ms 3810 10240#128#10#255#0.99#InnerProduct#NO_COPY#SEARCH
KNN/float/int64_t/brute_force_filter_knn/4/0/0/manual_time 0.727 ms 0.724 ms 963 10240#128#1000#255#[No filtered]#InnerProduct#NO_COPY#SEARCH
KNN/float/int64_t/brute_force_filter_knn/5/0/0/manual_time 1.68 ms 0.635 ms 416 10240#128#1000#255#0.8#InnerProduct#NO_COPY#SEARCH
KNN/float/int64_t/brute_force_filter_knn/6/0/0/manual_time 0.866 ms 0.396 ms 808 10240#128#1000#255#0.9#InnerProduct#NO_COPY#SEARCH
KNN/float/int64_t/brute_force_filter_knn/7/0/0/manual_time 0.208 ms 0.152 ms 3365 10240#128#1000#255#0.99#InnerProduct#NO_COPY#SEARCH
KNN/float/int64_t/brute_force_filter_knn/8/0/0/manual_time 0.265 ms 0.260 ms 2640 10240#256#10#255#[No filtered]#InnerProduct#NO_COPY#SEARCH
KNN/float/int64_t/brute_force_filter_knn/9/0/0/manual_time 0.265 ms 0.352 ms 2637 10240#256#10#255#0.8#InnerProduct#NO_COPY#SEARCH
KNN/float/int64_t/brute_force_filter_knn/10/0/0/manual_time 0.242 ms 0.292 ms 2896 10240#256#10#255#0.9#InnerProduct#NO_COPY#SEARCH
KNN/float/int64_t/brute_force_filter_knn/11/0/0/manual_time 0.210 ms 0.244 ms 3325 10240#256#10#255#0.99#InnerProduct#NO_COPY#SEARCH
KNN/float/int64_t/brute_force_filter_knn/12/0/0/manual_time 0.915 ms 0.913 ms 764 10240#256#1000#255#[No filtered]#InnerProduct#NO_COPY#SEARCH
KNN/float/int64_t/brute_force_filter_knn/13/0/0/manual_time 2.96 ms 0.918 ms 237 10240#256#1000#255#0.8#InnerProduct#NO_COPY#SEARCH
KNN/float/int64_t/brute_force_filter_knn/14/0/0/manual_time 1.49 ms 0.520 ms 469 10240#256#1000#255#0.9#InnerProduct#NO_COPY#SEARCH
KNN/float/int64_t/brute_force_filter_knn/15/0/0/manual_time 0.289 ms 0.188 ms 2426 10240#256#1000#255#0.99#InnerProduct#NO_COPY#SEARCH
KNN/float/int64_t/brute_force_filter_knn/16/0/0/manual_time 0.315 ms 0.313 ms 2218 10240#768#10#255#[No filtered]#InnerProduct#NO_COPY#SEARCH
KNN/float/int64_t/brute_force_filter_knn/17/0/0/manual_time 0.323 ms 0.573 ms 2180 10240#768#10#255#0.8#InnerProduct#NO_COPY#SEARCH
KNN/float/int64_t/brute_force_filter_knn/18/0/0/manual_time 0.282 ms 0.422 ms 2475 10240#768#10#255#0.9#InnerProduct#NO_COPY#SEARCH
KNN/float/int64_t/brute_force_filter_knn/19/0/0/manual_time 0.227 ms 0.309 ms 3083 10240#768#10#255#0.99#InnerProduct#NO_COPY#SEARCH
KNN/float/int64_t/brute_force_filter_knn/20/0/0/manual_time 1.53 ms 1.53 ms 456 10240#768#1000#255#[No filtered]#InnerProduct#NO_COPY#SEARCH
KNN/float/int64_t/brute_force_filter_knn/21/0/0/manual_time 6.91 ms 1.53 ms 101 10240#768#1000#255#0.8#InnerProduct#NO_COPY#SEARCH
KNN/float/int64_t/brute_force_filter_knn/22/0/0/manual_time 3.58 ms 0.944 ms 195 10240#768#1000#255#0.9#InnerProduct#NO_COPY#SEARCH
KNN/float/int64_t/brute_force_filter_knn/23/0/0/manual_time 0.500 ms 0.262 ms 1398 10240#768#1000#255#0.99#InnerProduct#NO_COPY#SEARCH
KNN/float/int64_t/brute_force_filter_knn/24/0/0/manual_time 0.329 ms 0.327 ms 2127 10240#1024#10#255#[No filtered]#InnerProduct#NO_COPY#SEARCH
KNN/float/int64_t/brute_force_filter_knn/25/0/0/manual_time 0.331 ms 0.664 ms 2114 10240#1024#10#255#0.8#InnerProduct#NO_COPY#SEARCH
KNN/float/int64_t/brute_force_filter_knn/26/0/0/manual_time 0.286 ms 0.474 ms 2433 10240#1024#10#255#0.9#InnerProduct#NO_COPY#SEARCH
KNN/float/int64_t/brute_force_filter_knn/27/0/0/manual_time 0.230 ms 0.332 ms 3037 10240#1024#10#255#0.99#InnerProduct#NO_COPY#SEARCH
KNN/float/int64_t/brute_force_filter_knn/28/0/0/manual_time 1.82 ms 1.82 ms 385 10240#1024#1000#255#[No filtered]#InnerProduct#NO_COPY#SEARCH
KNN/float/int64_t/brute_force_filter_knn/29/0/0/manual_time 7.98 ms 1.85 ms 88 10240#1024#1000#255#0.8#InnerProduct#NO_COPY#SEARCH
KNN/float/int64_t/brute_force_filter_knn/30/0/0/manual_time 4.15 ms 1.13 ms 169 10240#1024#1000#255#0.9#InnerProduct#NO_COPY#SEARCH
KNN/float/int64_t/brute_force_filter_knn/31/0/0/manual_time 0.563 ms 0.293 ms 1245 10240#1024#1000#255#0.99#InnerProduct#NO_COPY#SEARCH
KNN/float/int64_t/brute_force_filter_knn/32/0/0/manual_time 0.381 ms 0.382 ms 1840 10240#2048#10#255#[No filtered]#InnerProduct#NO_COPY#SEARCH
KNN/float/int64_t/brute_force_filter_knn/33/0/0/manual_time 0.375 ms 1.03 ms 1865 10240#2048#10#255#0.8#InnerProduct#NO_COPY#SEARCH
KNN/float/int64_t/brute_force_filter_knn/34/0/0/manual_time 0.315 ms 0.683 ms 2224 10240#2048#10#255#0.9#InnerProduct#NO_COPY#SEARCH
KNN/float/int64_t/brute_force_filter_knn/35/0/0/manual_time 0.235 ms 0.426 ms 2980 10240#2048#10#255#0.99#InnerProduct#NO_COPY#SEARCH
KNN/float/int64_t/brute_force_filter_knn/36/0/0/manual_time 2.96 ms 2.97 ms 236 10240#2048#1000#255#[No filtered]#InnerProduct#NO_COPY#SEARCH
KNN/float/int64_t/brute_force_filter_knn/37/0/0/manual_time 12.0 ms 3.07 ms 58 10240#2048#1000#255#0.8#InnerProduct#NO_COPY#SEARCH
KNN/float/int64_t/brute_force_filter_knn/38/0/0/manual_time 6.44 ms 1.95 ms 109 10240#2048#1000#255#0.9#InnerProduct#NO_COPY#SEARCH
KNN/float/int64_t/brute_force_filter_knn/39/0/0/manual_time 0.902 ms 0.505 ms 777 10240#2048#1000#255#0.99#InnerProduct#NO_COPY#SEARCH
KNN/float/int64_t/brute_force_filter_knn/40/0/0/manual_time 0.480 ms 0.482 ms 1455 10240#4096#10#255#[No filtered]#InnerProduct#NO_COPY#SEARCH
KNN/float/int64_t/brute_force_filter_knn/41/0/0/manual_time 0.471 ms 1.77 ms 1487 10240#4096#10#255#0.8#InnerProduct#NO_COPY#SEARCH
KNN/float/int64_t/brute_force_filter_knn/42/0/0/manual_time 0.375 ms 1.08 ms 1868 10240#4096#10#255#0.9#InnerProduct#NO_COPY#SEARCH
KNN/float/int64_t/brute_force_filter_knn/43/0/0/manual_time 0.247 ms 0.614 ms 2841 10240#4096#10#255#0.99#InnerProduct#NO_COPY#SEARCH
KNN/float/int64_t/brute_force_filter_knn/44/0/0/manual_time 5.25 ms 5.27 ms 133 10240#4096#1000#255#[No filtered]#InnerProduct#NO_COPY#SEARCH
KNN/float/int64_t/brute_force_filter_knn/45/0/0/manual_time 22.6 ms 5.51 ms 31 10240#4096#1000#255#0.8#InnerProduct#NO_COPY#SEARCH
KNN/float/int64_t/brute_force_filter_knn/46/0/0/manual_time 11.5 ms 3.93 ms 61 10240#4096#1000#255#0.9#InnerProduct#NO_COPY#SEARCH
KNN/float/int64_t/brute_force_filter_knn/47/0/0/manual_time 1.53 ms 0.925 ms 458 10240#4096#1000#255#0.99#InnerProduct#NO_COPY#SEARCH
KNN/float/int64_t/brute_force_filter_knn/48/0/0/manual_time 2.16 ms 2.16 ms 324 1048576#128#10#255#[No filtered]#InnerProduct#NO_COPY#SEARCH
KNN/float/int64_t/brute_force_filter_knn/49/0/0/manual_time 5.68 ms 9.81 ms 123 1048576#128#10#255#0.8#InnerProduct#NO_COPY#SEARCH
KNN/float/int64_t/brute_force_filter_knn/50/0/0/manual_time 4.01 ms 6.22 ms 175 1048576#128#10#255#0.9#InnerProduct#NO_COPY#SEARCH
KNN/float/int64_t/brute_force_filter_knn/51/0/0/manual_time 1.92 ms 2.19 ms 365 1048576#128#10#255#0.99#InnerProduct#NO_COPY#SEARCH
KNN/float/int64_t/brute_force_filter_knn/52/0/0/manual_time 34.5 ms 34.5 ms 20 1048576#128#1000#255#[No filtered]#InnerProduct#NO_COPY#SEARCH
KNN/float/int64_t/brute_force_filter_knn/53/0/0/manual_time 241 ms 58.0 ms 3 1048576#128#1000#255#0.8#InnerProduct#NO_COPY#SEARCH
KNN/float/int64_t/brute_force_filter_knn/54/0/0/manual_time 119 ms 26.3 ms 6 1048576#128#1000#255#0.9#InnerProduct#NO_COPY#SEARCH
KNN/float/int64_t/brute_force_filter_knn/55/0/0/manual_time 14.2 ms 4.90 ms 49 1048576#128#1000#255#0.99#InnerProduct#NO_COPY#SEARCH
KNN/float/int64_t/brute_force_filter_knn/56/0/0/manual_time 2.69 ms 2.70 ms 260 1048576#256#10#255#[No filtered]#InnerProduct#NO_COPY#SEARCH
KNN/float/int64_t/brute_force_filter_knn/57/0/0/manual_time 7.30 ms 14.3 ms 96 1048576#256#10#255#0.8#InnerProduct#NO_COPY#SEARCH
KNN/float/int64_t/brute_force_filter_knn/58/0/0/manual_time 4.84 ms 8.52 ms 145 1048576#256#10#255#0.9#InnerProduct#NO_COPY#SEARCH
KNN/float/int64_t/brute_force_filter_knn/59/0/0/manual_time 2.00 ms 2.43 ms 349 1048576#256#10#255#0.99#InnerProduct#NO_COPY#SEARCH
KNN/float/int64_t/brute_force_filter_knn/60/0/0/manual_time 48.8 ms 48.9 ms 14 1048576#256#1000#255#[No filtered]#InnerProduct#NO_COPY#SEARCH
KNN/float/int64_t/brute_force_filter_knn/61/0/0/manual_time 421 ms 106 ms 2 1048576#256#1000#255#0.8#InnerProduct#NO_COPY#SEARCH
KNN/float/int64_t/brute_force_filter_knn/62/0/0/manual_time 210 ms 51.4 ms 3 1048576#256#1000#255#0.9#InnerProduct#NO_COPY#SEARCH
KNN/float/int64_t/brute_force_filter_knn/63/0/0/manual_time 23.4 ms 7.56 ms 30 1048576#256#1000#255#0.99#InnerProduct#NO_COPY#SEARCH
KNN/float/int64_t/brute_force_filter_knn/64/0/0/manual_time 4.64 ms 4.64 ms 151 1048576#768#10#255#[No filtered]#InnerProduct#NO_COPY#SEARCH
KNN/float/int64_t/brute_force_filter_knn/65/0/0/manual_time 11.8 ms 32.4 ms 59 1048576#768#10#255#0.8#InnerProduct#NO_COPY#SEARCH
KNN/float/int64_t/brute_force_filter_knn/66/0/0/manual_time 7.13 ms 17.9 ms 98 1048576#768#10#255#0.9#InnerProduct#NO_COPY#SEARCH
KNN/float/int64_t/brute_force_filter_knn/67/0/0/manual_time 2.24 ms 3.42 ms 312 1048576#768#10#255#0.99#InnerProduct#NO_COPY#SEARCH
KNN/float/int64_t/brute_force_filter_knn/68/0/0/manual_time 106 ms 106 ms 7 1048576#768#1000#255#[No filtered]#InnerProduct#NO_COPY#SEARCH
KNN/float/int64_t/brute_force_filter_knn/69/0/0/manual_time 738 ms 353 ms 1 1048576#768#1000#255#0.8#InnerProduct#NO_COPY#SEARCH
KNN/float/int64_t/brute_force_filter_knn/70/0/0/manual_time 426 ms 178 ms 2 1048576#768#1000#255#0.9#InnerProduct#NO_COPY#SEARCH
KNN/float/int64_t/brute_force_filter_knn/71/0/0/manual_time 46.5 ms 20.6 ms 15 1048576#768#1000#255#0.99#InnerProduct#NO_COPY#SEARCH
KNN/float/int64_t/brute_force_filter_knn/72/0/0/manual_time 5.68 ms 5.67 ms 123 1048576#1024#10#255#[No filtered]#InnerProduct#NO_COPY#SEARCH
KNN/float/int64_t/brute_force_filter_knn/73/0/0/manual_time 13.4 ms 41.7 ms 52 1048576#1024#10#255#0.8#InnerProduct#NO_COPY#SEARCH
KNN/float/int64_t/brute_force_filter_knn/74/0/0/manual_time 7.89 ms 22.6 ms 89 1048576#1024#10#255#0.9#InnerProduct#NO_COPY#SEARCH
KNN/float/int64_t/brute_force_filter_knn/75/0/0/manual_time 2.31 ms 3.92 ms 304 1048576#1024#10#255#0.99#InnerProduct#NO_COPY#SEARCH
KNN/float/int64_t/brute_force_filter_knn/76/0/0/manual_time 136 ms 135 ms 5 1048576#1024#1000#255#[No filtered]#InnerProduct#NO_COPY#SEARCH
KNN/float/int64_t/brute_force_filter_knn/77/0/0/manual_time 996 ms 483 ms 1 1048576#1024#1000#255#0.8#InnerProduct#NO_COPY#SEARCH
KNN/float/int64_t/brute_force_filter_knn/78/0/0/manual_time 514 ms 239 ms 1 1048576#1024#1000#255#0.9#InnerProduct#NO_COPY#SEARCH
KNN/float/int64_t/brute_force_filter_knn/79/0/0/manual_time 54.0 ms 27.5 ms 13 1048576#1024#1000#255#0.99#InnerProduct#NO_COPY#SEARCH
KNN/float/int64_t/brute_force_filter_knn/80/0/0/manual_time 9.63 ms 9.62 ms 73 1048576#2048#10#255#[No filtered]#InnerProduct#NO_COPY#SEARCH
KNN/float/int64_t/brute_force_filter_knn/81/0/0/manual_time 18.4 ms 79.2 ms 38 1048576#2048#10#255#0.8#InnerProduct#NO_COPY#SEARCH
KNN/float/int64_t/brute_force_filter_knn/82/0/0/manual_time 10.5 ms 41.5 ms 67 1048576#2048#10#255#0.9#InnerProduct#NO_COPY#SEARCH
KNN/float/int64_t/brute_force_filter_knn/83/0/0/manual_time 2.57 ms 5.88 ms 272 1048576#2048#10#255#0.99#InnerProduct#NO_COPY#SEARCH
KNN/float/int64_t/brute_force_filter_knn/84/0/0/manual_time 262 ms 262 ms 3 1048576#2048#1000#255#[No filtered]#InnerProduct#NO_COPY#SEARCH
KNN/float/int64_t/brute_force_filter_knn/85/0/0/manual_time 1591 ms 986 ms 1 1048576#2048#1000#255#0.8#InnerProduct#NO_COPY#SEARCH
KNN/float/int64_t/brute_force_filter_knn/86/0/0/manual_time 793 ms 487 ms 1 1048576#2048#1000#255#0.9#InnerProduct#NO_COPY#SEARCH
KNN/float/int64_t/brute_force_filter_knn/87/0/0/manual_time 81.3 ms 55.6 ms 9 1048576#2048#1000#255#0.99#InnerProduct#NO_COPY#SEARCH
KNN/float/int64_t/brute_force_filter_knn/88/0/0/manual_time 17.7 ms 17.7 ms 40 1048576#4096#10#255#[No filtered]#InnerProduct#NO_COPY#SEARCH
KNN/float/int64_t/brute_force_filter_knn/89/0/0/manual_time 28.8 ms 156 ms 24 1048576#4096#10#255#0.8#InnerProduct#NO_COPY#SEARCH
KNN/float/int64_t/brute_force_filter_knn/90/0/0/manual_time 15.7 ms 79.9 ms 44 1048576#4096#10#255#0.9#InnerProduct#NO_COPY#SEARCH
KNN/float/int64_t/brute_force_filter_knn/91/0/0/manual_time 3.11 ms 9.80 ms 225 1048576#4096#10#255#0.99#InnerProduct#NO_COPY#SEARCH
KNN/float/int64_t/brute_force_filter_knn/92/0/0/manual_time 509 ms 509 ms 1 1048576#4096#1000#255#[No filtered]#InnerProduct#NO_COPY#SEARCH
KNN/float/int64_t/brute_force_filter_knn/93/0/0/manual_time 2687 ms 2018 ms 1 1048576#4096#1000#255#0.8#InnerProduct#NO_COPY#SEARCH
KNN/float/int64_t/brute_force_filter_knn/94/0/0/manual_time 1343 ms 1022 ms 1 1048576#4096#1000#255#0.9#InnerProduct#NO_COPY#SEARCH
KNN/float/int64_t/brute_force_filter_knn/95/0/0/manual_time 136 ms 114 ms 5 1048576#4096#1000#255#0.99#InnerProduct#NO_COPY#SEARCH
KNN/float/int64_t/brute_force_filter_knn/96/0/0/manual_time 19.7 ms 19.7 ms 36 10485760#128#10#255#[No filtered]#InnerProduct#NO_COPY#SEARCH
KNN/float/int64_t/brute_force_filter_knn/97/0/0/manual_time 55.8 ms 97.2 ms 13 10485760#128#10#255#0.8#InnerProduct#NO_COPY#SEARCH
KNN/float/int64_t/brute_force_filter_knn/98/0/0/manual_time 38.9 ms 60.3 ms 18 10485760#128#10#255#0.9#InnerProduct#NO_COPY#SEARCH
KNN/float/int64_t/brute_force_filter_knn/99/0/0/manual_time 16.7 ms 19.0 ms 42 10485760#128#10#255#0.99#InnerProduct#NO_COPY#SEARCH
KNN/float/int64_t/brute_force_filter_knn/100/0/0/manual_time 342 ms 342 ms 2 10485760#128#1000#255#[No filtered]#InnerProduct#NO_COPY#SEARCH
KNN/float/int64_t/brute_force_filter_knn/101/0/0/manual_time 2473 ms 914 ms 1 10485760#128#1000#255#0.8#InnerProduct#NO_COPY#SEARCH
KNN/float/int64_t/brute_force_filter_knn/102/0/0/manual_time 1237 ms 456 ms 1 10485760#128#1000#255#0.9#InnerProduct#NO_COPY#SEARCH
KNN/float/int64_t/brute_force_filter_knn/103/0/0/manual_time 141 ms 61.8 ms 5 10485760#128#1000#255#0.99#InnerProduct#NO_COPY#SEARCH
KNN/float/int64_t/brute_force_filter_knn/104/0/0/manual_time 24.9 ms 24.9 ms 28 10485760#256#10#255#[No filtered]#InnerProduct#NO_COPY#SEARCH
KNN/float/int64_t/brute_force_filter_knn/105/0/0/manual_time 73.1 ms 143 ms 10 10485760#256#10#255#0.8#InnerProduct#NO_COPY#SEARCH
KNN/float/int64_t/brute_force_filter_knn/106/0/0/manual_time 47.5 ms 83.5 ms 15 10485760#256#10#255#0.9#InnerProduct#NO_COPY#SEARCH
KNN/float/int64_t/brute_force_filter_knn/107/0/0/manual_time 17.5 ms 25.1 ms 40 10485760#256#10#255#0.99#InnerProduct#NO_COPY#SEARCH
The following cases are still running.
Hi @cjnolet @benfred , here is the latest benchmark for the latest commit by our discussion in Slack. Would you please help review it? Many thanks!
If density > 1.0%, use standard brute force (and do pre-filtering in the k-selection)
If n_queries <= 10, use a custom kernel
Otherwise, if n_queries > 10, use SDDMM
------------------------------------------------------------------------------------------------------
Benchmark Time CPU Iterations
------------------------------------------------------------------------------------------------------
KNN/float/int64_t/brute_force_filter_knn/0/0/0/manual_time 17.4 ms 739 ms 40 10000000#256#1#256#[No filtered]#InnerProduct#NO_COPY#SEARCH
KNN/float/int64_t/brute_force_filter_knn/1/0/0/manual_time 17.0 ms 726 ms 41 10000000#256#1#256#0.8#InnerProduct#NO_COPY#SEARCH
KNN/float/int64_t/brute_force_filter_knn/2/0/0/manual_time 14.9 ms 724 ms 47 10000000#256#1#256#0.99#InnerProduct#NO_COPY#SEARCH
KNN/float/int64_t/brute_force_filter_knn/3/0/0/manual_time 23.5 ms 733 ms 30 10000000#256#10#256#[No filtered]#InnerProduct#NO_COPY#SEARCH
KNN/float/int64_t/brute_force_filter_knn/4/0/0/manual_time 22.5 ms 746 ms 31 10000000#256#10#256#0.8#InnerProduct#NO_COPY#SEARCH
KNN/float/int64_t/brute_force_filter_knn/5/0/0/manual_time 16.7 ms 739 ms 42 10000000#256#10#256#0.99#InnerProduct#NO_COPY#SEARCH
KNN/float/int64_t/brute_force_filter_knn/6/0/0/manual_time 57.6 ms 783 ms 12 10000000#256#100#256#[No filtered]#InnerProduct#NO_COPY#SEARCH
KNN/float/int64_t/brute_force_filter_knn/7/0/0/manual_time 61.2 ms 787 ms 11 10000000#256#100#256#0.8#InnerProduct#NO_COPY#SEARCH
KNN/float/int64_t/brute_force_filter_knn/8/0/0/manual_time 29.0 ms 756 ms 24 10000000#256#100#256#0.99#InnerProduct#NO_COPY#SEARCH
KNN/float/int64_t/brute_force_filter_knn/9/0/0/manual_time 463 ms 1189 ms 2 10000000#256#1000#256#[No filtered]#InnerProduct#NO_COPY#SEARCH
KNN/float/int64_t/brute_force_filter_knn/10/0/0/manual_time 513 ms 1231 ms 1 10000000#256#1000#256#0.8#InnerProduct#NO_COPY#SEARCH
KNN/float/int64_t/brute_force_filter_knn/11/0/0/manual_time 91.5 ms 809 ms 8 10000000#256#1000#256#0.99#InnerProduct#NO_COPY#SEARCH
KNN/float/int64_t/brute_force_filter_knn/24/0/0/manual_time 2.01 ms 77.0 ms 348 1048576#256#1#256#[No filtered]#InnerProduct#NO_COPY#SEARCH
KNN/float/int64_t/brute_force_filter_knn/25/0/0/manual_time 1.98 ms 77.2 ms 352 1048576#256#1#256#0.8#InnerProduct#NO_COPY#SEARCH
KNN/float/int64_t/brute_force_filter_knn/26/0/0/manual_time 1.96 ms 77.3 ms 357 1048576#256#1#256#0.99#InnerProduct#NO_COPY#SEARCH
KNN/float/int64_t/brute_force_filter_knn/27/0/0/manual_time 2.63 ms 77.6 ms 266 1048576#256#10#256#[No filtered]#InnerProduct#NO_COPY#SEARCH
KNN/float/int64_t/brute_force_filter_knn/28/0/0/manual_time 2.54 ms 77.7 ms 275 1048576#256#10#256#0.8#InnerProduct#NO_COPY#SEARCH
KNN/float/int64_t/brute_force_filter_knn/29/0/0/manual_time 1.98 ms 76.9 ms 353 1048576#256#10#256#0.99#InnerProduct#NO_COPY#SEARCH
KNN/float/int64_t/brute_force_filter_knn/30/0/0/manual_time 6.21 ms 81.2 ms 113 1048576#256#100#256#[No filtered]#InnerProduct#NO_COPY#SEARCH
KNN/float/int64_t/brute_force_filter_knn/31/0/0/manual_time 6.61 ms 82.1 ms 106 1048576#256#100#256#0.8#InnerProduct#NO_COPY#SEARCH
KNN/float/int64_t/brute_force_filter_knn/32/0/0/manual_time 2.93 ms 79.3 ms 238 1048576#256#100#256#0.99#InnerProduct#NO_COPY#SEARCH
KNN/float/int64_t/brute_force_filter_knn/33/0/0/manual_time 48.8 ms 126 ms 14 1048576#256#1000#256#[No filtered]#InnerProduct#NO_COPY#SEARCH
KNN/float/int64_t/brute_force_filter_knn/34/0/0/manual_time 54.1 ms 131 ms 13 1048576#256#1000#256#0.8#InnerProduct#NO_COPY#SEARCH
KNN/float/int64_t/brute_force_filter_knn/35/0/0/manual_time 7.64 ms 84.2 ms 91 1048576#256#1000#256#0.99#InnerProduct#NO_COPY#SEARCH
KNN/float/int64_t/brute_force_filter_knn/36/0/0/manual_time 6.55 ms 614 ms 107 1048576#2048#1#256#[No filtered]#InnerProduct#NO_COPY#SEARCH
KNN/float/int64_t/brute_force_filter_knn/37/0/0/manual_time 6.54 ms 607 ms 107 1048576#2048#1#256#0.8#InnerProduct#NO_COPY#SEARCH
KNN/float/int64_t/brute_force_filter_knn/38/0/0/manual_time 2.04 ms 610 ms 343 1048576#2048#1#256#0.99#InnerProduct#NO_COPY#SEARCH
KNN/float/int64_t/brute_force_filter_knn/39/0/0/manual_time 9.60 ms 613 ms 73 1048576#2048#10#256#[No filtered]#InnerProduct#NO_COPY#SEARCH
KNN/float/int64_t/brute_force_filter_knn/40/0/0/manual_time 9.53 ms 618 ms 74 1048576#2048#10#256#0.8#InnerProduct#NO_COPY#SEARCH
KNN/float/int64_t/brute_force_filter_knn/41/0/0/manual_time 2.55 ms 603 ms 275 1048576#2048#10#256#0.99#InnerProduct#NO_COPY#SEARCH
KNN/float/int64_t/brute_force_filter_knn/42/0/0/manual_time 31.6 ms 632 ms 22 1048576#2048#100#256#[No filtered]#InnerProduct#NO_COPY#SEARCH
KNN/float/int64_t/brute_force_filter_knn/43/0/0/manual_time 32.0 ms 634 ms 22 1048576#2048#100#256#0.8#InnerProduct#NO_COPY#SEARCH
KNN/float/int64_t/brute_force_filter_knn/44/0/0/manual_time 7.91 ms 616 ms 88 1048576#2048#100#256#0.99#InnerProduct#NO_COPY#SEARCH
KNN/float/int64_t/brute_force_filter_knn/45/0/0/manual_time 264 ms 876 ms 3 1048576#2048#1000#256#[No filtered]#InnerProduct#NO_COPY#SEARCH
KNN/float/int64_t/brute_force_filter_knn/46/0/0/manual_time 269 ms 877 ms 3 1048576#2048#1000#256#0.8#InnerProduct#NO_COPY#SEARCH
KNN/float/int64_t/brute_force_filter_knn/47/0/0/manual_time 55.4 ms 664 ms 12 1048576#2048#1000#256#0.99#InnerProduct#NO_COPY#SEARCH
/merged
/merge